I recently joined Playground as a Research Lead. We are working on pixel foundation models, and we are hiring! Send me an email if you are interested.
I was previously a Senior Research Scientist at the NVIDIA Toronto AI Lab, where I worked on computer vision, computer graphics, generative models, and machine learning.
My research interests span computer vision, computer graphics, generative models, and machine learning. Much of my work exploits generative models for computer vision tasks such as semantic segmentation, image editing, and representation learning.
We propose a new text-to-image model architecture that deeply fuses a large language model (Llama 3) into the generation backbone to improve text-to-image alignment. Our model achieves state-of-the-art performance in text rendering and text-image consistency, outperforming Flux and Ideogram.
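A minimal sketch of the deep-fusion idea: instead of conditioning on a single pooled text embedding, every generator block cross-attends to the hidden states of a frozen LLM. All module and parameter names below are illustrative assumptions, not the paper's actual architecture.

```python
import torch.nn as nn

class DeepFusionBlock(nn.Module):
    """One generator block that fuses frozen LLM hidden states via
    cross-attention (illustrative sketch, not the exact architecture)."""
    def __init__(self, dim, llm_dim, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.proj_llm = nn.Linear(llm_dim, dim)  # map Llama 3 states into image-token space
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, llm_hidden):
        # x: (B, N, dim) image tokens; llm_hidden: (B, T, llm_dim) LLM hidden states
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]
        ctx = self.proj_llm(llm_hidden)  # deep fusion: text context injected at every block
        x = x + self.cross_attn(self.norm2(x), ctx, ctx)[0]
        return x + self.mlp(self.norm3(x))
```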
We share three insights for enhancing aesthetic quality in text-to-image generation. Our new model achieves better aesthetic quality than Midjourney 5.2 and outperforms SDXL by a large margin across all aspect-ratio conditions.
We propose a new pre-training framework that distills knowledge from generative models into commonly used image backbones, and show that generative models are a promising approach to representation learning on large, diverse datasets without requiring manual annotation.
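As a rough sketch of such a pre-training objective, one can align backbone features with features extracted from a frozen generative model on the same images. The cosine loss and projection head below are assumptions for illustration, not the paper's exact recipe.

```python
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats, proj):
    # student_feats: (B, D_s) features from the image backbone being pre-trained.
    # teacher_feats: (B, D_t) features from a frozen generative model, same images.
    # proj: a learnable head (e.g. nn.Linear(D_s, D_t)) mapping student to teacher space.
    s = F.normalize(proj(student_feats), dim=1)
    t = F.normalize(teacher_feats, dim=1)
    return 1.0 - (s * t).sum(dim=1).mean()  # cosine-similarity distillation
```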
We use a lift-splat-shoot-style representation to encode driving scenes and a NeRF-style representation to decode them with view control. We then learn a hierarchical latent diffusion model on this latent representation for driving scene generation.
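A high-level sketch of the two-stage design, with the encoder, decoder, and their interfaces as placeholder assumptions:

```python
import torch.nn as nn

class DrivingSceneAutoencoder(nn.Module):
    """Stage 1 (illustrative): an LSS-style encoder maps multi-view images
    into a BEV latent; a NeRF-style decoder renders the latent from any
    queried camera pose."""
    def __init__(self, lss_encoder, nerf_decoder):
        super().__init__()
        self.encoder = lss_encoder   # (images, camera geometry) -> BEV latent grid
        self.decoder = nerf_decoder  # (latent, target pose) -> rendered view

    def forward(self, images, intrinsics, extrinsics, target_pose):
        latent = self.encoder(images, intrinsics, extrinsics)
        return self.decoder(latent, target_pose), latent

# Stage 2 (not shown): a hierarchical latent diffusion model is trained on
# `latent`, and its samples are decoded into novel driving scenes.
```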
We develop a 3D generative model that generates textured meshes, bridging recent advances in differentiable surface modeling, differentiable rendering, and 2D GANs.
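The training loop pairs a differentiable surface-and-texture generator and renderer with an ordinary 2D image discriminator. A sketch, with all modules passed in as assumed components:

```python
import torch.nn.functional as F

def gan_step(generator, renderer, disc, z, real_img, cam):
    """One adversarial step for textured-mesh generation (illustrative)."""
    mesh, tex = generator(z)         # differentiable surface modeling (e.g. SDF -> mesh)
    fake = renderer(mesh, tex, cam)  # differentiable rendering to a 2D image
    # Non-saturating GAN losses on rendered vs. real 2D images.
    g_loss = F.softplus(-disc(fake)).mean()
    d_loss = (F.softplus(disc(fake.detach())) + F.softplus(-disc(real_img))).mean()
    return g_loss, d_loss
```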
We use a GAN to model multi-domain objects with shared attributes, and a morphing network to model geometric differences across domains. We demonstrate its applications in segmentation transfer and image editing.
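A sketch of the segmentation-transfer idea: estimate a dense warp between two domains that share attributes, then warp the source labels onto the target. The `morph_net` interface and the grid_sample-based warp are assumptions.

```python
import torch.nn.functional as F

def transfer_segmentation(morph_net, src_img, src_mask, tgt_img):
    # morph_net predicts a (B, H, W, 2) sampling grid aligning source to target.
    grid = morph_net(src_img, tgt_img)
    # Warp the source labels (B, C, H, W float one-hot) into the target frame.
    return F.grid_sample(src_mask, grid, mode="nearest", align_corners=False)
```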
We use generative models to capture the joint distribution of images and semantic labels, and apply it to semi-supervised learning and out-of-domain generalization.
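Because the generator models p(image, label) jointly, sampling it yields aligned image-label pairs that can supplement scarce annotations. A sketch with an assumed generator interface:

```python
import torch

@torch.no_grad()
def sample_labeled_pairs(generator, n, z_dim, device="cpu"):
    """Draw synthetic (image, semantic map) pairs from a joint generative
    model; illustrative interface, not the paper's actual API."""
    z = torch.randn(n, z_dim, device=device)
    images, labels = generator(z)  # generator emits an aligned image + label map
    return images, labels
```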
We introduce a physics-driven generative approach consisting of two learnable neural modules: 1) a module that synthesizes 3D cardiac shapes along with their material properties, and 2) a CT simulator that renders these into realistic, annotated 3D CT volumes.
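A sketch of the two-module pipeline; module names and interfaces are placeholders:

```python
def synthesize_annotated_ct(shape_gen, ct_simulator, z, scanner_params):
    # 1) Sample a 3D cardiac shape and its material properties.
    shape, materials = shape_gen(z)
    # 2) Physics-based rendering into a CT volume; the synthesized shape
    #    itself serves as the voxel-level annotation.
    volume = ct_simulator(shape, materials, scanner_params)
    return volume, shape
```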