I recently joined Playground as a Research Lead. We are working on pixel foundation models, and we are hiring! Send me an email if you are interested.
I was previously a Senior Research Scientist at the NVIDIA Toronto AI Lab, where I worked on computer vision, computer graphics, generative models, and machine learning.
My research interests span computer vision, computer graphics, generative models, and machine learning. Much of my work exploits generative models for computer vision tasks such as semantic segmentation, image editing, and representation learning.
We propose a new text-to-image model architecture that deeply fuses a large language model (Llama 3) into the generation backbone to improve text-to-image alignment. Our model achieves state-of-the-art performance in text rendering and text-image consistency, outperforming Flux and Ideogram.
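A minimal sketch of the deep-fusion idea: instead of conditioning on a single pooled text embedding, every generator block cross-attends to the hidden states of a frozen LLM. All module and parameter names below are illustrative assumptions, not the paper's actual architecture.

```python
import torch.nn as nn

class DeepFusionBlock(nn.Module):
    """One generator block that fuses frozen LLM hidden states via
    cross-attention (illustrative sketch, not the exact architecture)."""
    def __init__(self, dim, llm_dim, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.proj_llm = nn.Linear(llm_dim, dim)  # map Llama 3 states into image-token space
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, llm_hidden):
        # x: (B, N, dim) image tokens; llm_hidden: (B, T, llm_dim) LLM hidden states
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]
        ctx = self.proj_llm(llm_hidden)  # deep fusion: text context injected at every block
        x = x + self.cross_attn(self.norm2(x), ctx, ctx)[0]
        return x + self.mlp(self.norm3(x))
```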
We share three insights for enhancing aesthetic quality in text-to-image generation. Our new model achieves better aesthetic quality than Midjourney 5.2 and outperforms SDXL by a large margin across all aspect-ratio conditions.
We propose a new pre-training framework that distills knowledge from generative models into commonly used image backbones, and show that generative models are a promising approach to representation learning on large, diverse datasets without requiring manual annotation.
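As a rough sketch of such a pre-training objective, one can align backbone features with features extracted from a frozen generative model on the same images. The cosine loss and projection head below are assumptions for illustration, not the paper's exact recipe.

```python
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats, proj):
    # student_feats: (B, D_s) features from the image backbone being pre-trained.
    # teacher_feats: (B, D_t) features from a frozen generative model, same images.
    # proj: a learnable head (e.g. nn.Linear(D_s, D_t)) mapping student to teacher space.
    s = F.normalize(proj(student_feats), dim=1)
    t = F.normalize(teacher_feats, dim=1)
    return 1.0 - (s * t).sum(dim=1).mean()  # cosine-similarity distillation
```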
We use a lift-splat-shoot-style representation to encode driving scenes and a NeRF-style representation to decode them with view control. We then learn a hierarchical latent diffusion model on this latent representation for driving scene generation.
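A high-level sketch of the two-stage design, with the encoder, decoder, and their interfaces as placeholder assumptions:

```python
import torch.nn as nn

class DrivingSceneAutoencoder(nn.Module):
    """Stage 1 (illustrative): an LSS-style encoder maps multi-view images
    into a BEV latent; a NeRF-style decoder renders the latent from any
    queried camera pose."""
    def __init__(self, lss_encoder, nerf_decoder):
        super().__init__()
        self.encoder = lss_encoder   # (images, camera geometry) -> BEV latent grid
        self.decoder = nerf_decoder  # (latent, target pose) -> rendered view

    def forward(self, images, intrinsics, extrinsics, target_pose):
        latent = self.encoder(images, intrinsics, extrinsics)
        return self.decoder(latent, target_pose), latent

# Stage 2 (not shown): a hierarchical latent diffusion model is trained on
# `latent`, and its samples are decoded into novel driving scenes.
```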
We develop a 3D generative model that generates textured meshes, bridging recent advances in differentiable surface modeling, differentiable rendering, and 2D GANs.
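The training loop pairs a differentiable surface-and-texture generator and renderer with an ordinary 2D image discriminator. A sketch, with all modules passed in as assumed components:

```python
import torch.nn.functional as F

def gan_step(generator, renderer, disc, z, real_img, cam):
    """One adversarial step for textured-mesh generation (illustrative)."""
    mesh, tex = generator(z)         # differentiable surface modeling (e.g. SDF -> mesh)
    fake = renderer(mesh, tex, cam)  # differentiable rendering to a 2D image
    # Non-saturating GAN losses on rendered vs. real 2D images.
    g_loss = F.softplus(-disc(fake)).mean()
    d_loss = (F.softplus(disc(fake.detach())) + F.softplus(-disc(real_img))).mean()
    return g_loss, d_loss
```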
We use a GAN to model multi-domain objects with shared attributes, and a morphing network to model geometric differences across domains. We demonstrate its applications in segmentation transfer and image editing.
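A sketch of the segmentation-transfer idea: estimate a dense warp between two domains that share attributes, then warp the source labels onto the target. The `morph_net` interface and the grid_sample-based warp are assumptions.

```python
import torch.nn.functional as F

def transfer_segmentation(morph_net, src_img, src_mask, tgt_img):
    # morph_net predicts a (B, H, W, 2) sampling grid aligning source to target.
    grid = morph_net(src_img, tgt_img)
    # Warp the source labels (B, C, H, W float one-hot) into the target frame.
    return F.grid_sample(src_mask, grid, mode="nearest", align_corners=False)
```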
We use generative models to capture the joint distribution of images and semantic labels, and apply it to semi-supervised learning and out-of-domain generalization.
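Because the generator models p(image, label) jointly, sampling it yields aligned image-label pairs that can supplement scarce annotations. A sketch with an assumed generator interface:

```python
import torch

@torch.no_grad()
def sample_labeled_pairs(generator, n, z_dim, device="cpu"):
    """Draw synthetic (image, semantic map) pairs from a joint generative
    model; illustrative interface, not the paper's actual API."""
    z = torch.randn(n, z_dim, device=device)
    images, labels = generator(z)  # generator emits an aligned image + label map
    return images, labels
```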
We introduce a physics-driven generative approach consisting of two learnable neural modules: 1) a module that synthesizes 3D cardiac shapes along with their material properties, and 2) a CT simulator that renders these into realistic, annotated 3D CT volumes.
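A sketch of the two-module pipeline; module names and interfaces are placeholders:

```python
def synthesize_annotated_ct(shape_gen, ct_simulator, z, scanner_params):
    # 1) Sample a 3D cardiac shape and its material properties.
    shape, materials = shape_gen(z)
    # 2) Physics-based rendering into a CT volume; the synthesized shape
    #    itself serves as the voxel-level annotation.
    volume = ct_simulator(shape, materials, scanner_params)
    return volume, shape
```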