Daiqing Li

I recently joined Playground as a Research Lead. We are working on pixel foundation models, and we are hiring! Send me an email if you are interested.

I was a Senior Research Scientist at the NVIDIA Toronto AI Lab, where I worked on computer vision, computer graphics, generative models, and machine learning.

At NVIDIA, I worked closely with Sanja Fidler and Antonio Torralba. Several of our works have been integrated into NVIDIA products like Omniverse and Clara. I graduated from the University of Toronto and received the MICCAI Young Scientist Award runner-up.

Email  /  Google Scholar  /  Twitter  /  Github

News

  • Oct 2024: arXiv technical report for Playground v3, a model that achieves SoTA text generation and text-image consistency.
  • Feb 2024: Open-sourced Playground v2.5, a model that achieves better aesthetic quality than Midjourney 5.2.
  • Dec 2023: Gave a talk at the University of Bern.
  • Dec 2023: Open-sourced Playground v2, a model preferred 2.5x over SDXL in user studies.
  • Aug 2023: I joined Playground as a Research Lead.
  • July 2023: DreamTeacher is accepted to ICCV 2023.
Research

    I'm interested in computer vision, computer graphics, generative models and machine learning. Much of my research is about exploiting generative models for various computer vision tasks, such as semantic segmentation, image editing, and representation learning.

    Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models
    Bingchen Liu, Ehsan Akhgari, Alexander Visheratin, Aleks Kamko,
    Linmiao Xu, Shivam Shrirao, Joao Souza, Suhail Doshi, Daiqing Li
    arXiv, 2024
    blog / video / arXiv

    We propose a new text-to-image model architecture that deeply fuses large language models (Llama3) to improve text-to-image alignment. Our model achieves state-of-the-art performance in text generation and text-image consistency, outperforming Flux and Ideogram.

    Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation
    Daiqing Li, Aleks Kamko, Ehsan Akhgari, Ali Sabet, Linmiao Xu,
    Suhail Doshi
    arXiv, 2024
    blog / huggingface / video / arXiv

    We share three insights for enhancing aesthetic quality in text-to-image generation. Our new model achieves better aesthetic quality than Midjourney 5.2 and beats SDXL by a large margin across all aspect ratios.

    DreamTeacher: Pretraining Image Backbones with Deep Generative Models
    Daiqing Li*, Huan Ling*, Amlan Kar, David Acuna, Seung Wook Kim,
    Karsten Kreis, Antonio Torralba, Sanja Fidler
    ICCV, 2023
    project page / video / arXiv

    We propose a new pre-training framework that distills knowledge from generative models into commonly used image backbones, and show that generative models are a promising approach to representation learning on large, diverse datasets without requiring manual annotation.

    NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models
    Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz,
    Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler
    CVPR, 2023
    project page / video / arXiv

    We use a lift-splat-shoot-like representation to encode driving scenes and a NeRF-like representation to decode scenes with view control. We then learn a hierarchical latent diffusion model on this latent representation for driving-scene generation.

    GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images
    Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen,
    Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, Sanja Fidler
    NeurIPS, 2022   (Spotlight Presentation)
    project page / video / arXiv

    We develop a 3D generative model that generates textured meshes, bridging recent successes in differentiable surface modeling, differentiable rendering, and 2D GANs.

    How Much More Data Do I Need? Estimating Requirements for Downstream Tasks
    Rafid Mahmood, James Lucas, David Acuna, Daiqing Li, Jonah Philion,
    Jose M. Alvarez, Zhiding Yu, Sanja Fidler, Marc T. Law
    CVPR, 2022
    project page / video / arXiv

    We use a family of functions that generalize the power law to better estimate data requirements under limited budgets.

    Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps
    Seung Wook Kim, Karsten Kreis, Daiqing Li,
    Antonio Torralba, Sanja Fidler
    CVPR, 2022   (Oral Presentation)
    project page / video / arXiv

    We use a GAN to model multi-domain objects with shared attributes, together with learned morph maps that model geometric differences across domains. We show applications to segmentation transfer and image editing.

    BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations
    Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis,
    Adela Barriuso, Sanja Fidler, Antonio Torralba
    CVPR, 2022
    project page / video / arXiv

    We extend DatasetGAN to the large-scale ImageNet dataset, using as few as 5 annotations per ImageNet category.

    EditGAN: High-Precision Semantic Image Editing
    Huan Ling, Karsten Kreis, Daiqing Li,
    Seung Wook Kim, Antonio Torralba, Sanja Fidler
    NeurIPS, 2021
    project page / video / arXiv

    We use a GAN to model the joint distribution of images and semantic labels, and use it for semantically aware image editing.

    Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
    Daiqing Li, Junlin Yang, Karsten Kreis,
    Antonio Torralba, Sanja Fidler
    CVPR, 2021
    project page / video / arXiv

    We use generative models to model the joint distribution of images and semantic labels, and apply it to semi-supervised learning and out-of-domain generalization.

    Federated Simulation for Medical Imaging
    Daiqing Li, Amlan Kar, Nishant Ravikumar,
    Alejandro F Frangi, Sanja Fidler
    MICCAI, 2020   (Young Scientist Award (YSA) Runner-up)
    project page / video / arXiv

    We introduce a physics-driven generative approach consisting of two learnable neural modules: 1) a module that synthesizes 3D cardiac shapes along with their materials, and 2) a CT simulator that renders these into realistic, annotated 3D CT volumes.

    Neural Turtle Graphics for Modeling City Road Layouts
    Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina,
    Xinkai Wei, Ming-Yu Liu, Antonio Torralba, Sanja Fidler
    ICCV, 2019   (Oral Presentation)
    project page / video / arXiv

    We propose Neural Turtle Graphics (NTG) to model spatial graphs and demonstrate its application to city road layout generation.

    A Face-to-Face Neural Conversation Model
    Hang Chu, Daiqing Li, Sanja Fidler
    CVPR, 2018
    project page / video / arXiv

    We use an RNN encoder-decoder that exploits facial muscle movements as well as the verbal conversation.

    Professional Service
    Conference Reviewer: CVPR, ICCV, ECCV, NeurIPS, ICLR




    Template from source code.