PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion

Ying-Tian Liu; Yuan-Chen Guo; Guan Luo; Heyi Sun; Wei Yin; Song-Hai Zhang

CVPR 2024

Abstract

Diffusion models trained on large-scale text-image datasets have demonstrated a strong capability for controllable, high-quality image generation from arbitrary text prompts. However, the generation quality and generalization ability of 3D diffusion models are hindered by the scarcity of high-quality, large-scale 3D datasets. In this paper, we present PI3D, a framework that fully leverages the ability of pre-trained text-to-image diffusion models to generate high-quality 3D shapes from text prompts in minutes. The core idea is to connect the 2D and 3D domains by representing a 3D shape as a set of Pseudo RGB Images. We fine-tune an existing text-to-image diffusion model to produce such pseudo-images using a small number of text-3D pairs. Surprisingly, we find that it can already generate meaningful and consistent 3D shapes given complex text descriptions. We further take the generated shapes as the starting point for a lightweight iterative refinement using score distillation sampling to achieve high-quality generation under a low budget. PI3D generates a single 3D shape from text in only 3 minutes, and its quality is validated to outperform existing 3D generative models by a large margin.
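The abstract does not say how a 3D shape is laid out as pseudo-images, so the following is only a minimal sketch under an assumed layout: the shape is stored as a few 2D feature planes, and each plane's channels are grouped in threes so that every group can be treated as one RGB image by a 2D diffusion model. The function names, the plane count, and the channel count are hypothetical and are not taken from the paper.

```python
import numpy as np


def planes_to_pseudo_images(planes: np.ndarray) -> np.ndarray:
    """Split (P, C, H, W) feature planes into (P * C // 3, 3, H, W) pseudo RGB images.

    Assumption: C is a multiple of 3, so each group of three consecutive
    channels of a plane becomes one 3-channel image.
    """
    p, c, h, w = planes.shape
    if c % 3 != 0:
        raise ValueError("channel count must be divisible by 3")
    return planes.reshape(p * (c // 3), 3, h, w)


def pseudo_images_to_planes(images: np.ndarray, num_planes: int) -> np.ndarray:
    """Inverse of planes_to_pseudo_images: regroup images into feature planes."""
    n, channels, h, w = images.shape
    if channels != 3 or n % num_planes != 0:
        raise ValueError("expected 3-channel images, evenly divisible across planes")
    return images.reshape(num_planes, (n // num_planes) * 3, h, w)


# Hypothetical sizes: 3 planes, 6 channels each, 8x8 resolution.
rng = np.random.default_rng(0)
planes = rng.standard_normal((3, 6, 8, 8)).astype(np.float32)

images = planes_to_pseudo_images(planes)                # 6 pseudo RGB images
restored = pseudo_images_to_planes(images, num_planes=3)
```

Because the mapping is a pure reshape, it is lossless in both directions. Under this assumed layout, a model that generates the images has generated the full shape representation.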

Authors

Ying-Tian Liu, Yuan-Chen Guo, Guan Luo, Heyi Sun, Wei Yin, Song-Hai Zhang

Topics

Deep Learning > Models > Diffusion Models
Deep Learning > Models > Generative Models
Deep Learning > Techniques > Pretraining
Computer Vision > Generation > Image Generation
Computer Vision > Generation > Video Generation
Artificial Intelligence > Core AI > Computer Vision
Deep Learning > Learning Types > Fine-Tuning
Computer Vision > Generation > 3D Generation

Keywords

3D shape generation, generative model, diffusion model, 3D generation, text-to-3D generation, score distillation sampling, text-to-image diffusion
