Magic3D: High-Resolution Text-to-3D Content Creation

Chen-Hsuan Lin; Jun Gao; Luming Tang; Towaki Takikawa; xiaohui zeng; Xun Huang; Karsten Kreis; Sanja Fidler; Ming-Yu Liu; Tsung-Yi Lin

2023 CVPR CVPR 2023

Magic3D: High-Resolution Text-to-3D Content Creation

Abstract

Recently, DreamFusion demonstrated the utility of a pretrained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: 1) optimization of the NeRF representation is extremely slow, 2) NeRF is supervised by images at a low resolution (64x64), thus leading to low-quality 3D models with a long wait time. In this paper, we address these limitations by utilizing a two-stage coarse-to-fine optimization framework. In the first stage, we use a sparse 3D neural representation to accelerate optimization while using a low-resolution diffusion prior. In the second stage, we use a textured mesh model initialized from the coarse neural representation, allowing us to perform optimization with a very efficient differentiable renderer interacting with high-resolution images. Our method, dubbed Magic3D, can create a 3D mesh model in 40 minutes, 2x faster than DreamFusion (reportedly taking 1.5 hours on average), while achieving 8x higher resolution. User studies show 61.7% raters to prefer our approach than DreamFusion. Together with the image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues to various creative applications.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Science and Computer Vision and Deep Learning

🧭 Keyword Pioneer — text-to-3d synthesis

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Chen-Hsuan Lin , Jun Gao , Luming Tang , Towaki Takikawa , xiaohui zeng , Xun Huang , Karsten Kreis , Sanja Fidler , Ming-Yu Liu , Tsung-Yi Lin

Topics

Deep Learning > Models > Diffusion Models Computer Vision > Generation > Image Generation Computer Science > Systems > Computer Graphics Artificial Intelligence > Core AI > Computer Vision Computer Vision > Generation > 3D Generation

Keywords

3d reconstruction generative model diffusion model 3d generation neural radiance field mesh generation 3d mesh generation text-to-3d synthesis coarse-to-fine optimization

Download PDF

Related papers

CORA: Adapting CLIP for Open-Vocabulary Detection With Region Prompting and Anchor Pre-Matching 2023

3DAvatarGAN: Bridging Domains for Personalized Editable Avatars 2023

Physics-Driven Diffusion Models for Impact Sound Synthesis From Videos 2023

Transductive Few-Shot Learning With Prototype-Based Label Propagation by Iterative Graph Refinement 2023

EXIF As Language: Learning Cross-Modal Associations Between Images and Camera Metadata 2023