HexaGen3D: StableDiffusion is One Step Away from Fast and Diverse Text-to-3D Generation

Antoine Mercier; Ramin Nakhli; Mahesh Reddy; Rajeev Yasarla; Hong Cai; Fatih Porikli; Guillaume Berger

WACV 2025

Abstract

Despite the latest remarkable advances in generative modeling, efficient generation of high-quality 3D objects from textual prompts remains a difficult task. A key challenge lies in data scarcity: the most extensive 3D datasets encompass merely millions of samples, while their 2D counterparts contain billions of text-image pairs. To address this, we propose a novel approach which harnesses the power of large, pretrained 2D diffusion models. More specifically, our approach, HexaGen3D, fine-tunes a pretrained text-to-image model to jointly predict 6 orthographic projections and the corresponding 3D latent. We then decode these latents to generate a textured mesh. HexaGen3D does not require per-sample optimization and can infer high-quality and diverse objects from textual prompts in 7 seconds, offering significantly better quality-to-latency trade-offs than existing approaches. Furthermore, HexaGen3D demonstrates strong generalization to new objects or compositions.
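
To make the phrase "6 orthographic projections" concrete, the following is a minimal sketch of what six axis-aligned orthographic views of a 3D point set look like. It is not the HexaGen3D model or its code: the function name `orthographic_views`, the view names, and the axis and mirroring conventions are all illustrative assumptions, and the abstract does not specify any of them.

```python
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]

# Hypothetical view table (an assumption, not taken from the paper):
# each view keeps two axes (horizontal, vertical) and drops the third.
# Opposite views share the same axes but mirror the horizontal one.
VIEWS: Dict[str, Tuple[int, int, bool]] = {
    "front":  (0, 1, False),  # drops z
    "back":   (0, 1, True),
    "left":   (2, 1, False),  # drops x
    "right":  (2, 1, True),
    "bottom": (0, 2, False),  # drops y
    "top":    (0, 2, True),
}


def orthographic_views(points: List[Point3]) -> Dict[str, List[Point2]]:
    """Project 3D points onto six axis-aligned planes.

    Orthographic projection has no perspective: depth along the viewing
    axis is simply discarded, so parallel lines stay parallel.
    """
    views: Dict[str, List[Point2]] = {}
    for name, (h_axis, v_axis, mirror) in VIEWS.items():
        sign = -1.0 if mirror else 1.0
        views[name] = [(sign * p[h_axis], p[v_axis]) for p in points]
    return views


if __name__ == "__main__":
    corner = [(1.0, 2.0, 3.0)]
    for view_name, coords in orthographic_views(corner).items():
        print(view_name, coords)
```

In the approach described above, the six views are predicted by a fine-tuned text-to-image diffusion model rather than rendered from known geometry; the sketch only illustrates the geometric meaning of the term.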

Topics

Deep Learning > Architectures > Transformers
Deep Learning > Models > Diffusion Models
Computer Vision > Generation > Image Generation
Computer Vision > Generation > 3D Generation

Keywords

diffusion model, text-to-image model, text-to-3D generation, mesh generation, textured mesh, orthographic projection
