Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images

Philipp Wulff; Felix Wimbauer; Dominik Muhle; Daniel Cremers

2025 ICCV ICCV 2025

Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images

Abstract

Volumetric scene reconstruction from a single image is crucial for a broad range of applications like autonomous driving and robotics. Recent volumetric reconstruction methods achieve impressive results, but generally require expensive 3D ground truth or multi-view supervision. We propose to leverage pre-trained 2D diffusion models and depth prediction models to generate synthetic scene geometry from a single image. This can then be used to distill a feed-forward scene reconstruction model. Our experiments on the challenging KITTI-360 and Waymo datasets demonstrate that our method matches or outperforms state-of-the-art baselines that use multi-view supervision, and offers unique advantages, for example regarding dynamic scenes.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Philipp Wulff , Felix Wimbauer , Dominik Muhle , Daniel Cremers

Topics

Deep Learning > Models > Diffusion Models Computer Vision > Analysis > 3D Vision Computer Vision > Analysis > Depth Estimation Deep Learning > Techniques > Knowledge Distillation

Keywords

3d reconstruction knowledge distillation depth estimation monocular depth estimation scene reconstruction diffusion model single image monocular 3d reconstruction volumetric reconstruction monocular reconstruction depth distillation

Download PDF

Related papers

MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval 2025

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality 2025

MonSTeR: a Unified Model for Motion, Scene, Text Retrieval 2025

ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching 2025

Robust Dataset Condensation using Supervised Contrastive Learning 2025