2024 CVPR CVPR 2024

Intrinsic Image Diffusion for Indoor Single-view Material Estimation

Abstract

We present Intrinsic Image Diffusion a generative model for appearance decomposition of indoor scenes. Given a single input view we sample multiple possible material explanations represented as albedo roughness and metallic maps. Appearance decomposition poses a considerable challenge in computer vision due to the inherent ambiguity between lighting and material properties and the lack of real datasets. To address this issue we advocate for a probabilistic formulation where instead of attempting to directly predict the true material properties we employ a conditional generative model to sample from the solution space. Furthermore we show that utilizing the strong learned prior of recent diffusion models trained on large-scale real-world images can be adapted to material estimation and highly improves the generalization to real images. Our method produces significantly sharper more consistent and more detailed materials outperforming state-of-the-art methods by 1.5dB on PSNR and by 45% better FID score on albedo prediction. We demonstrate the effectiveness of our approach through experiments on both synthetic and real-world datasets.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning
🧭 Keyword Pioneer — appearance decomposition
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio