FitDiff: Robust Monocular 3D Facial Shape and Reflectance Estimation using Diffusion Models

Stathis Galanakis; Alexandros Lattas; Stylianos Moschoglou; Stefanos Zafeiriou

WACV 2025

Abstract

The remarkable progress in 3D face reconstruction has resulted in high-detail and photorealistic facial representations. Recently, diffusion models have revolutionized the capabilities of generative methods by surpassing the performance of GANs. In this work, we present FitDiff, a diffusion-based 3D facial avatar generative model. Leveraging diffusion principles, our model accurately generates relightable facial avatars, utilizing an identity embedding extracted from an "in-the-wild" 2D facial image. The introduced multi-modal diffusion model concurrently outputs facial reflectance maps (diffuse and specular albedo, and normals) and shapes, showcasing great generalization capabilities. It is trained solely on an annotated subset of a public facial dataset, paired with 3D reconstructions. We revisit the typical 3D facial fitting approach by guiding a reverse diffusion process using perceptual and face recognition losses. Being the first latent diffusion model (LDM) conditioned on face recognition embeddings, FitDiff reconstructs relightable human avatars that can be used as-is in common rendering engines, starting only from an unconstrained facial image, and achieving state-of-the-art performance.

Topics

Deep Learning > Models > Diffusion Models; Computer Vision > Analysis > 3D Vision; Computer Vision > Generation > Image Generation; Healthcare & Medicine > Clinical > Medical Imaging

Keywords

image generation; face recognition; reflectance estimation; diffusion model; 3D face reconstruction; avatar generation; facial geometry; facial reconstruction
