WACV 2025

FitDiff: Robust Monocular 3D Facial Shape and Reflectance Estimation using Diffusion Models

Abstract

The remarkable progress in 3D face reconstruction has resulted in high-detail and photorealistic facial representations. Recently, diffusion models have revolutionized the capabilities of generative methods by surpassing the performance of GANs. In this work, we present FitDiff, a diffusion-based 3D facial avatar generative model. Leveraging diffusion principles, our model accurately generates relightable facial avatars, utilizing an identity embedding extracted from an "in-the-wild" 2D facial image. The introduced multi-modal diffusion model concurrently outputs facial reflectance maps (diffuse and specular albedo, and normals) and shapes, showcasing great generalization capabilities. It is trained solely on an annotated subset of a public facial dataset, paired with 3D reconstructions. We revisit the typical 3D facial fitting approach by guiding a reverse diffusion process using perceptual and face recognition losses. As the first latent diffusion model (LDM) conditioned on face recognition embeddings, FitDiff reconstructs relightable human avatars that can be used as-is in common rendering engines, starting only from an unconstrained facial image and achieving state-of-the-art performance.
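
The idea of guiding a reverse diffusion process with a recognition loss can be illustrated with a toy sketch. This is not FitDiff's implementation: the "denoiser" below is a hypothetical stand-in that simply pulls the sample toward a fixed prior mean, the sample itself plays the role of the identity embedding, and the names `toy_denoiser`, `guided_reverse_step`, and `run` are assumptions made for illustration. The only part taken from the abstract is the general pattern: at each reverse step, the sample is nudged along the negative gradient of an identity loss (here, cosine distance to a target embedding).

```python
import math
import random

EPS = 1e-8  # guards against division by zero for near-zero vectors


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) + EPS
    nb = math.sqrt(sum(y * y for y in b)) + EPS
    return 1.0 - dot / (na * nb)


def cosine_distance_grad(a, b):
    """Analytic gradient of cosine_distance(a, b) with respect to a."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) + EPS
    nb = math.sqrt(sum(y * y for y in b)) + EPS
    return [-(y / (na * nb) - dot * x / (na ** 3 * nb)) for x, y in zip(a, b)]


def toy_denoiser(x, prior_mean, rate=0.1):
    """Hypothetical stand-in for a learned denoiser: moves x toward a prior mean."""
    return [xi + rate * (pi - xi) for xi, pi in zip(x, prior_mean)]


def guided_reverse_step(x, prior_mean, target_emb, guidance_scale, noise_scale, rng):
    """One reverse step: denoise, then shift along the negative identity-loss gradient."""
    mean = toy_denoiser(x, prior_mean)
    grad = cosine_distance_grad(mean, target_emb)
    return [
        m - guidance_scale * g + noise_scale * rng.gauss(0.0, 1.0)
        for m, g in zip(mean, grad)
    ]


def run(guidance_scale, steps=50, noise_scale=0.0, seed=0):
    """Run the toy reverse process; return the final cosine distance to the target."""
    rng = random.Random(seed)
    prior_mean = [1.0, 0.0, 0.0]   # where the unguided process ends up
    target_emb = [0.0, 1.0, 0.0]   # assumed target identity embedding
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]  # start from Gaussian noise
    for _ in range(steps):
        x = guided_reverse_step(
            x, prior_mean, target_emb, guidance_scale, noise_scale, rng
        )
    return cosine_distance(x, target_emb)


if __name__ == "__main__":
    print("unguided distance:", run(guidance_scale=0.0))
    print("guided distance:  ", run(guidance_scale=0.5))
```

In this sketch the guided run should finish closer to the target embedding than the unguided run. In a real system the embedding would presumably come from a face recognition network applied to a rendering of the generated avatar, with gradients obtained by automatic differentiation rather than a closed-form expression.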
