Coherent 3D Portrait Video Reconstruction via Triplane Fusion

Shengze Wang; Xueting Li; Chao Liu; Matthew Chan; Michael Stengel; Henry Fuchs; Shalini De Mello; Koki Nagano

CVPR 2025

Abstract

Recent breakthroughs in single-image 3D portrait reconstruction have enabled telepresence systems to stream 3D portrait videos from a single camera in real time, democratizing telepresence. However, per-frame 3D reconstruction exhibits temporal inconsistency and forgets the user's appearance. On the other hand, self-reenactment methods can render coherent 3D portraits by driving a 3D avatar built from a single reference image, but they fail to faithfully preserve the user's per-frame appearance (e.g., instantaneous facial expressions and lighting). As a result, neither of these two frameworks is an ideal solution for democratized 3D telepresence. In this work, we address this dilemma and propose a novel solution that maintains both coherent identity and dynamic per-frame appearance to enable the best possible realism. To this end, we propose a new fusion-based method that takes the best of both worlds by fusing a canonical 3D prior from a reference view with dynamic appearance from per-frame input views, producing temporally stable 3D videos with faithful reconstruction of the user's per-frame appearance. Trained using only synthetic data produced by an expression-conditioned 3D GAN, our encoder-based method achieves both state-of-the-art 3D reconstruction and temporal consistency on in-studio and in-the-wild datasets.

Topics

Computer Vision > Analysis > 3D Vision; Computer Vision > Generation > Video Generation; Computer Vision > Generation > 3D Generation; Computer Vision > Processing > 3D Vision

Keywords

3D reconstruction; neural radiance field; temporal consistency; 3D avatar; triplane representation; portrait video; triplane fusion
