RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control

Xiang Deng; Zerong Zheng; Yuxiang Zhang; Jingxiang Sun; Chao Xu; Xiaodong Yang; Lizhen Wang; Yebin Liu

CVPR 2024

Abstract

This paper aims to broaden the applicability of human avatar learning methods by proposing RAM-Avatar, which learns a Real-time, photo-realistic Avatar that supports full-body control from Monocular videos. To achieve this, RAM-Avatar leverages two statistical templates that model facial expression and hand gesture variations, while a sparsely computed dual attention module built on a third, body template facilitates high-fidelity texture rendering for the torso and limbs. Building on this foundation, we deploy a lightweight yet powerful StyleUnet along with a temporal-aware discriminator to achieve real-time realistic rendering. To enable robust animation for out-of-distribution poses, we propose a Motion Distribution Align module that compensates for discrepancies between the training and testing motion distributions. Extensive experiments across various settings demonstrate the superiority of the proposed method, and we present a real-time live system to further push the research toward applications. The training and testing code will be released for research purposes.

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Core Methods > Representation Learning
Deep Learning > Models > Generative Models
Computer Vision > Analysis > 3D Vision
Computer Vision > Generation > Image Generation
Computer Science > Applications > Computer Graphics
Computer Vision > Generation > 3D Generation

Keywords

attention mechanism, neural rendering, generative adversarial network, real-time rendering, monocular video, motion compensation, neural avatar, avatar modeling, full-body control, texture rendering
