One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing

Ting-Chun Wang; Arun Mallya; Ming-Yu Liu

CVPR 2021

Abstract

We propose a neural talking-head video synthesis model and demonstrate its application to video conferencing. Our model learns to synthesize a talking-head video from a source image containing the target person's appearance and a driving video that dictates the motion in the output. Motion is encoded with a novel keypoint representation in which identity-specific and motion-related information are separated without supervision. Extensive experimental validation shows that our model outperforms competing methods on benchmark datasets. Moreover, our compact keypoint representation enables a video conferencing system that achieves the same visual quality as the commercial H.264 standard while using only one-tenth of the bandwidth. We also show that our keypoint representation allows the user to rotate the head during synthesis, which is useful for simulating face-to-face video conferencing experiences.
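
To make the decomposition concrete, the following is a minimal sketch that assumes each posed keypoint is a rigid transform of an identity-specific canonical keypoint plus a per-frame expression offset, roughly x_k = R x_{c,k} + t + delta_k. The function names, the yaw-only rotation, and the toy coordinates are illustrative assumptions and not the authors' implementation.

```python
import math

def yaw_rotation(angle_rad):
    """3x3 rotation about the vertical (y) axis; yaw-only is a simplifying assumption."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def pose_keypoints(canonical, rotation, translation, deformations):
    """Apply x_k = R @ x_c_k + t + delta_k to every canonical keypoint."""
    posed = []
    for x_c, delta in zip(canonical, deformations):
        rotated = [sum(rotation[i][j] * x_c[j] for j in range(3)) for i in range(3)]
        posed.append([rotated[i] + translation[i] + delta[i] for i in range(3)])
    return posed

# Hypothetical toy data: two canonical (identity-specific) keypoints.
canonical = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Motion-related terms: per-keypoint expression offsets and a head translation.
deformations = [[0.0, 0.1, 0.0], [0.0, 0.0, 0.0]]
translation = [0.0, 0.0, 0.5]

# Same identity and expression, two different head rotations.
frontal = pose_keypoints(canonical, yaw_rotation(0.0), translation, deformations)
turned = pose_keypoints(canonical, yaw_rotation(math.pi / 2), translation, deformations)
```

Under this assumption, the canonical keypoints stay fixed for a given person, so only the rotation, translation, and offsets change from frame to frame, and swapping in a different rotation re-poses the head without touching identity or expression.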

Topics

Artificial Intelligence > Core AI > Human-AI Interaction; Computer Vision > Generation > Image Generation; Computer Vision > Generation > Video Generation; Deep Learning > Learning Types > Self-Supervised Learning; Artificial Intelligence > Core AI > Computer Vision

Keywords

video synthesis; neural rendering; keypoint detection; video conferencing; face generation; image animation; keypoint representation; face rendering; talking-head synthesis; neural talking-head video synthesis; image-driven synthesis

Related papers

Learning To Reconstruct High Speed and High Dynamic Range Videos From Events 2021

DeFLOCNet: Deep Image Editing via Flexible Low-Level Controls 2021

Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs 2021

Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization 2021

Pose-Guided Human Animation From a Single Image in the Wild 2021