SPACE: Speech-driven Portrait Animation with Controllable Expression

Siddharth Gururani; Arun Mallya; Ting-Chun Wang; Rafael Valle; Ming-Yu Liu

2023 ICCV ICCV 2023

SPACE: Speech-driven Portrait Animation with Controllable Expression

Abstract

Animating portraits using speech has received growing attention in recent years, with various creative and practical use cases. An ideal generated video should have good lip sync with the audio, natural facial expressions and head motions, and high frame quality. In this work, we present SPACE, which uses speech and a single image to generate high-resolution, and expressive videos with realistic head pose, without requiring a driving video. It uses a multi-stage approach, combining the controllability of facial landmarks with the high-quality synthesis power of a pretrained face generator. SPACE also allows for the control of emotions and their intensities. Our method outperforms prior methods in objective metrics for image quality and facial motions and is strongly preferred by users in pair-wise comparisons. Please visit the project page to view the videos and to see more results: https://research.nvidia.com/labs/dir/space.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision

🧭 Keyword Pioneer — lip sync

🐣 Hot Topic Early Bird — facial expression

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Siddharth Gururani , Arun Mallya , Ting-Chun Wang , Rafael Valle , Ming-Yu Liu

Topics

Artificial Intelligence > Core AI > Multimodal Learning Computer Vision > Analysis > Face Recognition Computer Vision > Generation > Video Generation

Keywords

portrait animation facial expression head pose facial landmark lip sync

Download PDF

Related papers

PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework 2023

Periodically Exchange Teacher-Student for Source-Free Object Detection 2023

Stable and Causal Inference for Discriminative Self-supervised Deep Visual Representations 2023

Minimal Solutions to Uncalibrated Two-view Geometry with Known Epipoles 2023

3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation 2023