NIPS (NeurIPS) 2007

A probabilistic model for generating realistic lip movements from speech

Abstract

The present work aims to model the correspondence between facial motion and speech. The face and sound are modelled separately, with phonemes serving as the link between the two. We propose a sequential model and evaluate its suitability for generating facial animation from a sequence of phonemes, which we obtain from speech. We evaluate the results both by computing the error between generated sequences and real video and by running a rigorous double-blind test with human subjects. Experiments show that our model compares favourably to other existing methods and that the generated sequences are comparable to real video sequences.
