Expressive Visual Text-to-Speech Using Active Appearance Models

Robert Anderson; Bjorn Stenger; Vincent Wan; Roberto Cipolla

CVPR 2013

Abstract

This paper presents a complete system for expressive visual text-to-speech (VTTS) that produces expressive output in the form of a 'talking head', given input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed that make it better suited to the task of VTTS. The model allows for normalization with respect to both pose and blink state, which significantly reduces artifacts in the synthesized sequences. We demonstrate quantitative improvements in reconstruction error over a million frames, as well as in large-scale user studies comparing the output of different systems.
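As a rough illustration of the linear modeling idea behind an AAM, the sketch below fits a PCA basis to flattened landmark shapes and then reconstructs a shape from a small weight vector. It is a generic textbook-style shape model on synthetic data, not the authors' system: the function names (`fit_linear_model`, `project`, `synthesize`), the toy dimensions, and the random data are all assumptions made for this example, and the extensions described in the abstract (pose and blink normalization, expression weights) are not reproduced here.

```python
import numpy as np


def fit_linear_model(samples, n_modes):
    """Fit a PCA model to row-vector samples; return (mean, modes)."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are orthonormal modes of variation, strongest first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_modes]


def project(mean, modes, sample):
    """Compute the weights that best describe a sample in the model."""
    return (sample - mean) @ modes.T


def synthesize(mean, modes, weights):
    """Generate a sample as the mean plus a weighted sum of modes."""
    return mean + weights @ modes


# Hypothetical toy data: 50 shapes, each 10 (x, y) landmarks flattened to 20 values,
# generated from 3 underlying modes so that a 3-mode model can represent them.
rng = np.random.default_rng(0)
true_modes = rng.normal(size=(3, 20))
coeffs = rng.normal(size=(50, 3))
shapes = rng.normal(size=20) + coeffs @ true_modes

mean, modes = fit_linear_model(shapes, n_modes=3)
weights = project(mean, modes, shapes[0])
reconstruction = synthesize(mean, modes, weights)
reconstruction_error = float(np.linalg.norm(reconstruction - shapes[0]))
```

In a full AAM the same construction is typically applied to texture (pixel appearance) as well as shape, and reconstruction error of this kind is one way to compare model variants.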

Topics

Interdisciplinary > Linguistics > Phonetics

Computer Vision > Generation > Image Editing

Speech & Audio > Synthesis > Speech Synthesis

Keywords

facial animation, expressive speech synthesis, active appearance model, visual text-to-speech, talking head, visual speech synthesis
