Deep Lip Reading: A Comparison of Models and an Online Application

Triantafyllos Afouras; Joon Son Chung; Andrew Zisserman

INTERSPEECH 2018

Abstract

The goal of this paper is to develop state-of-the-art models for lip reading (visual speech recognition). We develop three architectures and compare their accuracy and training times: (i) a recurrent model using LSTMs; (ii) a fully convolutional model; and (iii) the recently proposed transformer model. The recurrent and fully convolutional models are trained with a Connectionist Temporal Classification (CTC) loss and use an explicit language model for decoding, whereas the transformer is a sequence-to-sequence model. Our best-performing model improves the state-of-the-art word error rate on the challenging BBC-Oxford Lip Reading Sentences 2 (LRS2) benchmark dataset by over 20 percent. As a further contribution, we investigate the fully convolutional model when used for online (real-time) lip reading of continuous speech and show that it achieves high performance with low latency.
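Results above are reported as word error rate (WER). As a rough illustration, not taken from the paper, the sketch below computes WER as the word-level edit distance (substitutions + deletions + insertions) between a reference and a hypothesis transcript, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and plain whitespace tokenization is an assumption; the paper's exact text normalization may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: word-level Levenshtein distance / reference length.

    Assumes transcripts are already normalized and split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` returns 2/6 ≈ 0.333. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.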

Topics

Speech & Audio > Recognition > Speech Recognition

Keywords

connectionist temporal classification, visual speech recognition, sequence-to-sequence model, word error rate, lip reading, transformer model