Acoustic Modeling for Google Home

Bo Li; Tara N. Sainath; Arun Narayanan; Joe Caroselli; Michiel Bacchiani; Ananya Misra; Izhak Shafran; Haşim Sak; Golan Pundak; Kean Chin; Khe Chai Sim; Ron J. Weiss; Kevin W. Wilson; Ehsan Variani; Chanwoo Kim; Olivier Siohan; Mitchel Weintraub; Erik McDermott; Richard Rose; Matt Shannon

INTERSPEECH 2017

Abstract

This paper describes the technical and system-building advances made to the Google Home multichannel speech recognition system, which was launched in November 2016. Technical advances include an adaptive dereverberation frontend, neural network models that perform multichannel processing jointly with acoustic modeling, and Grid-LSTMs to model frequency variations. On the system level, improvements include adapting the model using Google Home-specific data. We present results on a variety of multichannel sets. The combination of technical and system advances results in a relative WER reduction of 8–28% compared to the current production system.
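To make the "relative" qualifier concrete, the following minimal sketch computes a relative WER reduction from a baseline and an improved WER. The function name and the WER values are hypothetical and chosen only for illustration; they are not results reported in the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (in percent), not taken from the paper.
low_end = relative_wer_reduction(baseline_wer=10.0, new_wer=9.2)
high_end = relative_wer_reduction(baseline_wer=10.0, new_wer=7.2)
print(f"{low_end:.1f}% and {high_end:.1f}% relative reduction")
```

Under these assumed numbers, an absolute drop of 0.8 WER points from a 10.0% baseline corresponds to roughly an 8% relative reduction, and a drop of 2.8 points to roughly 28%.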

Topics

Deep Learning > Architectures > Neural Networks
Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

automatic speech recognition; acoustic modeling; neural network model; multichannel speech recognition; multichannel processing; neural network
