Adaptive Multichannel Dereverberation for Automatic Speech Recognition

Joe Caroselli; Izhak Shafran; Arun Narayanan; Richard Rose

INTERSPEECH 2017

Abstract

Reverberation is known to dramatically degrade the performance of automatic speech recognition (ASR) systems in far-field conditions. Adopting the weighted prediction error (WPE) approach, we formulate an online dereverberation algorithm for a multi-microphone array. The key contributions of this paper are: (a) we demonstrate that dereverberation using WPE improves performance even when the acoustic models are trained using multi-style training (MTR) with noisy, reverberated speech; (b) we show that the gains from WPE are preserved even on large and diverse real-world data sets; (c) we propose an adaptive version for online multichannel ASR tasks that gives gains similar to those of the non-causal version; and (d) while the algorithm can be applied at evaluation time alone, we show that also including dereverberation during training further increases the performance gains. We also report how different parameter settings of the dereverberation algorithm impact ASR performance.
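The abstract does not spell out how the adaptive version updates its filter. One common way to make multichannel WPE causal, and the assumption behind the sketch below, is to replace the batch filter estimate with a recursive least-squares (RLS) update with a forgetting factor, run independently in each STFT frequency bin. At every frame the stacked, delayed past frames from all microphones predict the late reverberation, which is then subtracted. The function names `online_wpe_bin` and `online_wpe` and the parameters `taps`, `delay`, `alpha` and `smooth` are hypothetical illustrations with untuned defaults. The smoothed observed power used as the WPE weight is a labeled simplification; this is not the authors' implementation.

```python
import numpy as np


def online_wpe_bin(Y, taps=10, delay=3, alpha=0.99, smooth=0.9, eps=1e-8):
    """Adaptive (RLS-style) multichannel WPE for ONE STFT frequency bin.

    Y: complex array of shape (T, M): T frames, M microphones.
    Returns the dereverberated estimate with the same shape.
    taps, delay, alpha and smooth are illustrative defaults, not tuned values.
    """
    T, M = Y.shape
    L = M * taps
    G = np.zeros((L, M), dtype=complex)      # multichannel prediction filter
    R_inv = np.eye(L, dtype=complex)         # inverse weighted correlation matrix
    X = np.empty_like(Y, dtype=complex)
    lam = max(np.mean(np.abs(Y[0]) ** 2), eps)  # running power estimate

    for t in range(T):
        # Stack delayed past frames t-delay, ..., t-delay-taps+1 (zeros before t=0).
        y_tilde = np.zeros(L, dtype=complex)
        for k in range(taps):
            idx = t - delay - k
            if idx >= 0:
                y_tilde[k * M:(k + 1) * M] = Y[idx]

        # Subtract the predicted late reverberation (a priori estimate).
        x = Y[t] - G.conj().T @ y_tilde

        # Simplification (assumption): smoothed observed power stands in
        # for the unknown desired-signal power used as the WPE weight.
        lam = max(smooth * lam + (1 - smooth) * np.mean(np.abs(Y[t]) ** 2), eps)

        # Recursive least-squares update with forgetting factor alpha.
        Ry = R_inv @ y_tilde
        gain = Ry / (alpha * lam + np.real(y_tilde.conj() @ Ry))
        R_inv = (R_inv - np.outer(gain, y_tilde.conj() @ R_inv)) / alpha
        G = G + np.outer(gain, x.conj())

        X[t] = x
    return X


def online_wpe(Y_stft, **kwargs):
    """Apply online_wpe_bin independently to every bin of a (F, T, M) STFT."""
    return np.stack([online_wpe_bin(Y_bin, **kwargs) for Y_bin in Y_stft])
```

The prediction `delay` keeps the filter from cancelling the direct path and early reflections, which are correlated with the current frame. The forgetting factor `alpha` trades tracking speed against estimate variance: values closer to 1 average over more frames but adapt more slowly when the room response changes. These are the kinds of parameter settings whose effect on ASR performance the paper reports.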

Topics

Machine Learning > Optimization & Theory > Optimization; Speech & Audio > Recognition > Speech Recognition; Speech & Audio > Synthesis > Speech Enhancement

Keywords

automatic speech recognition; online adaptation; weighted prediction error

Related papers

Description of the Munich-Passau Snore Sound Corpus (MPSSC) 2017

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification 2017

Binaural Reverberant Speech Separation Based on Deep Neural Networks 2017

Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech 2017

A Comparison of Danish Listeners’ Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences 2017