Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition

Souvik Kundu; Khe Chai Sim; Mark J.F. Gales

INTERSPEECH 2016

Abstract

It is difficult to apply well-formulated model-based noise adaptation approaches to Deep Neural Networks (DNNs) because their model parameters lack interpretability. In this paper, we propose incorporating a generative front-end layer (GFL), parameterised by a Gaussian Mixture Model (GMM), into the DNN. A GFL can easily be adapted to different noise conditions by applying model-based Vector Taylor Series (VTS) compensation to the underlying GMM. We show that incorporating a GFL into the DNN yields a 12.1% relative improvement over a baseline multi-condition DNN. We also show that the proposed system performs significantly better than the noise aware training method, in which per-utterance estimates of the noise parameters are appended to the acoustic features.
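To make the adaptation step concrete, the following is a minimal sketch of VTS-compensating a diagonal-covariance GMM and turning it into front-end features. It rests on several assumptions that the abstract does not confirm: features are in the log-spectral (log-mel) domain rather than the cepstral domain, the standard first-order mismatch function y = x + h + log(1 + exp(n - x - h)) is used with a deterministic channel h, and the GFL is assumed to pass per-component log-posteriors to the DNN. The names DiagGaussian, vts_compensate, adapt_gmm and gfl_features are hypothetical and used only for illustration; this is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class DiagGaussian:
    """One GMM component with a diagonal covariance (hypothetical container)."""
    weight: float
    mean: List[float]
    var: List[float]


def softplus(d: float) -> float:
    """Numerically stable log(1 + exp(d))."""
    return max(d, 0.0) + math.log1p(math.exp(-abs(d)))


def sigmoid(z: float) -> float:
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def vts_compensate(g, noise_mean, noise_var, channel_mean):
    """First-order VTS compensation of one clean-speech Gaussian.

    Assumes log-spectral features and the mismatch function
    y = x + h + log(1 + exp(n - x - h)), linearised at the clean mean,
    the additive-noise mean and the channel mean. The channel is treated
    as deterministic (no channel variance).
    """
    mean, var = [], []
    for mx, vx, mn, vn, mh in zip(g.mean, g.var, noise_mean, noise_var, channel_mean):
        d = mn - mx - mh
        mean.append(mx + mh + softplus(d))
        j = sigmoid(-d)  # dy/dx = 1 / (1 + exp(d)); dy/dn = 1 - j
        var.append(j * j * vx + (1.0 - j) ** 2 * vn)
    return DiagGaussian(g.weight, mean, var)


def adapt_gmm(gmm, noise_mean, noise_var, channel_mean):
    """Compensate every component; the mixture weights are left unchanged."""
    return [vts_compensate(g, noise_mean, noise_var, channel_mean) for g in gmm]


def log_gauss(y, g):
    """Log density of a diagonal Gaussian (mixture weight not included)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
        for yi, m, v in zip(y, g.mean, g.var)
    )


def gfl_features(y, gmm):
    """Per-component log-posteriors log P(m | y), an assumed GFL output."""
    scores = [math.log(g.weight) + log_gauss(y, g) for g in gmm]
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]


# Usage: a 2-component, 1-dimensional clean GMM adapted to a noise estimate.
clean = [DiagGaussian(0.5, [-1.0], [1.0]), DiagGaussian(0.5, [1.0], [1.0])]
noisy = adapt_gmm(clean, noise_mean=[0.0], noise_var=[0.5], channel_mean=[0.0])
print(gfl_features([0.5], noisy))
```

The appeal of this design, as the abstract frames it, is that only the generative layer needs to change when the noise estimate changes: the GMM parameters have a clear physical meaning, so VTS can update them in closed form, while the discriminative DNN layers above stay fixed.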