Optimizing DNN Adaptation for Recognition of Enhanced Speech

Marco Matassoni; Alessio Brutti; Daniele Falavigna

2017 INTERSPEECH INTERSPEECH 2017

Optimizing DNN Adaptation for Recognition of Enhanced Speech

Abstract

Speech enhancement directly using deep neural network (DNN) is of major interest due to the capability of DNN to tangibly reduce the impact of noisy conditions in speech recognition tasks. Similarly, DNN based acoustic model adaptation to new environmental conditions is another challenging topic. In this paper we present an analysis of acoustic model adaptation in presence of a disjoint speech enhancement component, identifying an optimal setting for improving the speech recognition performance. Adaptation is derived from a consolidated technique that introduces in the training process a regularization term to prevent overfitting. We propose to optimize the adaptation of the clean acoustic models towards the enhanced speech by tuning the regularization term based on the degree of enhancement. Experiments on a popular noisy dataset (e.g., AURORA-4) demonstrate the validity of the proposed approach.

🌉 Interdisciplinary Bridge — Deep Learning and Speech & Audio

🧭 Keyword Pioneer — environmental condition

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Marco Matassoni , Alessio Brutti , Daniele Falavigna

Topics

Deep Learning > Architectures > Neural Networks Speech & Audio > Recognition > Automatic Speech Recognition Speech & Audio > Recognition > Speech Recognition Speech & Audio > Analysis > Speech Enhancement Machine Learning > Learning Types > Domain Adaptation

Keywords

speech enhancement noise robustness deep neural network regularization term acoustic model adaptation environmental condition model regularization

Download PDF

Related papers

Description of the Munich-Passau Snore Sound Corpus (MPSSC) 2017

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification 2017

Binaural Reverberant Speech Separation Based on Deep Neural Networks 2017

Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech 2017

A Comparison of Danish Listeners’ Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences 2017