Modulation Vectors as Robust Feature Representation for ASR in Domain Mismatched Conditions

Samik Sadhu; Hynek Hermansky

INTERSPEECH 2019

Abstract

In this work, we demonstrate the robustness of Modulation Vectors (M-vectors) to domain mismatch between the training and test conditions of an Automatic Speech Recognition (ASR) system. Our work focuses on the specific task of dealing with mismatch caused by reverberation. We evaluate our system on simulated data from TIMIT and on real reverberant speech from the REVERB challenge. The paper also describes a multistream system that combines information from Mel Frequency Cepstral Coefficients (MFCCs) and M-vectors to improve ASR performance on both matched and mismatched datasets. The proposed multistream system achieves a relative improvement of 25% in recognition accuracy in the mismatched condition, while an M-vector-trained hybrid ASR system shows a 7–8% improvement in recognition accuracy, both with respect to an MFCC-trained hybrid ASR system.

Topics

Machine Learning > Application Areas > Domain Adaptation

Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

automatic speech recognition, feature representation, domain mismatch, reverberant speech, modulation vector
