INTERSPEECH 2019

Modulation Vectors as Robust Feature Representation for ASR in Domain Mismatched Conditions

Abstract

In this work, we demonstrate the robustness of Modulation Vectors (M-vectors) to domain mismatch between the training and test conditions of an Automatic Speech Recognition (ASR) system. Our work focuses on the specific task of dealing with mismatches caused by reverberation. We use simulated data from TIMIT and real reverberant speech from the REVERB challenge data to evaluate the performance of our system. The paper also describes a multistream system that combines information from Mel Frequency Cepstral Coefficients (MFCCs) and M-vectors to improve ASR performance on both matched and mismatched datasets. The proposed multistream system achieves a relative improvement of 25% in recognition accuracy on the mismatched condition, while an M-vector trained hybrid ASR system shows a 7–8% improvement in recognition accuracy, both with respect to an MFCC trained hybrid ASR system.
