INTERSPEECH 2019

One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features

Abstract

This paper introduces a method of noise-robust automatic speech recognition (ASR) that remains effective under one-pass single-channel processing. Under these constraints, single-channel speech enhancement seems to be a reasonable approach to noise-robust ASR, because complicated techniques requiring multi-pass processing cannot be used. However, in many cases, single-channel speech enhancement seriously degrades ASR accuracy because of the speech distortion it introduces. In addition, joint training, an advanced acoustic modeling framework, is relatively ineffective in the case of single-channel processing. To overcome these problems, we propose a noise-robust acoustic modeling framework based on a feature-level combination of noisy speech and enhanced speech. To obtain further improvements, we also adopt a sub-network-level combination of noisy and enhanced speech, and a gating mechanism that can dynamically select appropriate speech features. Through comparative evaluations, we confirm that the proposed method successfully improves the accuracy of ASR in noisy environments under these strong constraints.
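
As a rough illustration of the two ideas named in the abstract, the sketch below shows a per-frame feature-level combination (concatenating noisy and enhanced feature vectors) and one possible gating scheme. The abstract does not specify the form of the gate, so the sigmoid gate, the convex mixing rule, and all names here (`concat_features`, `gated_mix`, `weights`, `bias`) are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def concat_features(noisy, enhanced):
    """Feature-level combination: join the noisy and enhanced
    feature vectors of each frame into one longer vector."""
    return [frame_n + frame_e for frame_n, frame_e in zip(noisy, enhanced)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_mix(noisy, enhanced, weights, bias):
    """Hypothetical gate (an assumption, not taken from the paper):
    g = sigmoid(W [noisy; enhanced] + b), computed per frame, and
    output = g * noisy + (1 - g) * enhanced, element-wise."""
    mixed, gates = [], []
    for frame_n, frame_e in zip(noisy, enhanced):
        joint = frame_n + frame_e  # concatenated input to the gate
        g = [
            sigmoid(sum(w * x for w, x in zip(row, joint)) + b)
            for row, b in zip(weights, bias)
        ]
        mixed.append([gi * n + (1.0 - gi) * e
                      for gi, n, e in zip(g, frame_n, frame_e)])
        gates.append(g)
    return mixed, gates


# Toy data: 5 frames of 4-dimensional features (sizes are arbitrary).
random.seed(0)
num_frames, dim = 5, 4
noisy = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_frames)]
enhanced = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_frames)]

# Untrained random gate parameters; in practice these would be learned
# jointly with the acoustic model.
weights = [[random.gauss(0, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

combined = concat_features(noisy, enhanced)
mixed, gates = gated_mix(noisy, enhanced, weights, bias)
```

In this sketch a gate value near 1 passes the noisy feature through and a value near 0 passes the enhanced feature, which is one way a network could fall back on the undistorted noisy input when enhancement introduces distortion.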
