INTERSPEECH 2024

DGSRN: Noise-Robust Speech Recognition Method with Dual-Path Gated Spectral Refinement Network

Abstract

Advances in speech recognition have yielded strong results on clean speech, but real-world noisy environments remain challenging. To address the speech distortion and residual noise in signals processed by speech enhancement models, we propose a noise-robust speech recognition method based on the Dual-Path Gated Spectral Refinement Network (DGSRN). A single-channel speech enhancement model based on dense time-frequency convolutional networks performs the first stage of noise suppression. The Dual-Path Gated Spectral Refinement Network then extracts useful features from the estimated noise to improve speech quality. The system is trained with multi-task joint training using a weighted speech distortion loss function. Experimental results show that, compared with traditional joint training approaches, DGSRN reduces the Character Error Rate by 12.41%, mitigating the issue of mismatched performance across evaluation metrics.
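For readers unfamiliar with the reported metric, Character Error Rate is conventionally defined as the character-level edit distance between the reference transcript and the recognizer's hypothesis, divided by the reference length. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function names `edit_distance` and `character_error_rate` are illustrative assumptions.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = (substitutions + deletions + insertions) / reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

A hypothesis that differs from a five-character reference by one substitution, for example `character_error_rate("hello", "hallo")`, gives a CER of 0.2 under this definition.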
