INTERSPEECH 2019

Monaural Speech Enhancement with Dilated Convolutions

Abstract

In this study, we propose a novel dilated convolutional neural network for enhancing speech in noisy and reverberant environments. The proposed model combines dilated convolutions, which track a target speaker through context aggregation, with skip connections and residual learning for mapping-based monaural speech enhancement. We evaluated the model in a variety of simulated environments with different reverberation times and quantified its performance using two objective measures. Experimental results show that the proposed model outperforms long short-term memory (LSTM), gated residual network (GRN), and convolutional recurrent network (CRN) models in terms of objective speech intelligibility and speech quality in noisy and reverberant environments. Compared to LSTM, CRN, and GRN, our method generalizes better to untrained speakers and noise, and has fewer trainable parameters, resulting in greater computational efficiency.
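
To illustrate how dilated convolutions aggregate context and how a residual connection wraps them, here is a minimal pure-Python sketch. It is not the architecture from the paper: the kernel size of 3, the doubling dilation schedule, the ReLU nonlinearity, the fixed smoothing weights, and all function names are assumptions chosen only for illustration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked stride-1 dilated 1-D convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(x, weights, dilation):
    """Zero-padded, same-length 1-D dilated convolution over a list of numbers.

    Assumes an odd kernel size so the kernel is centred on each frame.
    """
    half = (len(weights) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t + (i - half) * dilation  # taps are spaced `dilation` frames apart
            if 0 <= idx < len(x):            # samples outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_dilated_block(x, weights, dilation):
    """y = x + relu(dilated_conv(x)): identity path plus a residual branch."""
    conv = dilated_conv1d(x, weights, dilation)
    return [xi + max(0.0, ci) for xi, ci in zip(x, conv)]


# Hypothetical configuration: kernel size 3, dilation doubling at each layer.
dilations = [1, 2, 4, 8]
print(receptive_field(3, dilations))  # 31

# Push a unit impulse through the stack to see how far the context spreads.
x = [0.0] * 8 + [1.0] + [0.0] * 8
y = x
for d in dilations:
    y = residual_dilated_block(y, [0.25, 0.5, 0.25], d)
```

With these assumed settings, four layers of three-tap kernels cover 31 frames of context, whereas the same stack without dilation would cover only 9. This growth in context without a matching growth in weights is the general reason dilated stacks can stay small in parameter count.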
