Monaural Speech Enhancement with Dilated Convolutions

Shadi Pirhosseinloo; Jonathan S. Brumberg

2019 INTERSPEECH INTERSPEECH 2019

Monaural Speech Enhancement with Dilated Convolutions

Abstract

In this study, we propose a novel dilated convolutional neural network for enhancing speech in noisy and reverberant environments. The proposed model incorporates dilated convolutions for tracking a target speaker through context aggregations, skip connections, and residual learning for mapping-based monaural speech enhancement. The performance of our model was evaluated in a variety of simulated environments having different reverberation times and quantified using two objective measures. Experimental results show that the proposed model outperforms a long short-term memory (LSTM), a gated residual network (GRN) and convolutional recurrent network (CRN) model in terms of objective speech intelligibility and speech quality in noisy and reverberant environments. Compared to LSTM, CRN and GRN, our method has improved generalization to untrained speakers and noise, and has fewer training parameters resulting in greater computational efficiency.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Natural Language Processing, Reinforcement Learning, Speech & Audio

🌉 Interdisciplinary Bridge — Deep Learning and Speech & Audio

Authors

Shadi Pirhosseinloo , Jonathan S. Brumberg

Topics

Deep Learning > Architectures > Neural Networks Speech & Audio > Synthesis > Speech Enhancement

Keywords

speech enhancement speech intelligibility skip connection residual learning dilated convolution monaural speech

Download PDF

Related papers

Using Real-Time Visual Biofeedback for Second Language Instruction 2019

VAE-Based Regularization for Deep Speaker Embedding 2019

End-to-End SpeakerBeam for Single Channel Target Speech Recognition 2019

Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition 2019

Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile 2019