INTERSPEECH 2020 Deep Noise Suppression Challenge: A Fully Convolutional Recurrent Network (FCRN) for Joint Dereverberation and Denoising
Abstract
The Interspeech 2020 Deep Noise Suppression (DNS) Challenge focuses on evaluating low-latency single-channel speech enhancement algorithms under realistic test conditions. Our contribution to the challenge is a method for joint dereverberation and denoising based on complex spectral mask estimation using a fully convolutional recurrent network (FCRN), which relies on a convolutional LSTM layer for temporal modeling. Since the effects of reverberation and noise on perceived speech quality can differ notably, we propose a multi-target loss that controls the relative weight placed on dereverberation and on denoising. In the crowdsourced subjective P.808 listening test conducted by the DNS Challenge organizers, the proposed method shows a significant overall improvement of 0.43 MOS points over the DNS Challenge baseline and ranks among the top three submissions in both the real-time and non-real-time tracks of the challenge.
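To make the two central ideas concrete, the following minimal Python sketch illustrates how a complex spectral mask can be applied to noisy spectral bins and how a weighted two-target loss might trade off dereverberation against denoising. It is an illustration under stated assumptions, not the loss or network used in this work: the function names, the weighting parameter `alpha`, the choice of a dry and a reverberant clean target, and the squared-error form are all hypothetical.

```python
# Illustrative sketch only. The names, the weight `alpha`, and the
# two-target squared-error form are assumptions, not taken from the paper.

def apply_complex_mask(noisy_bins, mask_bins):
    """Multiply each complex noisy bin by its complex mask value."""
    return [m * y for m, y in zip(mask_bins, noisy_bins)]


def mean_squared_error(estimate, target):
    """Mean squared magnitude of the complex error over all bins."""
    return sum(abs(e - t) ** 2 for e, t in zip(estimate, target)) / len(estimate)


def multi_target_loss(noisy_bins, mask_bins, dry_target, reverberant_target,
                      alpha=0.5):
    """Weighted sum of two error terms (hypothetical formulation).

    alpha weights the error against the dry clean target (dereverberation
    plus denoising); 1 - alpha weights the error against the reverberant
    clean target (denoising only).
    """
    estimate = apply_complex_mask(noisy_bins, mask_bins)
    loss_dry = mean_squared_error(estimate, dry_target)
    loss_reverberant = mean_squared_error(estimate, reverberant_target)
    return alpha * loss_dry + (1.0 - alpha) * loss_reverberant


# Usage with two toy spectral bins.
noisy = [1 + 1j, 2 + 0j]
mask = [0.5 + 0j, 0 + 1j]
enhanced = apply_complex_mask(noisy, mask)
```

Because the mask is complex-valued, it can modify both the magnitude and the phase of each bin, which a real-valued magnitude mask cannot do. In this sketch, raising `alpha` toward 1 would put more emphasis on removing reverberation, while lowering it would favor noise suppression alone.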