2020 INTERSPEECH INTERSPEECH 2020

Generative Adversarial Network Based Acoustic Echo Cancellation

Abstract

Generative adversarial networks (GANs) have become a popular research topic in speech enhancement like noise suppression. By training the noise suppression algorithm in an adversarial scenario, GAN based solutions often yield good performance. In this paper, a convolutional recurrent GAN architecture (CRGAN-EC) is proposed to address both linear and nonlinear echo scenarios. The proposed architecture is trained in frequency domain and predicts the time-frequency (TF) mask for the target speech. Several metric loss functions are deployed and their influence on echo cancellation performance is studied. Experimental results suggest that the proposed method outperforms the existing methods for unseen speakers in terms of echo return loss enhancement (ERLE) and perceptual evaluation of speech quality (PESQ). Moreover, multiple metric loss functions provide more freedom to achieve specific goals, e.g., more echo suppression or less distortion.

🌉 Interdisciplinary Bridge — Deep Learning and Speech & Audio
🧭 Keyword Pioneer — metric loss function
🐣 Hot Topic Early Bird — frequency domain
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio