Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction

Ju Lin; Sufeng Niu; Zice Wei; Xiang Lan; Adriaan J. van Wijngaarden; Melissa C. Smith; Kuang-Ching Wang

INTERSPEECH 2019

Abstract

Speech enhancement techniques that use a generative adversarial network (GAN) can effectively suppress noise while allowing models to be trained end-to-end. However, such techniques operate directly on time-domain waveforms, which are often high-dimensional and require extensive computation. This paper proposes a novel GAN-based speech enhancement method, referred to as S-ForkGAN, that operates on log-power spectra rather than on time-domain speech waveforms, and uses a forked GAN structure to extract both speech and noise information. Operating on log-power spectra makes it possible to seamlessly include conventional spectral subtraction techniques, and the parameter space typically has a lower dimension. The performance of S-ForkGAN is assessed for automatic speech recognition (ASR) using the TIMIT data set and a wide range of noise conditions. It is shown that S-ForkGAN outperforms existing GAN-based techniques and that it has a lower complexity.
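To make the two signal-processing ingredients named in the abstract concrete, the following is a minimal sketch of how a log-power spectrum can be computed and how conventional spectral subtraction can be applied to it. It is not the S-ForkGAN model and is not taken from the paper: the function names, the frame length, the hop size, the spectral floor, and the use of a separate noise-only signal as the noise estimate are all assumptions chosen for illustration.

```python
import numpy as np


def log_power_spectrum(signal, frame_len=512, hop=256, eps=1e-10):
    """Return log(|STFT|^2) with shape (frames, bins).

    frame_len, hop, and eps are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    spectrum = np.fft.rfft(frames, axis=1)
    # eps keeps the logarithm finite for bins with zero power.
    return np.log(np.abs(spectrum) ** 2 + eps)


def spectral_subtraction(noisy_lps, noise_lps, floor=0.01):
    """Conventional power-domain spectral subtraction with a spectral floor.

    The noise estimate is the time-averaged power of a noise-only signal,
    which is an assumption made for this sketch.
    """
    noisy_power = np.exp(noisy_lps)
    noise_power = np.exp(noise_lps).mean(axis=0, keepdims=True)
    # Subtract the noise estimate, but never drop below a fraction of the input.
    clean_power = np.maximum(noisy_power - noise_power, floor * noisy_power)
    return np.log(clean_power)


# Synthetic stand-ins for speech and noise (hypothetical data, 1 s at 16 kHz).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech_like = np.sin(2 * np.pi * 440.0 * t)
noise = 0.3 * rng.standard_normal(t.shape)

noisy_lps = log_power_spectrum(speech_like + noise)
noise_lps = log_power_spectrum(noise)
enhanced_lps = spectral_subtraction(noisy_lps, noise_lps)
```

Each frame here is represented by `frame_len // 2 + 1` spectral bins instead of `frame_len` raw samples, which illustrates why a spectral representation can be more compact than the waveform itself. The spectral floor is a common safeguard against negative power after subtraction.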

Topics

Deep Learning > Models > Generative Models
Speech & Audio > Synthesis > Speech Enhancement

Keywords

automatic speech recognition; speech enhancement; generative adversarial network; speech quality; spectral subtraction; log-power spectrum
