Multi-Stage Progressive Speech Enhancement Network

Xinmeng Xu; Yang Wang; Dongxiang Xu; Yiyuan Peng; Cong Zhang; Jie Jia; Binbin Chen

INTERSPEECH 2021

Abstract

Speech enhancement is a fundamental technique for recovering clean speech in adverse environments where the received speech is severely corrupted by noise. This paper proposes a novel progressive network for speech enhancement with a multi-stage structure, in which each stage contains a channel attention block followed by a dilated encoder-decoder convolutional network with gated linear units. In addition, each stage generates a prediction that is refined by a supervised attention block, and a fusion block is inserted between the original inputs and the outputs of the previous stage. The multi-stage architecture sequentially invokes multiple deep-learning networks, and its key ingredient is the exchange of information between stages, which allows more flexible and robust outputs to be generated. Experimental results show that the proposed architecture consistently outperforms recent state-of-the-art models in terms of both PESQ and STOI scores.
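The abstract describes each stage as channel attention, then a dilated encoder-decoder with gated linear units (GLUs), then a supervised attention block, with a fusion block combining the original input and the previous stage's output. The NumPy sketch below wires these pieces together with random weights purely to illustrate the data flow; it is not the authors' implementation. Assumptions, none of which come from the abstract: the channel attention is a squeeze-and-excitation style block, the supervised attention block is modelled loosely on the SAM block from MPRNet, the toy encoder-decoder keeps the time resolution fixed (no down- or up-sampling), each prediction is a mask applied to a noisy magnitude spectrogram, and every function name (`channel_attention`, `glu_layer`, `enhance`, and so on) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Assumed squeeze-and-excitation style channel attention. x: (C, T)."""
    s = x.mean(axis=1)                        # squeeze: average over time -> (C,)
    a = sigmoid(w2 @ np.maximum(w1 @ s, 0))   # excitation: per-channel weights in (0, 1)
    return x * a[:, None]                     # rescale each channel


def dilated_conv1d(x, w, dilation):
    """'Same'-padded dilated 1-D convolution over time. x: (C_in, T), w: (C_out, C_in, K), K odd."""
    k, t = w.shape[2], x.shape[1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((w.shape[0], t))
    for i in range(k):                        # sum over kernel taps spaced `dilation` apart
        out += w[:, :, i] @ xp[:, i * dilation : i * dilation + t]
    return out


def glu_layer(x, wa, wb, dilation):
    """Gated linear unit: linear conv path multiplied by a sigmoid conv gate."""
    return dilated_conv1d(x, wa, dilation) * sigmoid(dilated_conv1d(x, wb, dilation))


def encoder_decoder(x, layers):
    """Toy dilated encoder-decoder: dilations grow then shrink, with mirrored skip connections."""
    n, skips, h = len(layers) // 2, [], x
    for j, (wa, wb) in enumerate(layers[:n]):   # encoder: dilation 1, 2, 4, ...
        h = glu_layer(h, wa, wb, 2 ** j)
        skips.append(h)
    for j, (wa, wb) in enumerate(layers[n:]):   # decoder: mirrored dilations + skip from encoder
        h = glu_layer(h, wa, wb, 2 ** (n - 1 - j)) + skips[n - 1 - j]
    return h


def supervised_attention(feats, noisy, wm, wa, wf):
    """SAM-like block (assumed form): stage prediction plus attention-refined features."""
    pred = noisy * sigmoid(wm @ feats)          # prediction: masked noisy magnitude
    attn = sigmoid(wa @ pred)                   # attention map computed from the prediction
    return pred, feats + (wf @ feats) * attn    # refined features handed to the next stage


def fuse(noisy, prev_feats, w):
    """Assumed fusion block: 1x1 conv over the concatenated input and previous-stage features."""
    return w @ np.concatenate([noisy, prev_feats], axis=0)


def init_stage(c, k=3, depth=4, r=4):
    """Random (untrained) weights for one stage with c channels."""
    def g(*shape):
        return rng.normal(scale=0.1, size=shape)
    return {
        "ca": (g(c // r, c), g(c, c // r)),
        "ed": [(g(c, c, k), g(c, c, k)) for _ in range(depth)],
        "sam": (g(c, c), g(c, c), g(c, c)),
        "fuse": g(c, 2 * c),
    }


def enhance(noisy, stages):
    """Run the stages in sequence; each one returns its own prediction."""
    preds, feats = [], None
    for p in stages:
        x = noisy if feats is None else fuse(noisy, feats, p["fuse"])
        h = encoder_decoder(channel_attention(x, *p["ca"]), p["ed"])
        pred, feats = supervised_attention(h, noisy, *p["sam"])
        preds.append(pred)
    return preds


noisy = np.abs(rng.normal(size=(16, 50)))   # stand-in magnitude spectrogram: 16 bins x 50 frames
stages = [init_stage(16) for _ in range(3)]
preds = enhance(noisy, stages)
print(len(preds), preds[-1].shape)          # 3 (16, 50)
```

Because every stage returns its own prediction, a training loss could be attached to each entry of `preds` (stage-wise supervision) while the refined features carry information forward to the next stage. With untrained weights, the outputs here only demonstrate shapes and data flow.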

Keywords

channel attention, deep learning, speech enhancement, supervised attention, dilated encoder-decoder, multi-stage architecture
