A Cross-Channel Attention-Based Wave-U-Net for Multi-Channel Speech Enhancement

Minh Tri Ho; Jinyoung Lee; Bong-Ki Lee; Dong Hoon Yi; Hong-Goo Kang

INTERSPEECH 2020

Abstract

In this paper, we present a novel architecture for multi-channel speech enhancement based on a Wave-U-Net structure with cross-channel attention. Although using spatial information alongside spectral information is advantageous, it is challenging to train a multi-channel deep learning system effectively in an end-to-end framework. By combining a channel-independent encoding architecture for spectral estimation with an inter-channel attention mechanism that extracts spatial information, we implement a multi-channel speech enhancement system that performs well even in reverberant and extremely noisy environments. Experimental results show that the proposed architecture achieves superior performance in terms of signal-to-distortion ratio improvement (SDRi), short-time objective intelligibility (STOI), and phoneme error rate (PER) for speech recognition.
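The abstract does not spell out the form of the inter-channel attention, so the following is only a minimal sketch of the general idea, assuming plain scaled dot-product attention. Each channel is assumed to have already been encoded independently into a sequence of feature frames; the frames of one channel act as queries and the frames of another channel act as keys and values. The function names `softmax` and `cross_channel_attention` and the toy inputs are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_channel_attention(query_feats, other_feats):
    """Attend from one channel's frames to another channel's frames.

    query_feats, other_feats: lists of frames, each frame a list of D floats.
    Returns one context vector (length D) per query frame.
    Assumption: scaled dot-product attention, with the other channel's
    frames used as both keys and values.
    """
    dim = len(query_feats[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in query_feats:
        # Similarity between this query frame and every frame of the other channel.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in other_feats]
        weights = softmax(scores)
        # Weighted sum of the other channel's frames.
        context = [
            sum(w * v[d] for w, v in zip(weights, other_feats))
            for d in range(dim)
        ]
        attended.append(context)
    return attended


# Toy example: two frames per channel, two features per frame.
mic1 = [[1.0, 0.0], [0.0, 1.0]]
mic2 = [[0.5, 0.5], [0.5, 0.5]]
context = cross_channel_attention(mic1, mic2)
```

In this sketch each output frame is a convex combination of the other channel's frames, which is one simple way for a per-channel encoder to borrow information from a second microphone without mixing the raw channels at the input.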

Authors

Minh Tri Ho, Jinyoung Lee, Bong-Ki Lee, Dong Hoon Yi, Hong-Goo Kang

Topics

Deep Learning > Architectures > Neural Networks
Speech & Audio > Synthesis > Speech Enhancement

Keywords

spatial information; multi-channel speech enhancement; end-to-end framework; cross-channel attention
