INTERSPEECH 2024

Graph Attention Based Multi-Channel U-Net for Speech Dereverberation With Ad-Hoc Microphone Arrays

Abstract

Speech dereverberation with ad-hoc microphone arrays has not been studied sufficiently, particularly in scenarios where the reverberation time is long. In this paper, we propose a novel multi-channel U-Net model for speech dereverberation with ad-hoc microphone arrays, in which an attention module is integrated into the model and trained end-to-end to perform channel selection and fusion. Specifically, we first train a single-channel U-Net model. Then, we replicate the U-Net model for each channel. Finally, we train the attention module to aggregate the information across channels, keeping the parameters of the U-Net model fixed at this stage. To our knowledge, this is the first work to use a U-Net for dereverberation with ad-hoc microphone arrays. We study two attention mechanisms, self-attention and graph attention; moreover, we integrate the attention module into either the bottleneck layer or the output layer of the multi-channel U-Net, which results in four implementations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance, and that the attention module is important for channel selection and fusion, improving performance under long reverberation times.
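
As an illustration of attention-based channel fusion, the following is a minimal sketch in plain Python, not the module used in this work. It assumes each channel has already been encoded into a fixed-length feature vector (for example by the frozen per-channel U-Net), uses identity query/key/value projections in place of learned ones, and averages the attended outputs to obtain a single fused vector. The function names `softmax` and `self_attention_fuse` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(channel_features):
    """Fuse C per-channel feature vectors (each of length D) into one vector.

    Assumption: queries, keys, and values are the raw features themselves
    (identity projections); a trained module would learn these projections.
    Returns (fused, weights), where weights[i][j] is the attention that
    channel i pays to channel j.
    """
    num_channels = len(channel_features)
    dim = len(channel_features[0])
    scale = math.sqrt(dim)

    weights = []
    attended = []
    for query in channel_features:
        # Scaled dot-product score between this channel and every channel.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in channel_features
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all channel features for this query channel.
        attended.append([
            sum(w * value[d] for w, value in zip(row, channel_features))
            for d in range(dim)
        ])

    # Assumption: fuse by averaging the attended outputs over channels.
    fused = [sum(vec[d] for vec in attended) / num_channels for d in range(dim)]
    return fused, weights


if __name__ == "__main__":
    # Three hypothetical channels with two-dimensional features.
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    fused, weights = self_attention_fuse(features)
    print("fused:", fused)
    for i, row in enumerate(weights):
        print("channel", i, "attention:", row)
```

Because the number of channels only determines the length of each attention row, the same function applies to any number of microphones, which is the property that makes attention a natural fit for ad-hoc arrays whose size is not fixed in advance.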
