INTERSPEECH 2021

Feature Fusion by Attention Networks for Robust DOA Estimation

Abstract

Direction of arrival (DOA) estimation is a key front-end technology for many speech-based intelligent systems. Deep neural network-based DOA systems have recently demonstrated better performance than conventional ones. However, most existing networks take only one specific acoustic feature as input, which limits their noise robustness. This paper proposes an attention-based feature fusion approach for DOA estimation. Two classical DOA estimation approaches, namely least mean square (LMS)-based adaptive filtering and generalized cross-correlation (GCC), are adopted, and their respective features serve as input to the networks. A network with an attention mechanism is built to learn the optimal weighting scheme, which takes advantage of the two features’ complementary contributions to DOA estimation. Simulation and real-world test results show that the proposed method can exploit the complementary DOA information in the different features and improve estimation accuracy under acoustic conditions with both noise and reverberation.
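As a rough illustration of the ideas named in the abstract, the sketch below computes a generalized cross-correlation feature between two microphone signals and fuses two feature vectors with softmax attention weights. It is not the authors' implementation. The PHAT weighting, the function names `gcc_phat` and `attention_fuse`, and the hand-set attention scores are all assumptions made for illustration; the abstract does not specify the GCC weighting, and in the paper the fusion weights are learned by a network rather than supplied by hand.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC with PHAT weighting (an assumed choice) for lags -max_lag..max_lag.

    A peak at positive lag d means x is delayed by d samples relative to y.
    """
    n = len(x) + len(y)                  # zero-pad to limit circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so index 0 corresponds to lag -max_lag and the centre to lag 0.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def attention_fuse(feat_a, feat_b, score_a, score_b):
    """Fuse two equal-length feature vectors with softmax attention weights.

    score_a and score_b are placeholder scalars standing in for the output of
    a learned scoring network (hypothetical here).
    """
    scores = np.array([score_a, score_b], dtype=float)
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    fused = weights[0] * np.asarray(feat_a) + weights[1] * np.asarray(feat_b)
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(1024)
    delayed = np.concatenate((np.zeros(3), source[:-3]))  # 3-sample delay

    gcc_feature = gcc_phat(delayed, source, max_lag=16)
    print("estimated delay (samples):", int(np.argmax(gcc_feature)) - 16)

    # Stand-in for the LMS filter-coefficient feature, for illustration only.
    lms_feature = np.zeros_like(gcc_feature)
    fused, weights = attention_fuse(gcc_feature, lms_feature, 1.0, 0.0)
    print("attention weights:", weights)
```

The softmax keeps the two weights positive and summing to one, so the fused vector is a convex combination of the inputs. A learned scorer could then shift weight toward whichever feature is more reliable under the current noise and reverberation.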
