INTERSPEECH 2023

SR-SRP: Super-Resolution based SRP-PHAT for Sound Source Localization and Tracking

Abstract

Sound source localization and tracking have been studied extensively. Recently, there has been considerable interest in highly reverberant scenarios, where models based on steered response power with phase transform (SRP-PHAT) have shown good performance. However, these models remain limited because the SRP-PHAT algorithm cannot accurately represent the direction of the source in such adverse environments. In this paper, we propose a novel structure that combines a super-resolution model with a single sound source localization model to improve direction estimation performance. The proposed method generates a robust power map that accurately represents the direction of the source, even in adverse scenarios. Furthermore, the proposed structure has a lower computational cost because it uses a low-resolution map. Experimental results on simulated and real-world data show that the proposed method outperforms the state-of-the-art model, Cross3D.
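For readers unfamiliar with the kind of power map the abstract refers to, the following is a minimal sketch of a low-resolution SRP-PHAT map, not the implementation used in this paper. It covers only the classical SRP-PHAT step: the super-resolution model and the localization model are not reproduced. Everything concrete in it is an assumption made for illustration: the four-microphone square array (`MICS`), the sample rate, the 64-sample frame, the eight-direction azimuth-only grid (`GRID`), the far-field model, and the anechoic toy signal with whole-sample delays.

```python
import cmath
import math
import random

FS = 16000.0   # sample rate in Hz (assumed)
C = 343.0      # speed of sound in m/s
N = 64         # frame length in samples (assumed, kept small for the naive DFT)

# Hypothetical 4-microphone square array, coordinates in metres.
MICS = [(0.1, 0.1), (-0.1, 0.1), (-0.1, -0.1), (0.1, -0.1)]
# Coarse azimuth grid: 8 candidate directions, 45 degrees apart.
GRID = [2 * math.pi * i / 8 for i in range(8)]


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def gcc_phat(X1, X2):
    """Circular GCC-PHAT of two spectra; peaks at lag (delay1 - delay2) mod N."""
    n = len(X1)
    cross = [a * b.conjugate() for a, b in zip(X1, X2)]
    white = [c / (abs(c) + 1e-12) for c in cross]  # keep phase, drop magnitude
    return [sum(white[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def mic_delays(azimuth):
    """Far-field arrival delay of each microphone, rounded to whole samples."""
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    return [round(-(px * ux + py * uy) * FS / C) for px, py in MICS]


def srp_phat_map(frames):
    """Sum GCC-PHAT values at the lag each candidate direction predicts."""
    spectra = [dft(f) for f in frames]
    pairs = [(i, j) for i in range(len(frames))
             for j in range(i + 1, len(frames))]
    gcc = {p: gcc_phat(spectra[p[0]], spectra[p[1]]) for p in pairs}
    power = []
    for azimuth in GRID:
        d = mic_delays(azimuth)
        power.append(sum(gcc[(i, j)][(d[i] - d[j]) % N] for i, j in pairs))
    return power


def simulate(azimuth, seed=0):
    """Anechoic toy signal: white noise, circularly shifted per microphone."""
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(N)]
    return [[s[(t - d) % N] for t in range(N)] for d in mic_delays(azimuth)]


true_index = 2  # 90 degrees on the assumed grid
power_map = srp_phat_map(simulate(GRID[true_index]))
best_index = max(range(len(GRID)), key=lambda i: power_map[i])
print(best_index, [round(p, 3) for p in power_map])
```

In this idealized setting the map peaks at the simulated direction. Under reverberation, reflections add competing peaks to each GCC-PHAT function, which is the degradation of the power map that the abstract describes.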
