INTERSPEECH 2021

Binaural Speech Separation of Moving Speakers With Preserved Spatial Cues

Abstract

Binaural speech separation algorithms designed for augmented hearing technologies need to both improve the signal-to-noise ratio of individual speakers and preserve their perceived location in space. Most binaural speech separation methods assume stationary speakers. As a result, applying them to real-world scenarios with freely moving speakers requires block-wise adaptation, which relies on short-term contextual information and limits performance. In this study, we propose an alternative approach for utterance-level source separation with moving speakers in reverberant conditions. Our model uses the spectral and spatial features of speakers over a larger context than block-wise adaptation methods do. The model can implicitly track speakers within the utterance without explicit tracking modules. Experimental results on simulated moving multitalker speech show that the proposed method significantly outperforms block-wise adaptation methods in both separation performance and preservation of interaural cues across multiple conditions, which makes it suitable for real-world augmented hearing applications.
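The abstract does not state how preservation of interaural cues is measured. As an illustration only, the sketch below computes two common interaural cues from a two-channel signal: the interaural level difference (ILD) as an energy ratio in dB, and the interaural time difference (ITD) as the lag that maximizes the cross-correlation between the ears. It also averages the ILD error over blocks, since the cues of a moving speaker change over time. All function names, the block length, and the lag range are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def ild_db(left, right, eps=1e-12):
    """Interaural level difference in dB (left-ear energy over right-ear energy)."""
    e_left = np.sum(left ** 2) + eps
    e_right = np.sum(right ** 2) + eps
    return float(10.0 * np.log10(e_left / e_right))


def itd_samples(left, right, max_lag):
    """Lag in samples that maximizes sum_n left[n] * right[n + lag].

    A positive result means the right-ear signal lags the left-ear signal.
    """
    n = len(left)
    best_lag, best_val = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            val = np.dot(left[: n - lag], right[lag:])
        else:
            val = np.dot(left[-lag:], right[: n + lag])
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def blockwise_ild_error(ref_l, ref_r, est_l, est_r, block_len):
    """Mean absolute ILD difference (dB) between reference and estimate, per block."""
    if len(ref_l) < block_len:
        raise ValueError("signal is shorter than one block")
    errors = []
    for start in range(0, len(ref_l) - block_len + 1, block_len):
        sl = slice(start, start + block_len)
        ref_ild = ild_db(ref_l[sl], ref_r[sl])
        est_ild = ild_db(est_l[sl], est_r[sl])
        errors.append(abs(est_ild - ref_ild))
    return float(np.mean(errors))
```

A separated speaker whose ILD and ITD stay close to those of the clean binaural reference in every block keeps its perceived location. Large block-wise errors indicate that the spatial image has been distorted even if the signal-to-noise ratio improved.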
