Binaural Speech Separation of Moving Speakers With Preserved Spatial Cues

Cong Han; Yi Luo; Nima Mesgarani

INTERSPEECH 2021

Abstract

Binaural speech separation algorithms designed for augmented hearing technologies need to both improve the signal-to-noise ratio of individual speakers and preserve their perceived location in space. Most binaural speech separation methods assume stationary speakers. As a result, applying them to real-world scenarios with freely moving speakers requires block-wise adaptation, which relies on short-term contextual information and limits performance. In this study, we propose an alternative approach for utterance-level source separation of moving speakers in reverberant conditions. Our model uses the spectral and spatial features of speakers over a larger context than block-wise adaptation methods do. The model can implicitly track speakers within the utterance without explicit tracking modules. Experimental results on simulated moving multitalker speech show that the proposed method significantly outperforms block-wise adaptation methods in both separation performance and preservation of interaural cues across multiple conditions, which makes it suitable for real-world augmented hearing applications.
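To make the notion of interaural cues concrete, the following is a minimal sketch of how the interaural level difference (ILD) and interaural time difference (ITD) of a binaural signal might be estimated, and how far a separated output drifts from its reference. It is an illustration only and not the evaluation procedure used in the paper: the function names, the 1 ms lag limit, and the cross-correlation ITD estimator are all assumptions chosen for brevity.

```python
import numpy as np


def interaural_level_difference(left, right, eps=1e-12):
    """ILD in dB: left-channel energy relative to right-channel energy."""
    energy_left = np.sum(left ** 2) + eps
    energy_right = np.sum(right ** 2) + eps
    return 10.0 * np.log10(energy_left / energy_right)


def interaural_time_difference(left, right, sample_rate, max_itd_s=1e-3):
    """ITD in seconds, taken from the cross-correlation peak.

    A positive value means the left channel lags the right channel.
    max_itd_s (assumed 1 ms here) limits the search to plausible lags.
    """
    max_lag = int(round(max_itd_s * sample_rate))
    corr = np.correlate(left, right, mode="full")
    # Lag axis that matches the ordering of np.correlate in "full" mode.
    lags = np.arange(-(len(right) - 1), len(left))
    keep = np.abs(lags) <= max_lag
    best_lag = lags[keep][np.argmax(corr[keep])]
    return best_lag / sample_rate


def cue_errors(ref_left, ref_right, est_left, est_right, sample_rate):
    """Absolute ILD (dB) and ITD (s) differences between reference and estimate."""
    ild_error = abs(
        interaural_level_difference(ref_left, ref_right)
        - interaural_level_difference(est_left, est_right)
    )
    itd_error = abs(
        interaural_time_difference(ref_left, ref_right, sample_rate)
        - interaural_time_difference(est_left, est_right, sample_rate)
    )
    return ild_error, itd_error
```

A separated speaker whose left and right outputs give small `cue_errors` against the clean binaural reference keeps roughly the same perceived direction. For a moving speaker, these quantities would have to be computed over short frames rather than over the whole utterance, since the cues change as the speaker moves.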

Topics

Machine Learning > Learning Types > Unsupervised Learning

Keywords

source separation, speech enhancement, binaural hearing, moving speaker, binaural speech, neural network, spatial cue, interaural cue
