The Cone of Silence: Speech Separation by Localization

Teerapat Jenrungrot; Vivek Jayaram; Steve Seitz; Ira Kemelmacher-Shlizerman

2020 NIPS NeurIPS 2020

The Cone of Silence: Speech Separation by Localization

Abstract

Given a multi-microphone recording of an unknown number of speakers talking concurrently, we simultaneously localize the sources and separate the individual speakers. At the core of our method is a deep network, in the waveform domain, which isolates sources within an angular region $\theta \pm w/2$, given an angle of interest $\theta$ and angular window size $w$. By exponentially decreasing $w$, we can perform a binary search to localize and separate all sources in logarithmic time. Our algorithm also allows for an arbitrary number of potentially moving speakers at test time, including more speakers than seen during training. Experiments demonstrate state of the art performance for both source separation and source localization, particularly in high levels of background noise.

🌉 Interdisciplinary Bridge — Deep Learning and Speech & Audio

🧭 Keyword Pioneer — angular window

🐣 Hot Topic Early Bird — source separation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Teerapat Jenrungrot , Vivek Jayaram , Steve Seitz , Ira Kemelmacher-Shlizerman

Topics

Deep Learning > Architectures > Neural Networks Speech & Audio > Synthesis > Speech Enhancement

Keywords

source localization speech separation source separation binary search speaker localization waveform domain angular window

Download PDF

Related papers

Higher-Order Spectral Clustering of Directed Graphs 2020

Self-Supervised MultiModal Versatile Networks 2020

Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates 2020

Causal Intervention for Weakly-Supervised Semantic Segmentation 2020

Taming Discrete Integration via the Boon of Dimensionality 2020