Many-Speakers Single Channel Speech Separation with Optimal Permutation Training

Shaked Dovrat; Eliya Nachmani; Lior Wolf

2021 INTERSPEECH INTERSPEECH 2021

Many-Speakers Single Channel Speech Separation with Optimal Permutation Training

Abstract

Single channel speech separation has experienced great progress in the last few years. However, training neural speech separation for a large number of speakers (e.g., more than 10 speakers) is out of reach for the current methods, which rely on the Permutation Invariant Training (PIT). In this work, we present a permutation invariant training that employs the Hungarian algorithm in order to train with an O(C3) time complexity, where C is the number of speakers, in comparison to O(C!) of PIT based methods. Furthermore, we present a modified architecture that can handle the increased number of speakers. Our approach separates up to 20 speakers and improves the previous results for large C by a wide margin.

🌉 Interdisciplinary Bridge — Machine Learning and Speech & Audio

🧭 Keyword Pioneer — single channel

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Shaked Dovrat , Eliya Nachmani , Lior Wolf

Topics

Machine Learning > Optimization & Theory > Optimization Speech & Audio > Synthesis > Speech Enhancement

Keywords

speech separation hungarian algorithm permutation invariant training neural network single channel

Download PDF

Related papers

Energy-Friendly Keyword Spotting System Using Add-Based Convolution 2021

Dialogue Situation Recognition for Everyday Conversation Using Multimodal Information 2021

Using Games to Augment Corpora for Language Recognition and Confusability 2021

A Psychology-Driven Computational Analysis of Political Interviews 2021

The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results 2021