Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming

Lu Yin; Ziteng Wang; Risheng Xia; Junfeng Li; Yonghong Yan

INTERSPEECH 2018

Abstract

The recently proposed Permutation Invariant Training (PIT) technique addresses the label permutation problem in multi-talker speech separation. It has been shown to be effective in the single-channel case. In this paper, we propose to extend the PIT-based technique to the multichannel multi-talker speech separation scenario. PIT is used to train a neural network that outputs a mask for each speaker, and the network is followed by a Minimum Variance Distortionless Response (MVDR) beamformer. The beamformer exploits the spatial information of the different speakers and alleviates the performance degradation caused by misaligned labels. Experimental results show that the proposed PIT-MVDR technique achieves higher Signal-to-Distortion Ratios (SDRs) than the single-channel speech separation method when tested on two-speaker and three-speaker mixtures.
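The label permutation problem mentioned in the abstract can be illustrated with a minimal sketch of a permutation-invariant loss. This is not the authors' implementation: the function names `mse` and `pit_loss`, the use of mean squared error, and the flat lists standing in for per-speaker masks or spectra are all assumptions made for illustration. The idea is to score every assignment of network outputs to reference speakers and keep the lowest-error one.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, references):
    """Return (min_loss, best_perm) over all output-to-speaker assignments.

    best_perm[i] is the index of the reference paired with estimates[i].
    Hypothetical helper: the error measure (MSE) is an assumption.
    """
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Average the per-speaker error under this candidate assignment.
        loss = sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy data: two "speakers", with the network outputs in swapped order.
refs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
ests = [[0.1, 0.9, 0.1], [0.9, 0.1, 0.9]]
loss, perm = pit_loss(ests, refs)
print(perm, round(loss, 4))
```

In this toy case the swapped assignment is selected, so the output order does not penalize the loss. The exhaustive search costs S! evaluations for S speakers, which stays cheap for the two- and three-speaker mixtures considered here.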

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Mathematics & Optimization > Optimization > Stochastic Methods

Keywords

speech separation, mask prediction, permutation invariant training, multi-talker speech, signal-to-distortion ratio
