Target Speaker Extraction for Multi-Talker Speaker Verification

Wei Rao; Chenglin Xu; Eng Siong Chng; Haizhou Li

INTERSPEECH 2019

Abstract

The performance of speaker verification degrades significantly when the test speech is corrupted by interference from non-target speakers. Speaker diarization separates speakers well only when their speech does not overlap. When multiple talkers speak at the same time, however, a technique is needed to separate the speech in the spectral domain. In this paper, we study a way to extract the target speaker’s speech from overlapped multi-talker speech. Specifically, given some reference speech samples from the target speaker, the target speaker’s speech is first extracted from the overlapped multi-talker speech, and the extracted speech is then passed to the speaker verification system. Experimental results show that the proposed approach significantly improves the performance of overlapped multi-talker speaker verification and achieves a 64.4% relative equal error rate (EER) reduction over the zero-effort baseline.
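The headline metric above is a relative EER reduction. The following is a minimal sketch of how such a figure could be computed from verification scores; it is not the authors' evaluation code. The function names `compute_eer` and `relative_reduction` are hypothetical, and the score lists are toy values made up for illustration, not results from the paper.

```python
def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate (as a fraction) by sweeping
    every observed score as a decision threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: non-target trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction of a new system over a baseline."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy scores (assumed for illustration only, not taken from the paper).
baseline_eer = compute_eer([0.9, 0.4, 0.3, 0.8], [0.1, 0.5, 0.6, 0.2])
extracted_eer = compute_eer([0.9, 0.7, 0.3, 0.8], [0.1, 0.5, 0.4, 0.2])
reduction = relative_reduction(baseline_eer, extracted_eer)
print(f"baseline EER={baseline_eer:.3f}, "
      f"after extraction EER={extracted_eer:.3f}, "
      f"relative reduction={reduction:.1%}")
```

Under this definition, a 64.4% relative reduction means the EER after extraction is 35.6% of the baseline EER. A threshold sweep over observed scores is only one simple way to approximate the EER; evaluation toolkits commonly interpolate along the error curve instead.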

Topics

Speech & Audio > Recognition > Speaker Recognition
Speech & Audio > Analysis > Speaker Verification

Keywords

speaker verification, speaker diarization, speaker extraction, overlapped speech, multi-talker speech, target speaker
