Robust Speaker Extraction Network Based on Iterative Refined Adaptation

Chengyun Deng; Shiqian Ma; Yongtao Sha; Yi Zhang; Hui Zhang; Hui Song; Fei Wang

INTERSPEECH 2021

Abstract

Speaker extraction aims to extract the target speech signal from a multi-talker environment containing interfering speakers and surrounding noise, given reference speech from the target speaker. Most speaker extraction systems achieve satisfactory performance under closed conditions, but they suffer from performance degradation when faced with unseen target speakers and/or mismatched reference speech. In this paper, we propose a novel strategy named Iterative Refined Adaptation (IRA) to improve the robustness and generalization capability of speaker extraction systems in these scenarios. Given an initial speaker embedding encoded by an auxiliary network, the extraction network obtains a latent representation of the target speaker, which is fed back to the auxiliary network to refine the speaker embedding; the refined embedding in turn provides more accurate guidance for the extraction network. Experiments show that the network with IRA outperforms the comparison approaches in terms of SI-SDRi and PESQ on the WSJ0-2mix-extr and WHAM! datasets.
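The abstract describes IRA only at a high level, so the following is a minimal sketch of the feedback loop under stated assumptions, not the authors' architecture. `toy_aux_net`, `toy_extract_net`, and the averaging fusion rule are hypothetical stand-ins (a normalized magnitude spectrum as the "embedding" and a spectral mask as the "extractor"); only the loop structure (initial embedding from the reference, latent fed back to the auxiliary network, refined embedding reused by the extractor) mirrors the text. The `si_sdr` and `si_sdr_improvement` helpers follow the standard scale-invariant SDR definition with zero-mean normalization, which is how the SI-SDRi metric named above is usually computed.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB, using zero-mean normalization of both signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Optimal scaling of the target onto the estimate (projection coefficient).
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return float(10.0 * np.log10(ratio))


def si_sdr_improvement(estimate, mixture, target) -> float:
    """SI-SDRi: gain of the estimate over the unprocessed mixture, in dB."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


def toy_aux_net(signal: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL auxiliary network: unit-norm magnitude spectrum as 'embedding'."""
    mag = np.abs(np.fft.rfft(signal))
    return mag / (np.linalg.norm(mag) + 1e-8)


def toy_extract_net(mixture: np.ndarray, embedding: np.ndarray):
    """HYPOTHETICAL extraction network: spectral mask derived from the embedding.

    Returns (estimate, latent); in this toy the latent target representation is
    simply the estimated waveform.
    """
    spec = np.fft.rfft(mixture)
    mask = embedding / (embedding.max() + 1e-8)  # values in [0, 1)
    estimate = np.fft.irfft(mask * spec, n=len(mixture))
    return estimate, estimate


def iterative_refined_adaptation(mixture, reference, aux_net, extract_net, n_iters=2):
    """Refine the speaker embedding by feeding the extractor's latent back to aux_net."""
    embedding = aux_net(reference)  # initial embedding from the reference speech
    estimate, latent = extract_net(mixture, embedding)
    for _ in range(n_iters):
        feedback = aux_net(latent)  # re-encode the latent target representation
        embedding = 0.5 * (embedding + feedback)  # ASSUMED fusion rule (simple average)
        estimate, latent = extract_net(mixture, embedding)
    return estimate, embedding


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    target = np.sin(2 * np.pi * 440 * t)
    mixture = target + 0.8 * np.sin(2 * np.pi * 1000 * t)  # interfering "speaker"
    reference = np.sin(2 * np.pi * 440 * t + 0.3)  # different "utterance", same spectrum
    estimate, _ = iterative_refined_adaptation(mixture, reference, toy_aux_net, toy_extract_net)
    print(f"SI-SDRi: {si_sdr_improvement(estimate, mixture, target):.1f} dB")
```

Averaging the previous and re-encoded embeddings is only one plausible fusion rule; a learned network would replace both toy functions and the fusion step, while the evaluation helpers stay the same.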

Topics

Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Architectures > Neural Networks
Speech & Audio > Analysis > Speaker Verification

Keywords

speaker embedding, performance degradation, speaker extraction, iterative refined adaptation, multi-talker environment, neural network

Related papers

Energy-Friendly Keyword Spotting System Using Add-Based Convolution 2021

Dialogue Situation Recognition for Everyday Conversation Using Multimodal Information 2021

Using Games to Augment Corpora for Language Recognition and Confusability 2021

A Psychology-Driven Computational Analysis of Political Interviews 2021

The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results 2021