Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers

Marvin Borsdorf; Chenglin Xu; Haizhou Li; Tanja Schultz

INTERSPEECH 2021

Abstract

Speaker extraction has mostly been studied for scenarios in which a target speaker is present in a mixture of two or more talkers. Such scenarios do not adequately reflect everyday conversations. For example, a target speaker can be the only active talker, or can be quiet for a while or leave the conversation, in which case the target speaker is absent from the mixture. Traditional speaker extraction models fail in these scenarios. We propose a novel speaker extraction approach to handle speech mixtures with one or two talkers in which the target speaker can be either present or absent. First, we formulate four speaker extraction conditions to cover the typical scenarios of everyday conversations with one and two talkers. Second, we introduce a joint training scheme with one unified loss function that works for all four conditions. We show that only a small amount of data is required to adapt the model to work well in the four conditions.
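The four conditions and the idea of a single loss covering all of them can be illustrated with a minimal sketch. The abstract does not state the form of the loss, so everything below is an assumption made for illustration: the names `CONDITIONS`, `si_sdr_db`, and `unified_loss` are hypothetical, the target-present branch uses negative SI-SDR, and the target-absent branch penalizes the log power of the output so that the model is pushed toward silence. It should not be read as the authors' formulation.

```python
import math
from itertools import product

# The four conditions from the abstract: (number of talkers, target present?)
CONDITIONS = [(n, present) for n, present in product((2, 1), (True, False))]

def si_sdr_db(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * t for e, t in zip(estimate, target))
    energy = sum(t * t for t in target) + eps
    scale = dot / energy
    projection = [scale * t for t in target]
    noise = [e - p for e, p in zip(estimate, projection)]
    signal_power = sum(p * p for p in projection)
    noise_power = sum(n * n for n in noise)
    return 10.0 * math.log10((signal_power + eps) / (noise_power + eps))

def unified_loss(estimate, target, target_present, eps=1e-8):
    """One loss for all four conditions (assumed form, not from the paper)."""
    if target_present:
        # Target present: reward a faithful reconstruction of the target.
        return -si_sdr_db(estimate, target, eps)
    # Target absent: the desired output is silence, so penalize output power.
    power = sum(e * e for e in estimate) / len(estimate)
    return 10.0 * math.log10(power + eps)
```

Because both branches return a value in dB, batches that mix all four conditions can be averaged into one training objective, which is one plausible way to realize a joint training scheme of the kind the abstract describes.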

Topics

Machine Learning > Core Methods > Representation Learning

Speech & Audio > Synthesis > Speech Enhancement

Speech & Audio > Processing > Speech Enhancement

Keywords

speech separation, joint training, speaker extraction, speech mixture, multi-talker speech, target speaker, target speaker absence, unified loss function, single-talker speech, universal extraction
