Attentive Training: A New Training Framework for Talker-independent Speaker Extraction

Ashutosh Pandey; Deliang Wang

2022 INTERSPEECH INTERSPEECH 2022

Attentive Training: A New Training Framework for Talker-independent Speaker Extraction

Abstract

Listening in a multitalker scenario, we typically attend to a single talker through auditory selective attention. Inspired by human selective attention, we propose attentive training: a new training framework for talker-independent speaker extraction with an intrinsic selection mechanism. In the real world, multiple talkers very unlikely start speaking at the same time. Based on this observation, we train a deep neural network to create a representation for the first speaker and utilize it to extract or track that speaker from a multitalker noisy mixture. Experimental results demonstrate the superiority of attentive training over widely used permutation invariant training for talker-independent speaker extraction, especially in mismatched conditions in terms of the number of speakers, speaker interaction patterns, and the amount of speaker overlaps.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning

🧭 Keyword Pioneer — talker independent

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio

Authors

Ashutosh Pandey , Deliang Wang

Topics

Artificial Intelligence > Core AI > Multimodal Learning Deep Learning > Architectures > Neural Networks

Keywords

selective attention speaker extraction auditory attention permutation invariant training talker independent

Download PDF

Related papers

Example-based Explanations with Adversarial Attacks for Respiratory Sound Analysis 2022

Which Model is Best: Comparing Methods and Metrics for Automatic Laughter Detection in a Naturalistic Conversational Dataset 2022

Evidence of Onset and Sustained Neural Responses to Isolated Phonemes from Intracranial Recordings in a Voice-based Cursor Control Task 2022

Pre-trained Speech Representations as Feature Extractors for Speech Quality Assessment in Online Conferencing Applications 2022

Exploring the influence of fine-tuning data on wav2vec 2.0 model for blind speech quality prediction 2022