Domain Adaptation of PLDA Models in Broadcast Diarization by Means of Unsupervised Speaker Clustering

Ignacio Viñals; Alfonso Ortega; Jesus Villalba; Antonio Miguel; Eduardo Lleida

INTERSPEECH 2017

Abstract

This work presents a new strategy for diarization of highly variable data, such as broadcast multimedia content. This variability is especially noticeable across domains (inter-domain variability among episodes, shows, genres, etc.), so each domain requires its own specific model to obtain optimal results. We propose to adapt the PLDA models of our diarization system using unlabeled in-domain data. To do so, we estimate pseudo-speaker labels by unsupervised speaker clustering. This new method has been included in a PLDA-based diarization system and evaluated on the Multi-Genre Broadcast 2015 Challenge data. Given an audio recording, the system computes short-time i-vectors and clusters them using a variational Bayesian PLDA model with hidden labels. The proposed method achieves a 25.41% relative improvement with respect to the system without PLDA adaptation.
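The adaptation idea in the abstract (estimate pseudo-speaker labels on unlabeled in-domain data, then re-estimate the PLDA model with them) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a two-covariance PLDA model, substitutes a simple cosine-similarity agglomerative clustering for the paper's variational Bayesian PLDA clustering, and combines out-of-domain and in-domain covariances by linear interpolation with a weight `alpha`. The function names `pseudo_labels`, `two_cov_stats` and `adapt_plda`, the `threshold` value and the interpolation scheme are all hypothetical. Real i-vectors have hundreds of dimensions; the demo uses 3 for readability.

```python
import numpy as np


def pseudo_labels(x: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Hypothetical stand-in for the paper's VB-PLDA clustering: greedily merge
    the two clusters whose centroids have the highest cosine similarity, and
    stop when no pair exceeds `threshold`. Returns one label per row of x."""
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > 1:
        cents = np.array([x[c].mean(axis=0) for c in clusters])
        cents /= np.linalg.norm(cents, axis=1, keepdims=True)
        sim = cents @ cents.T
        np.fill_diagonal(sim, -np.inf)  # never merge a cluster with itself
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        if sim[i, j] < threshold:
            break
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def two_cov_stats(x: np.ndarray, labels: np.ndarray):
    """Global mean, between-speaker covariance B and within-speaker covariance W
    of a two-covariance PLDA model: x = y + e, y ~ N(mu, B), e ~ N(0, W)."""
    mu = x.mean(axis=0)
    d = x.shape[1]
    B = np.zeros((d, d))
    W = np.zeros((d, d))
    for c in np.unique(labels):
        xc = x[labels == c]
        mc = xc.mean(axis=0)
        B += len(xc) * np.outer(mc - mu, mc - mu)
        W += (xc - mc).T @ (xc - mc)
    return mu, B / len(x), W / len(x)


def adapt_plda(out_params, in_params, alpha: float = 0.5):
    """Hypothetical adaptation: interpolate covariances between out-of-domain
    and in-domain estimates (alpha = 1 keeps only in-domain). The mean is
    always taken from the in-domain data."""
    _, B_o, W_o = out_params
    mu_i, B_i, W_i = in_params
    return (mu_i,
            alpha * B_i + (1 - alpha) * B_o,
            alpha * W_i + (1 - alpha) * W_o)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic labeled "out-of-domain" vectors (assumption for the demo).
    y_out = np.repeat(np.arange(5), 40)
    x_out = rng.normal(size=(5, 3))[y_out] * 5 + rng.normal(size=(200, 3))
    # Synthetic "in-domain" vectors whose speaker labels are unknown.
    x_in = rng.normal(size=(4, 3))[np.repeat(np.arange(4), 30)] * 5 + rng.normal(size=(120, 3))
    y_in = pseudo_labels(x_in, threshold=0.5)  # pseudo-speaker labels
    mu, B, W = adapt_plda(two_cov_stats(x_out, y_out),
                          two_cov_stats(x_in, y_in), alpha=0.5)
    print("pseudo-speakers found:", len(np.unique(y_in)))
```

Setting `alpha = 0` keeps the out-of-domain covariances and `alpha = 1` relies only on the pseudo-labeled in-domain estimates; an intermediate value is one way to limit the effect of clustering errors in the pseudo-labels.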

Keywords

domain adaptation, speaker diarization, probabilistic linear discriminant analysis, speaker clustering