INTERSPEECH 2017

Domain Adaptation of PLDA Models in Broadcast Diarization by Means of Unsupervised Speaker Clustering

Abstract

This work presents a new strategy for diarization of highly variable data, such as multimedia content in broadcast. The variability is especially noticeable across domains (inter-domain variability among chapters, shows, genres, etc.), so each domain requires its own specific model to obtain optimal results. We propose to adapt the PLDA models of our diarization system using in-domain unlabeled data. To do so, we estimate pseudo-speaker labels by unsupervised speaker clustering. The new method has been included in a PLDA-based diarization system and evaluated on the Multi-Genre Broadcast 2015 Challenge data. Given an audio recording, the system computes short-time i-vectors and clusters them using a variational Bayesian PLDA model with hidden labels. The proposed method yields a 25.41% relative improvement over the system without PLDA adaptation.
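The adaptation idea can be illustrated with a minimal sketch. This is not the system described in the abstract. It assumes a simplified two-covariance view of PLDA, uses spherical k-means as a stand-in for the unsupervised speaker clustering, and interpolates out-of-domain and in-domain covariances with a hypothetical weight `alpha`. All function names and parameters below are illustrative.

```python
import numpy as np


def pseudo_labels(ivectors, n_clusters, n_iter=20, seed=0):
    """Assign pseudo-speaker labels with spherical k-means (stand-in clustering)."""
    rng = np.random.default_rng(seed)
    # Length-normalise so that dot products are cosine similarities.
    x = ivectors / np.linalg.norm(ivectors, axis=1, keepdims=True)
    centroids = x[rng.choice(len(x), n_clusters, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        labels = np.argmax(x @ centroids.T, axis=1)
        for k in range(n_clusters):
            members = x[labels == k]
            if len(members):
                c = members.sum(axis=0)
                centroids[k] = c / np.linalg.norm(c)
    return labels


def scatter_matrices(ivectors, labels):
    """Within- and between-speaker covariances given (pseudo-)speaker labels."""
    mu = ivectors.mean(axis=0)
    dim = ivectors.shape[1]
    within = np.zeros((dim, dim))
    between = np.zeros((dim, dim))
    for k in np.unique(labels):
        xk = ivectors[labels == k]
        mk = xk.mean(axis=0)
        within += (xk - mk).T @ (xk - mk)
        between += len(xk) * np.outer(mk - mu, mk - mu)
    n = len(ivectors)
    return within / n, between / n


def adapt(out_domain, in_domain, alpha=0.5):
    """Interpolate a covariance (assumed scheme; alpha is a hypothetical weight)."""
    return alpha * in_domain + (1.0 - alpha) * out_domain
```

In this sketch, `scatter_matrices` would be run once on labeled out-of-domain i-vectors and once on in-domain i-vectors with labels from `pseudo_labels`, and `adapt` would then be applied separately to the within- and between-speaker covariances. The quality of the adapted model depends directly on how pure the pseudo-speaker clusters are.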
