INTERSPEECH 2019

Speaker Diarization Using Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings

Abstract

Many modern systems for speaker diarization, such as the top-performing JHU system in the DIHARD 2018 challenge, rely on clustering of DNN speaker embeddings followed by HMM resegmentation. Two problems with this approach are that parameters need significant retuning for different applications, and that the DNN contributes only to the clustering task and not to the resegmentation. This paper presents two contributions: an improved HMM segment assignment algorithm using leave-one-out Gaussian PLDA scoring, and an approach to training the DNN such that embeddings directly optimize performance of this scoring method with generatively updated PLDA parameters. Initial experiments with this new system are very promising: it achieves state-of-the-art performance on two separate tasks (Callhome and DIHARD18) without any task-dependent parameter tuning.
