Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification

Qiongqiong Wang; Takafumi Koshinaka

2017 INTERSPEECH INTERSPEECH 2017

Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification

Abstract

This paper presents, for the first time, unsupervised discriminative training of probabilistic linear discriminant analysis (unsupervised DT-PLDA). While discriminative training avoids the problem of generative training based on probabilistic model assumptions that often do not agree with actual data, it has been difficult to apply it to unsupervised scenarios because it can fit data with almost any labels. This paper focuses on unsupervised training of DT-PLDA in the application of domain adaptation in i-vector based speaker verification systems, using unlabeled in-domain data. The proposed method makes it possible to conduct discriminative training, i.e., estimation of model parameters and unknown labels, by employing data statistics as a regularization term in addition to the original objective function in DT-PLDA. An experiment on a NIST Speaker Recognition Evaluation task shows that the proposed method outperforms a conventional method using speaker clustering and performs almost as well as supervised DT-PLDA.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Qiongqiong Wang , Takafumi Koshinaka

Topics

Machine Learning > Application Areas > Domain Adaptation

Keywords

unsupervised learning domain adaptation speaker verification probabilistic linear discriminant analysis

Download PDF

Related papers

Description of the Munich-Passau Snore Sound Corpus (MPSSC) 2017

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification 2017

Binaural Reverberant Speech Separation Based on Deep Neural Networks 2017

Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech 2017

A Comparison of Danish Listeners’ Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences 2017