Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker Verification

Pierre-Michel Bousquet; Mickael Rouvier

INTERSPEECH 2017

Abstract

Duration mismatch between enrollment and test utterances remains a major concern for the reliability of real-life speaker recognition applications. Two approaches are proposed here to handle this mismatch when using the i-vector representation. The first is an adaptation of Gaussian Probabilistic Linear Discriminant Analysis (PLDA) modeling, which can be extended to any shift between i-vectors drawn from two distinct distributions. The second uses deep neural networks (DNN) to map the i-vectors of truncated segments of an utterance to the i-vector of the full segment. Our results show that both new approaches outperform standard PLDA by about 10% relative. These back-end methods could also complement methods that quantify i-vector uncertainty during extraction when a duration gap is present.
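To make the second approach more concrete, the sketch below illustrates only the general idea of learning a mapping from short-segment vectors to full-segment vectors. It is not the authors' method: a linear least-squares map is swapped in for the DNN, the "i-vectors" are synthetic random vectors, and every name and value (`fit_linear_map`, `distortion`, the dimension, the noise level) is a hypothetical choice made for illustration.

```python
import numpy as np


def fit_linear_map(short_ivecs, full_ivecs):
    """Least-squares fit of W, b so that short @ W + b approximates full.

    A linear stand-in for the DNN mapping described in the abstract.
    """
    ones = np.ones((short_ivecs.shape[0], 1))
    design = np.hstack([short_ivecs, ones])  # append a bias column
    coef, *_ = np.linalg.lstsq(design, full_ivecs, rcond=None)
    return coef[:-1], coef[-1]


def apply_map(short_ivecs, W, b):
    """Map short-segment vectors toward the full-segment space."""
    return short_ivecs @ W + b


def mse(a, b):
    """Mean squared error between two arrays of equal shape."""
    return float(np.mean((a - b) ** 2))


# Synthetic data (assumption): full-segment vectors are standard normal,
# and truncation is simulated as a fixed linear distortion plus noise.
rng = np.random.default_rng(0)
dim, n_train, n_test = 10, 2000, 500
full = rng.standard_normal((n_train + n_test, dim))
distortion = np.eye(dim) + 0.3 * rng.standard_normal((dim, dim))
short = full @ distortion + 0.5 * rng.standard_normal(full.shape)

# Fit on the training pairs, then evaluate on held-out pairs.
W, b = fit_linear_map(short[:n_train], full[:n_train])
mapped = apply_map(short[n_train:], W, b)

err_before = mse(short[n_train:], full[n_train:])
err_after = mse(mapped, full[n_train:])
print(f"held-out MSE before mapping: {err_before:.3f}")
print(f"held-out MSE after mapping:  {err_after:.3f}")
```

In a real system the mapped vectors would then be scored by the back-end (for example PLDA) in place of the raw short-segment i-vectors; the error comparison here only shows that the learned map moves the synthetic short vectors closer to their full-segment counterparts.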

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Representation Learning
Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Architectures > Neural Networks
Deep Learning > Learning Types > Representation Learning
Speech & Audio > Recognition > Speaker Recognition

Keywords

speaker verification, deep neural network, domain mismatch, probabilistic linear discriminant analysis, duration mismatch, duration mismatch compensation, i-vector mapping, four-covariance model, PLDA modeling
