INTERSPEECH 2016

Improving Deep Neural Networks Based Speaker Verification Using Unlabeled Data

Abstract

Recently, deep neural networks (DNNs) trained to predict senones have been incorporated into conventional i-vector based speaker verification systems to provide soft frame alignments, with promising results. However, data mismatch may degrade performance: the DNN requires transcribed data (out-domain data), while the data sets used for i-vector training and extraction (in-domain data) are mostly untranscribed. In this paper, we address this problem by exploiting the unlabeled in-domain data during DNN training, with the aim that the DNN provides a more robust basis for the in-domain data. We first explore the impact of using in-domain data during unsupervised DNN pre-training. In addition, we decode the in-domain data with a hybrid DNN-HMM system to obtain transcriptions, and then retrain the DNN with the "labeled" in-domain data. Experimental results on the NIST SRE 2008 and NIST SRE 2010 databases demonstrate the effectiveness of the proposed methods.
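To illustrate what "soft frame alignments" means in the DNN/i-vector setting, the sketch below accumulates the zeroth- and first-order sufficient statistics that an i-vector extractor typically consumes, using per-frame senone posteriors as the alignment weights. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name `baum_welch_stats`, the array sizes, and the random posteriors (which stand in for real DNN outputs) are all hypothetical.

```python
import numpy as np


def baum_welch_stats(posteriors, features):
    """Accumulate soft-alignment statistics for one utterance.

    posteriors: (T, C) array, per-frame senone posteriors (each row sums to 1).
    features:   (T, D) array, acoustic feature vectors for the same frames.
    Returns the zeroth-order stats (C,) and first-order stats (C, D).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    features = np.asarray(features, dtype=float)
    if posteriors.shape[0] != features.shape[0]:
        raise ValueError("posteriors and features must cover the same frames")
    zeroth = posteriors.sum(axis=0)   # N_c = sum_t gamma_t(c)
    first = posteriors.T @ features   # F_c = sum_t gamma_t(c) * x_t
    return zeroth, first


# Toy data (assumption): random values stand in for DNN outputs and features.
rng = np.random.default_rng(0)
T, C, D = 200, 8, 20
logits = rng.normal(size=(T, C))
post = np.exp(logits - logits.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)  # softmax over senones
feats = rng.normal(size=(T, D))

N, F = baum_welch_stats(post, feats)
```

Because each row of `post` sums to one, the zeroth-order statistics sum to the number of frames. A mismatched DNN would distribute this mass poorly over senones for in-domain speech, which is the problem the in-domain pre-training and retraining aim to reduce.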
