Autoencoder-Based Semi-Supervised Curriculum Learning for Out-of-Domain Speaker Verification

Siqi Zheng; Gang Liu; Hongbin Suo; Yun Lei

INTERSPEECH 2019

Abstract

This study aims to improve the performance of a speaker verification system when no labeled out-of-domain data is available. An autoencoder-based semi-supervised curriculum learning scheme is proposed that automatically clusters unlabeled data and iteratively updates the corpus during training. This training scheme allows us to (1) progressively expand the training corpus by utilizing unlabeled data and correcting previous labels at run-time; and (2) improve robustness when generalizing to multiple conditions, such as out-of-domain and text-independent speaker verification tasks. We also find that a denoising autoencoder can significantly enhance clustering accuracy when it is trained on a carefully selected subset of speakers. Our experimental results show a relative reduction of 30%–50% in equal error rate (EER) compared to the baseline.
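The iterative "cluster, expand, and correct labels" idea in the abstract can be illustrated with a toy sketch. This is not the authors' method: there is no autoencoder or speaker-embedding network here, and cosine similarity to per-speaker centroids stands in for the clustering step. The function name `curriculum_pseudo_label`, the threshold schedule, and the two-dimensional "embeddings" are all hypothetical choices made for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def curriculum_pseudo_label(labeled, unlabeled, thresholds):
    """Toy pseudo-labeling loop (an assumption, not the paper's algorithm).

    labeled:    dict mapping speaker id -> list of embedding vectors
    unlabeled:  list of embedding vectors without labels
    thresholds: similarity thresholds, one per round, strict to lenient
    Returns a dict mapping unlabeled index -> assigned speaker id.
    """
    assigned = {}
    for thr in thresholds:
        # Rebuild centroids from labeled data plus the current pseudo-labels.
        pools = {spk: list(vecs) for spk, vecs in labeled.items()}
        for idx, spk in assigned.items():
            pools[spk].append(unlabeled[idx])
        centroids = {spk: centroid(vecs) for spk, vecs in pools.items()}

        # Relabel every unlabeled sample from scratch, so a label given in
        # an earlier round can be changed or dropped in a later one.
        new_assigned = {}
        for idx, vec in enumerate(unlabeled):
            best_spk, best_sim = max(
                ((spk, cosine(vec, c)) for spk, c in centroids.items()),
                key=lambda pair: pair[1],
            )
            if best_sim >= thr:
                new_assigned[idx] = best_spk
        assigned = new_assigned
    return assigned


if __name__ == "__main__":
    labeled = {"A": [[1.0, 0.0]], "B": [[0.0, 1.0]]}
    unlabeled = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.6], [0.6, 0.7]]
    print(curriculum_pseudo_label(labeled, unlabeled, [0.95, 0.7]))
```

In this toy run, the strict first round admits only the two samples that lie close to a labeled speaker; the more lenient second round, using centroids updated with those samples, admits the two ambiguous ones. In a real system the embeddings would come from a trained speaker model that is retrained between rounds, which this sketch omits.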

Topics

Machine Learning > Core Methods > Clustering
Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Learning Types > Semi-Supervised Learning
Speech & Audio > Recognition > Speaker Recognition

Keywords

semi-supervised learning, curriculum learning, speaker verification, denoising autoencoder, out-of-domain generalization
