Auto-Encoding Nearest Neighbor i-Vectors for Speaker Verification

Umair Khan; Miquel India; Javier Hernando

INTERSPEECH 2019

Abstract

In recent years, i-vectors followed by cosine or PLDA scoring have been the state-of-the-art approach in speaker verification. PLDA requires labeled background data, and there is a significant performance gap between the two scoring techniques. In this work, we propose to reduce this gap by using an autoencoder to transform i-vectors into a new speaker vector representation, referred to as the ae-vector. Instead of reconstructing its own input, as is usual, the autoencoder is trained to reconstruct neighbor i-vectors. These neighbor i-vectors are selected in an unsupervised manner as the i-vectors with the highest cosine scores to each training i-vector. The evaluation is performed on the speaker verification trials of the VoxCeleb-1 database. The experiments show that the proposed ae-vectors achieve a relative improvement of 42% in terms of EER over conventional i-vectors with cosine scoring, which closes 92% of the performance gap between cosine and PLDA scoring without using speaker labels.
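The neighbor selection step described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function name `select_neighbor_pairs`, the parameter `k`, and the toy data are assumptions, and the abstract does not state how many neighbors are used or how the autoencoder is configured.

```python
import numpy as np


def select_neighbor_pairs(ivectors, k=1):
    """Build (input, target) training pairs for a neighbor-reconstructing
    autoencoder. Each target is one of the k i-vectors with the highest
    cosine score to the input i-vector. No speaker labels are used.

    `k` is a hypothetical parameter; the abstract does not give its value.
    """
    x = np.asarray(ivectors, dtype=float)
    # Length-normalize so that a dot product equals the cosine score.
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    scores = unit @ unit.T
    # An i-vector must not be selected as its own neighbor.
    np.fill_diagonal(scores, -np.inf)
    # Indices of the k highest-scoring neighbors per row, best first.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    # Repeat each input k times and pair it with each of its neighbors.
    inputs = np.repeat(x, k, axis=0)
    targets = x[top_k.reshape(-1)]
    return inputs, targets, top_k


# Toy 2-dimensional "i-vectors" (real i-vectors have hundreds of dimensions).
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
inputs, targets, neighbor_idx = select_neighbor_pairs(toy, k=1)
```

An autoencoder would then be trained to map each row of `inputs` to the corresponding row of `targets`, rather than back to itself.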

Topics

Artificial Intelligence > Core AI > Foundation Models
Machine Learning > Core Methods > Embedding Learning
Deep Learning > Architectures > Autoencoders
Speech & Audio > Recognition > Speaker Recognition

Keywords

representation learning; speaker verification; cosine scoring; neighbor reconstruction

Related papers

Using Real-Time Visual Biofeedback for Second Language Instruction 2019

VAE-Based Regularization for Deep Speaker Embedding 2019

End-to-End SpeakerBeam for Single Channel Target Speech Recognition 2019

Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition 2019

Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile 2019