INTERSPEECH 2019

Auto-Encoding Nearest Neighbor i-Vectors for Speaker Verification

Abstract

In recent years, i-vectors followed by cosine or PLDA scoring have been the state-of-the-art approach in speaker verification. PLDA requires labeled background data, and there is a significant performance gap between the two scoring techniques. In this work, we propose to reduce this gap by using an autoencoder to transform i-vectors into a new speaker vector representation, which we refer to as the ae-vector. Instead of reconstructing its own training i-vectors, as is usual, the autoencoder is trained to reconstruct neighbor i-vectors. These neighbors are selected in an unsupervised manner as the i-vectors with the highest cosine scores to each training i-vector. The evaluation is performed on the speaker verification trials of the VoxCeleb-1 database. The experiments show that the proposed ae-vectors achieve a relative EER improvement of 42% over conventional i-vectors with cosine scoring, closing 92% of the performance gap between cosine and PLDA scoring without using speaker labels.
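The unsupervised pairing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: i-vectors are plain lists of floats, the number of neighbors per i-vector (`k`) is a hypothetical parameter, and the helper names `cosine_score` and `nearest_neighbor_targets` are hypothetical. The autoencoder itself is not shown, because the abstract does not specify its architecture; the returned pairs would serve as (input, reconstruction target) examples for training it.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_score(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors, as used in cosine scoring."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine score is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def nearest_neighbor_targets(ivectors: List[Vector], k: int = 1) -> List[Tuple[int, int]]:
    """Pair each training i-vector with its k highest-cosine neighbors.

    Returns (input_index, target_index) pairs. The i-vector itself is
    excluded, so the autoencoder is asked to reconstruct a neighbor rather
    than its own input. No speaker labels are used anywhere.
    `k` is a hypothetical parameter for this sketch.
    """
    pairs: List[Tuple[int, int]] = []
    for i, query in enumerate(ivectors):
        # Score the query against every other training i-vector.
        scored = [
            (cosine_score(query, other), j)
            for j, other in enumerate(ivectors)
            if j != i
        ]
        # Highest score first; ties broken by index for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        for _, j in scored[:k]:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Toy 2-D "i-vectors": two loose clusters.
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(nearest_neighbor_targets(toy, k=1))
```

On the toy data, each vector is paired with the other member of its cluster, giving `[(0, 1), (1, 0), (2, 3), (3, 2)]`. A real system would use a vectorized or approximate nearest-neighbor search rather than this quadratic loop.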
