Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition

Shai Rozenberg; Hagai Aronowitz; Ron Hoory

INTERSPEECH 2020

Abstract

With the rise of voice-activated applications, the need for speaker recognition is rapidly increasing. The x-vector, an embedding approach based on a deep neural network (DNN), is considered state of the art when proper end-to-end training is not feasible. However, its accuracy drops significantly when recording conditions (noise, sample rate, etc.) are mismatched, either between the x-vector training data and the target data or between enrollment and test data. We introduce Siamese x-vector Reconstruction (SVR) for domain adaptation: a lean auxiliary Siamese DNN reconstructs the embedding of a higher-quality signal from its lower-quality counterpart. We evaluate our method on several mismatch scenarios and demonstrate significant improvement over the baseline.
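The reconstruction idea in the abstract can be illustrated with a toy sketch. It is not the authors' SVR model: the Siamese DNN is replaced here by a plain affine map trained with gradient descent on a mean squared error, and the embeddings are synthetic 4-dimensional vectors rather than real x-vectors. The mismatch model in `degrade` (attenuation, a fixed offset, additive noise) and all names and hyperparameters are assumptions made for illustration only.

```python
import random

DIM = 4  # toy embedding size (assumption); real x-vectors are much larger


def degrade(x_high, rng):
    # Hypothetical mismatch model: attenuation, a fixed channel offset, and noise.
    return [0.6 * v + 0.5 + rng.gauss(0.0, 0.1) for v in x_high]


def identity():
    return [[1.0 if i == j else 0.0 for j in range(DIM)] for i in range(DIM)]


def reconstruct(W, b, x_low):
    # Affine map from a low-quality embedding to an estimate of the high-quality one.
    return [sum(W[i][j] * x_low[j] for j in range(DIM)) + b[i] for i in range(DIM)]


def mse(W, b, pairs):
    # Mean over pairs of the squared reconstruction error.
    total = 0.0
    for x_low, x_high in pairs:
        y = reconstruct(W, b, x_low)
        total += sum((y[i] - x_high[i]) ** 2 for i in range(DIM))
    return total / len(pairs)


def train(pairs, steps=300, lr=0.05):
    # Start from the identity map, so step 0 corresponds to "no adaptation".
    W, b = identity(), [0.0] * DIM
    n = len(pairs)
    for _ in range(steps):
        gW = [[0.0] * DIM for _ in range(DIM)]
        gb = [0.0] * DIM
        for x_low, x_high in pairs:
            y = reconstruct(W, b, x_low)
            for i in range(DIM):
                err = 2.0 * (y[i] - x_high[i]) / n  # gradient of the MSE w.r.t. y[i]
                gb[i] += err
                for j in range(DIM):
                    gW[i][j] += err * x_low[j]
        for i in range(DIM):
            b[i] -= lr * gb[i]
            for j in range(DIM):
                W[i][j] -= lr * gW[i][j]
    return W, b


# Build paired (low-quality, high-quality) embeddings of the same synthetic "utterance".
rng = random.Random(0)
pairs = []
for _ in range(100):
    x_high = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    pairs.append((degrade(x_high, rng), x_high))

loss_before = mse(identity(), [0.0] * DIM, pairs)  # baseline: use the degraded embedding as-is
W, b = train(pairs)
loss_after = mse(W, b, pairs)  # after learning the reconstruction map
print(f"MSE before: {loss_before:.3f}, after: {loss_after:.3f}")
```

In a real system the reconstructed embedding would feed the usual scoring back end in place of the degraded one. The sketch only shows that a small auxiliary model trained on paired embeddings can pull the low-quality representation toward its high-quality counterpart.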

Topics

Machine Learning > Core Methods > Embedding Learning
Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Architectures > Neural Networks
Computer Vision > Analysis > Biometrics
Speech & Audio > Recognition > Speaker Recognition
Machine Learning > Learning Types > Domain Adaptation

Keywords

domain adaptation, speaker recognition, generative adversarial network, siamese network, neural network, embedding reconstruction

Related papers

Memory Controlled Sequential Self Attention for Sound Recognition 2020

Dual Attention in Time and Frequency Domain for Voice Activity Detection 2020

Automatic Prediction of Speech Intelligibility Based on X-Vectors in the Context of Head and Neck Cancer 2020

A Noise Robust Technique for Detecting Vowels in Speech Signals 2020

Joint Detection of Sentence Stress and Phrase Boundary for Prosody 2020