VAE-Based Regularization for Deep Speaker Embedding

Yang Zhang; Lantian Li; Dong Wang

INTERSPEECH 2019

Abstract

Deep speaker embedding has achieved state-of-the-art performance in speaker recognition. A potential problem is that these embedded vectors (called ‘x-vectors’) are not Gaussian, which degrades performance with the widely used PLDA back-end scoring. In this paper, we propose a regularization approach based on the Variational Auto-Encoder (VAE). The model transforms x-vectors into a latent space where the mapped latent codes are more Gaussian and hence more suitable for PLDA scoring.
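As an illustration of why a VAE pulls latent codes toward a Gaussian, the sketch below computes the KL term of the standard VAE objective, which penalizes the encoder's per-input posterior N(mu, diag(sigma^2)) for departing from the prior N(0, I). This is the textbook formulation, not necessarily the exact loss used in the paper; the function names and the two-dimensional toy values are assumptions made for the example.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a single latent code."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


# Toy encoder outputs for one x-vector (hypothetical values, 2-dim latent).
rng = random.Random(0)
mu = [0.5, -1.0]
log_var = [0.0, math.log(0.25)]

z = sample_latent(mu, log_var, rng)      # latent code passed on to the decoder
kl = kl_to_standard_normal(mu, log_var)  # regularization term added to the loss
print("latent code:", z)
print("KL penalty:", kl)
```

The penalty is zero only when the posterior already equals N(0, I), so minimizing it alongside the reconstruction loss pushes the distribution of latent codes toward a standard Gaussian.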

Topics

Machine Learning > Optimization & Theory > Bayesian Inference
Speech & Audio > Analysis > Speaker Verification

Keywords

speaker embedding, Gaussian distribution, latent space, variational auto-encoder

Related papers

Using Real-Time Visual Biofeedback for Second Language Instruction 2019

End-to-End SpeakerBeam for Single Channel Target Speech Recognition 2019

Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition 2019

Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile 2019

A Hierarchical Attention Network-Based Approach for Depression Detection from Transcribed Clinical Interviews 2019