Far-Field End-to-End Text-Dependent Speaker Verification Based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation

Xiaoyi Qin; Danwei Cai; Ming Li

2019 INTERSPEECH INTERSPEECH 2019

Far-Field End-to-End Text-Dependent Speaker Verification Based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation

Abstract

In this paper, we focus on the far-field end-to-end text-dependent speaker verification task with a small-scale far-field text dependent dataset and a large scale close-talking text independent database for training. First, we show that simulating far-field text independent data from the existing large-scale clean database for data augmentation can reduce the mismatch. Second, using a small far-field text dependent data set to finetune the deep speaker embedding model pre-trained from the simulated far-field as well as original clean text independent data can significantly improve the system performance. Third, in special applications when using the close-talking clean utterances for enrollment and employing the real far-field noisy utterances for testing, adding reverberant noises on the clean enrollment data can further enhance the system performance. We evaluate our methods on AISHELL ASR0009 and AISHELL 2019B-eval databases and achieve an equal error rate (EER) of 5.75% for far-field text-dependent speaker verification under noisy environments.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

🧭 Keyword Pioneer — far-field speaker verification

Authors

Xiaoyi Qin , Danwei Cai , Ming Li

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Application Areas > Data Augmentation Machine Learning > Application Areas > Domain Adaptation Speech & Audio > Recognition > Speaker Recognition Machine Learning > Learning Types > Transfer Learning

Keywords

transfer learning data augmentation speaker verification end-to-end learning domain mismatch deep speaker embedding far-field speaker verification text-dependent speaker verification enrollment data augmentation

Download PDF

Related papers

Using Real-Time Visual Biofeedback for Second Language Instruction 2019

VAE-Based Regularization for Deep Speaker Embedding 2019

End-to-End SpeakerBeam for Single Channel Target Speech Recognition 2019

Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition 2019

Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile 2019