The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge

Danwei Cai; Xiaoyi Qin; Weicheng Cai; Ming Li

INTERSPEECH 2019

Abstract

In this paper, we present the DKU system for the speaker recognition task of the 2019 VOiCES from a Distance Challenge. We investigate the whole system pipeline for far-field speaker verification, including data pre-processing, short-term spectral feature representation, utterance-level speaker modeling, backend scoring, and score normalization. Our best single system employs a residual neural network trained with angular softmax loss, and the weighted prediction error (WPE) dereverberation algorithm further improves its performance. Using simple cosine similarity scoring, this system achieves 0.3668 minDCF and 5.58% EER on the evaluation set. Finally, the submitted primary system obtains 0.3532 minDCF and 4.96% EER on the evaluation set.
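The results above are reported with cosine similarity scoring and two metrics, EER and minDCF. The following is a minimal sketch of how these quantities are commonly computed from embeddings and trial scores; it is not the authors' scoring code. It assumes NumPy, speaker embeddings given as plain vectors, and detection cost parameters P_target = 0.01 and C_miss = C_fa = 1 (an assumption; check the challenge evaluation plan for the official values). The function names are hypothetical, and tied scores are not treated specially.

```python
import numpy as np


def cosine_score(enroll, test):
    """Cosine similarity between an enrollment and a test embedding."""
    enroll = np.asarray(enroll, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))


def error_rates(scores, labels):
    """Miss and false-alarm rates for every threshold position.

    Trials are sorted by score; entry i corresponds to rejecting the i
    lowest-scoring trials and accepting the rest. labels: 1 = target, 0 = nontarget.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    labels = labels[np.argsort(scores, kind="mergesort")]
    n_tar = labels.sum()
    n_non = len(labels) - n_tar
    if n_tar == 0 or n_non == 0:
        raise ValueError("need at least one target and one nontarget trial")
    # Targets among the rejected trials -> misses.
    p_miss = np.concatenate([[0], np.cumsum(labels)]) / n_tar
    # Nontargets among the accepted trials -> false alarms.
    p_fa = 1.0 - np.concatenate([[0], np.cumsum(1 - labels)]) / n_non
    return p_miss, p_fa


def eer(scores, labels):
    """Equal error rate, approximated at the threshold where P_miss and P_fa are closest."""
    p_miss, p_fa = error_rates(scores, labels)
    idx = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[idx] + p_fa[idx]) / 2.0


def min_dcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost over all thresholds (parameters are assumed defaults)."""
    p_miss, p_fa = error_rates(scores, labels)
    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    # Normalize by the cost of the better trivial system (accept all or reject all).
    return dcf.min() / min(c_miss * p_target, c_fa * (1.0 - p_target))
```

In practice, each trial score would come from `cosine_score` applied to the enrollment and test embeddings, and `eer` and `min_dcf` are then evaluated over the full trial list. Sorting once and sweeping every threshold keeps both metrics at O(N log N) in the number of trials.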

Keywords

speaker verification, cosine similarity, equal error rate, residual neural network, angular softmax, minDCF
