Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019

Jonathan Huang; Tobias Bocklet

INTERSPEECH 2019

Abstract

This paper describes Intel's speaker recognition systems for the VOiCES from a Distance Challenge 2019. Our submission consists of a ResNet50 system and four x-vector systems trained with different data augmentation and input features. Our novel contributions include the use of the additive margin softmax loss function and the use of invariant representation learning for some of our systems. To our knowledge, this has not been proposed for speaker recognition. We found that these complementary subsystems greatly improved performance on the development set when combined by late score-level fusion based on linear logistic regression. After fusion, our system achieved an EER, minDCF, and actDCF of 2.2%, 0.27, and 0.27, respectively, on the development set, and 6.08%, 0.451, and 0.458 on the evaluation set. We discuss our results and give some insight into accuracy with respect to recording distance.
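The additive margin softmax loss mentioned above can be illustrated for a single training example: the embedding and each class weight vector are L2-normalized, a fixed margin m is subtracted from the cosine similarity of the true class only, and all logits are scaled by s before the usual cross-entropy. This is a minimal pure-Python sketch, not the authors' implementation; the values s = 30 and m = 0.2 and the function names are illustrative assumptions, not settings reported in the paper.

```python
import math


def _l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def am_softmax_loss(embedding, class_weights, label, s=30.0, m=0.2):
    """Additive margin softmax loss for one example (hypothetical helper).

    embedding:     speaker embedding (list of floats)
    class_weights: one weight vector per training speaker
    label:         index of the true speaker
    s, m:          scale and additive margin (assumed values, not from the paper)
    """
    e = _l2_normalize(embedding)
    # Cosine similarity between the embedding and each normalized class weight.
    cosines = [sum(a * b for a, b in zip(e, _l2_normalize(w))) for w in class_weights]
    # Subtract the margin from the target-class cosine only, then scale every logit.
    logits = [s * (c - m) if j == label else s * c for j, c in enumerate(cosines)]
    # Cross-entropy computed with a numerically stable log-sum-exp.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    x = [0.9, 0.1]
    print(am_softmax_loss(x, W, label=0))         # with the additive margin
    print(am_softmax_loss(x, W, label=0, m=0.0))  # plain normalized softmax
```

Setting m = 0 recovers a plain normalized (cosine) softmax. A positive margin always raises the loss of the target class, so training is pushed to make the target cosine exceed the competing cosines by roughly m, which encourages a wider angular gap between speakers.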

Keywords

representation learning, speaker recognition, late fusion, additive margin softmax
