INTERSPEECH 2019

Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019

Abstract

This paper describes Intel's speaker recognition systems for the VOiCES from a Distance Challenge 2019. Our submission consists of one ResNet50 system and four x-vector systems, each trained with different data augmentation and input features. Our novel contributions include the use of the additive margin softmax loss function and, for some of our systems, the use of invariant representation learning. To our knowledge, this has not previously been proposed for speaker recognition. We found that late fusion of these complementary subsystems at the score level, using linear logistic regression, greatly improved performance on the development set. After fusion, our system achieved an EER, minDCF and actDCF of 2.2%, 0.27 and 0.27, respectively, on the development set, and 6.08%, 0.451 and 0.458 on the evaluation set. We discuss our results and give some insight into how accuracy varies with recording distance.
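The abstract names two techniques without spelling them out: the additive margin softmax (AM-softmax) training loss and linear logistic regression score fusion. Below is a minimal NumPy sketch of both under stated assumptions. For AM-softmax, embeddings and class weights are L2-normalised so logits become cosines, and a margin `m` is subtracted from the target-class cosine before scaling by `s`. The defaults `s=30.0` and `m=0.35` are common illustrative values and are not taken from this paper. The fusion function is a plain (not prior-weighted) logistic regression trained by gradient descent, a simplification of what calibration toolkits typically do. The function names `am_softmax_loss` and `train_lr_fusion` are hypothetical.

```python
import numpy as np


def am_softmax_loss(embeddings, weights, labels, s=30.0, m=0.35):
    """Additive margin softmax (AM-softmax) loss, forward pass only.

    embeddings: (N, D) speaker embeddings
    weights:    (D, C) class weight matrix, one column per training speaker
    labels:     (N,) integer speaker labels
    s, m:       scale and additive margin (illustrative defaults, assumed)
    """
    # L2-normalise embeddings and class weights so that logits are cosines
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = x @ w  # (N, C) cosine similarities
    # Scale all cosines by s, but subtract the margin m from the target class only
    logits = s * cos
    rows = np.arange(len(labels))
    logits[rows, labels] = s * (cos[rows, labels] - m)
    # Numerically stable log-softmax followed by cross-entropy
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()


def train_lr_fusion(scores, is_target, lr=0.1, epochs=2000):
    """Linear logistic regression score fusion (simplified, unweighted).

    scores:    (T, K) trial scores from K subsystems
    is_target: (T,) 1 for target trials, 0 for non-target trials
    Returns weights (K,) and bias b; the fused score is scores @ w + b.
    """
    T, K = scores.shape
    w = np.zeros(K)
    b = 0.0
    y = is_target.astype(float)
    for _ in range(epochs):
        z = scores @ w + b
        p = 1.0 / (1.0 + np.exp(-z))  # predicted probability of a target trial
        err = p - y                   # gradient of cross-entropy w.r.t. z, per trial
        w -= lr * scores.T @ err / T
        b -= lr * err.mean()
    return w, b
```

With `m=0` the loss reduces to ordinary softmax cross-entropy on scaled cosines. A positive margin forces the target cosine to exceed the others by at least `m` before the loss becomes small, which encourages more compact speaker clusters. The learned fusion weights indicate how much each subsystem contributes to the final score.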
