INTERSPEECH 2020

STC-Innovation Speaker Recognition Systems for Far-Field Speaker Verification Challenge 2020

Abstract

This paper presents the speaker recognition (SR) systems submitted by the Speech Technology Center (STC) team to the Far-Field Speaker Verification Challenge 2020. The challenge comprises three SR tasks: far-field text-dependent speaker verification from a single microphone array (Track 1), far-field text-independent speaker verification from a single microphone array (Track 2), and far-field text-dependent speaker verification from distributed microphone arrays (Track 3). We describe the techniques and ideas underlying our best-performing models. Experiments with x-vector-based and ResNet-like architectures show that ResNet-based networks outperform x-vector-based systems. The submitted systems are fusions of ResNet34-based extractors trained on 80 log Mel-filter bank energies (MFBs) post-processed with a U-net-like voice activity detector (VAD). On the challenge evaluation sets, the best systems achieved 5.08% EER and 0.500 Cmindet on Track 1, 5.39% EER and 0.541 Cmindet on Track 2, and 5.53% EER and 0.458 Cmindet on Track 3.
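
For readers unfamiliar with the EER figures quoted above, the following is a minimal sketch of how an equal error rate can be estimated from verification scores. It is not the challenge's official scoring tool. The function name `compute_eer` and the toy scores are hypothetical and chosen only for illustration, and the sketch assumes that a higher score means the trial is more likely a target.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Estimate the equal error rate (EER) and the threshold where it occurs.

    Assumes higher scores indicate a target (same-speaker) trial.
    A trial is accepted when its score is >= the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(np.concatenate([target, nontarget]))

    # False acceptance rate: fraction of non-target trials accepted.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # False rejection rate: fraction of target trials rejected.
    frr = np.array([(target < t).mean() for t in thresholds])

    # The EER is where the two error rates cross; on finite data we take
    # the threshold with the smallest gap and average the two rates there.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    tgt = [0.9, 0.8, 0.7, 0.3]
    non = [0.6, 0.4, 0.2, 0.1]
    eer, thr = compute_eer(tgt, non)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

On real evaluation sets with many trials, the threshold sweep is usually done on sorted scores rather than with a Python loop, but the definition of the metric is the same.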
