The HCCL System for the NIST SRE21

Zhuo Li; Runqiu Xiao; Hangting Chen; Zhenduo Zhao; Zihan Zhang; Wenchao Wang

INTERSPEECH 2022

Abstract

This paper describes the systems developed by the HCCL team for the NIST 2021 Speaker Recognition Evaluation (NIST SRE21). We first explore various state-of-the-art speaker embedding extractors combined with a novel circle loss to obtain discriminative deep speaker embeddings. Because cross-channel and cross-lingual speaker recognition are the key challenges of SRE21, we introduce several techniques to reduce the cross-domain mismatch. Specifically, codecs and speech enhancement are applied directly to the raw speech to eliminate the codec and environmental-noise mismatch. We refer collectively to these methods, which operate directly on raw audio to eliminate relatively explicit mismatch, as data adaptation methods. Experiments show that the data adaptation methods achieve a 15% improvement over our baseline. Furthermore, several popular back-end domain adaptation algorithms are applied to the speaker embeddings to alleviate the performance degradation caused by implicit mismatch. Score calibration was a major failure for us in SRE21: calibration with excessive parameters easily leads to overfitting.

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Application Areas > Domain Adaptation
Speech & Audio > Recognition > Speaker Recognition
Machine Learning > Learning Types > Domain Adaptation
Deep Learning > Learning Types > Representation Learning

Keywords

domain adaptation; speech enhancement; speaker embedding; speaker recognition; score calibration; data adaptation; circle loss
