INTERSPEECH 2023

On the robustness of wav2vec 2.0 based speaker recognition systems

Abstract

Recent advances in unsupervised speech representation learning have opened up new approaches and set a new state of the art for diverse speech processing tasks. This paper extends the investigation of wav2vec 2.0 deep speech representations for the speaker recognition task. It focuses on robustness across domains and considers the effectiveness of wav2vec not only on telephone and microphone speaker verification protocols but also on the cross-channel task. It is concluded that powerful transformer-based speaker recognition systems can generalize well across variable conditions. In this study, speaker recognition systems were analyzed on a wide range of well-known verification protocols. Based on the results obtained in this paper, we recommend using data augmentation when fine-tuning wav2vec-based systems.
