INTERSPEECH 2024

Variable Segment Length and Domain-Adapted Feature Optimization for Speaker Diarization

Abstract

In speaker diarization, choosing a suitable segment length remains a challenge. Long segments may contain multiple speakers, which leads to unreliable embeddings, while short segments may carry too little speaker information. To address this, we propose a variable-segment-length approach built on a mixed segment recognition (MSR) network. The MSR module distinguishes segments containing multiple speakers from those containing a single speaker. Segments identified as mixed are re-cut until they are pure or reach the minimum length. In addition, we propose a domain-adapted feature optimization scheme that fine-tunes the pre-trained speaker embedding extractor, using both a specific data augmentation and a distance loss function to improve the embeddings of the remaining segments that still contain speaker alternation and overlap. Experimental results demonstrate the effectiveness of our method: it achieves a relative improvement of 25.5% in diarization error rate over the baseline and surpasses recent state-of-the-art methods.
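
The re-cutting step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a mixed segment is re-cut, so splitting at the midpoint is an assumption made here for illustration only. The names `recut`, `is_mixed`, and `min_len` are hypothetical; `is_mixed` stands in for the MSR network's decision and is not the authors' model.

```python
from typing import Callable, List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def recut(
    segment: Segment,
    is_mixed: Callable[[Segment], bool],
    min_len: float,
) -> List[Segment]:
    """Recursively split a segment until it is judged pure or cannot shrink further.

    Assumption: a mixed segment is cut at its midpoint. The actual
    re-cutting rule is not specified in the abstract.
    """
    start, end = segment
    length = end - start
    # Stop if the (hypothetical) MSR decision says "single speaker",
    # or if halving would produce pieces shorter than the minimum length.
    if not is_mixed(segment) or length / 2 < min_len:
        return [segment]
    mid = start + length / 2
    return recut((start, mid), is_mixed, min_len) + recut((mid, end), is_mixed, min_len)


def make_toy_msr(change_point: float) -> Callable[[Segment], bool]:
    """Toy stand-in for the MSR network: a segment is mixed if it
    strictly contains a known speaker change point."""
    return lambda seg: seg[0] < change_point < seg[1]


if __name__ == "__main__":
    # One speaker change at 3.0 s inside a 4 s window, minimum length 0.5 s.
    print(recut((0.0, 4.0), make_toy_msr(3.0), min_len=0.5))
```

In this toy setting, segments that are still mixed once the minimum length is reached are returned unchanged. These correspond to the remaining segments with speaker alternation and overlap that the domain-adapted feature optimization is meant to handle.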
