INTERSPEECH 2020

Speaker Diarization System Based on DPCA Algorithm for Fearless Steps Challenge Phase-2

Abstract

This paper describes the ASRGroup team's speaker diarization system submitted to TRACK 2 of the Fearless Steps Challenge Phase-2. In this system, the similarity matrix over all segments of an audio recording is computed by sequential Bidirectional Long Short-term Memory networks (Bi-LSTM), and a clustering scheme based on the Density Peak Cluster Algorithm (DPCA) is proposed to cluster the segments. The system is compared with the Kaldi Toolkit diarization system (TDNN-based x-vectors with a PLDA scoring model) and the Spectral system (Bi-LSTM similarity with the Spectral clustering algorithm). Experiments show that our system significantly outperforms both systems, achieving a Diarization Error Rate (DER) of 42.75% on the Dev dataset and 39.52% on the Eval dataset of TRACK 2 (Fearless Steps Challenge Phase-2). Compared with the baseline Kaldi Toolkit diarization system and the Spectral clustering algorithm with Bi-LSTM similarity models, our system reduces the absolute DER by 4.64%, 1.84% and 8.85%, 7.57%, respectively, on the two datasets.
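
As a rough illustration of the clustering stage described in the abstract, the following sketch applies a generic density-peak clustering procedure to a precomputed segment similarity matrix. It is not the submitted system: the Bi-LSTM similarity model is not reproduced, and the function name `density_peak_cluster`, the conversion of similarity to distance as `1 - s`, the Gaussian density kernel, the `cutoff_quantile` parameter, and the assumption that the number of speakers is known in advance are all choices made here for illustration.

```python
import numpy as np

def density_peak_cluster(similarity, n_clusters, cutoff_quantile=0.2):
    """Generic density-peak clustering on a similarity matrix (illustrative sketch)."""
    sim = np.asarray(similarity, dtype=float)
    # Assumption: similarities lie in [0, 1], so 1 - s acts as a distance.
    dist = np.clip(1.0 - sim, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    n = dist.shape[0]

    # Cutoff distance d_c, taken as a quantile of the pairwise distances.
    off_diag = dist[~np.eye(n, dtype=bool)]
    d_c = max(np.quantile(off_diag, cutoff_quantile), 1e-12)

    # Local density rho_i with a Gaussian kernel (self-contribution removed).
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0

    # delta_i: distance to the nearest segment of higher density.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist.max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    # Cluster centers: largest rho * delta; the global density peak is always one.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]

    # Every other segment inherits the label of its nearest higher-density segment.
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels

# Toy similarity matrix: segments 0-2 resemble each other, as do segments 3-5.
similarity = np.full((6, 6), 0.1)
similarity[:3, :3] = 0.9
similarity[3:, 3:] = 0.9
np.fill_diagonal(similarity, 1.0)
labels = density_peak_cluster(similarity, n_clusters=2)
print(labels)
```

On the toy matrix, the first three segments receive one speaker label and the last three another. Unlike spectral clustering, this procedure needs no eigendecomposition; it relies only on local density and on the distance to denser neighbours.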
