Speaker Diarization System Based on DPCA Algorithm for Fearless Steps Challenge Phase-2

Xueshuai Zhang; Wenchao Wang; Pengyuan Zhang

INTERSPEECH 2020

Abstract

This paper describes the ASRGroup team's speaker diarization system submitted to TRACK 2 of the Fearless Steps Challenge Phase-2. In this system, the similarity matrix over all segments of an audio recording is computed by sequential Bidirectional Long Short-Term Memory networks (Bi-LSTM), and a clustering scheme based on the Density Peak Cluster Algorithm (DPCA) is proposed to cluster the segments. The system is compared with the Kaldi Toolkit diarization system (TDNN-based x-vectors with a PLDA scoring model) and the Spectral system (Bi-LSTM similarity with the spectral clustering algorithm). Experiments show that our system significantly outperforms both, achieving a Diarization Error Rate (DER) of 42.75% on the Dev dataset and 39.52% on the Eval dataset of TRACK 2. Compared with the baseline Kaldi Toolkit diarization system and the spectral clustering algorithm with Bi-LSTM similarity models, the DER of our system is reduced by an absolute 4.64%, 1.84% and 8.85%, 7.57%, respectively, on the two datasets.
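The clustering step can be illustrated with a minimal Python sketch of generic density peak clustering applied to a segment similarity matrix. This is not the authors' implementation: the abstract does not give the density kernel, the cutoff, or the rule for choosing cluster centers, so the Gaussian kernel, the `cutoff` value, the `1 - similarity` distance, the fixed `n_clusters`, and the function name `density_peak_cluster` are all assumptions made for illustration.

```python
import numpy as np


def density_peak_cluster(similarity, n_clusters, cutoff=0.5):
    """Hypothetical helper: density peak clustering on a similarity matrix."""
    sim = np.asarray(similarity, dtype=float)
    n = sim.shape[0]

    # Assumption: turn similarities in [0, 1] into distances.
    dist = 1.0 - sim
    np.fill_diagonal(dist, 0.0)

    # Local density rho: Gaussian-kernel neighbour count, excluding the point itself.
    rho = np.exp(-(dist / cutoff) ** 2).sum(axis=1) - 1.0

    # Visit points from highest to lowest density.
    order = np.argsort(-rho, kind="stable")

    # delta: distance to the nearest point that ranks higher in density.
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    # Centers combine high density with a large distance to any denser point.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]

    # Remaining points inherit the label of their nearest denser neighbour.
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


# Toy similarity matrix: segments 0-2 resemble each other, as do segments 3-5.
sim = np.full((6, 6), 0.1)
sim[:3, :3] = 0.9
sim[3:, 3:] = 0.9
np.fill_diagonal(sim, 1.0)
labels = density_peak_cluster(sim, n_clusters=2)
```

In a real diarization pipeline the similarity matrix would come from the scoring model rather than being constructed by hand, and the number of speakers would typically have to be estimated rather than supplied.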

Topics

Machine Learning > Core Methods > Clustering

Machine Learning > Core Methods > Representation Learning

Keywords

sequence modeling, speaker diarization, bidirectional long short-term memory, audio segmentation, density peak clustering
