The DKU-Duke-Lenovo System Description for the Fearless Steps Challenge Phase III

Weiqing Wang, Danwei Cai, Jin Wang, Qingjian Lin, Xuyang Wang, Mi Hong, Ming Li

INTERSPEECH 2021

Abstract

This paper describes the systems developed by the DKU-Duke-Lenovo team for the Fearless Steps Challenge Phase III. For the speech activity detection (SAD) task, we employ a U-Net-based model, an architecture that had not previously been used for SAD, and achieve a DCF of 1.915% on the eval set. For the speaker identification (SID) task, we adopt ResNet-SE and ECAPA-TDNN models and obtain a Top-5 accuracy of 86.21%. For the speaker diarization (SD) task, we employ several different clustering methods; in addition, domain adaptation, system fusion, and Target-Speaker Voice Activity Detection (TS-VAD) significantly improve SD performance. We obtain a DER of 12.32% on track 2, with the largest contribution coming from our ResNet-based TS-VAD model. Our systems ranked first in the SD and SID tasks and second in the SAD task of the challenge.
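
For readers unfamiliar with the three metrics quoted in the abstract (DCF, Top-5 accuracy, DER), the following is a minimal Python sketch of how they are commonly computed. It is not the challenge's official scoring tool. The 0.75/0.25 miss/false-alarm weights in the DCF are an assumption based on the Fearless Steps SAD evaluation plan, and the function names are hypothetical. DER is shown from pre-computed error durations; real scoring also handles segment alignment and forgiveness collars, which are omitted here.

```python
from typing import Sequence


def sad_dcf(p_miss: float, p_fa: float, w_miss: float = 0.75, w_fa: float = 0.25) -> float:
    """Detection cost for SAD as a weighted sum of miss and false-alarm rates.

    The default weights are an assumption taken from the Fearless Steps SAD plan.
    """
    return w_miss * p_miss + w_fa * p_fa


def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_speech: float) -> float:
    """DER = (missed speech + false alarm + speaker confusion) / total reference speech.

    All arguments are durations in seconds; collars and overlap rules are ignored.
    """
    return (missed + false_alarm + confusion) / total_speech


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5) -> float:
    """Fraction of test utterances whose true speaker is among the k highest-scoring speakers.

    scores[i][j] is the score of test utterance i against enrolled speaker j;
    labels[i] is the index of the true speaker of utterance i.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Rank speaker indices by descending score and check the top k.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```

For example, a system that misses 2% of speech and falsely accepts 1% of non-speech would score sad_dcf(0.02, 0.01), i.e. about 0.0175 (1.75%), under these assumed weights.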

Keywords

domain adaptation, speaker recognition, speaker diarization, residual network, speaker identification, speech activity detection, neural network, target-speaker voice activity detection