Speaker Change Detection with Weighted-sum Knowledge Distillation based on Self-supervised Pre-trained Models

Hang Su; Yuxiang Kong; Lichun Fan; Peng Gao; Yujun Wang; Zhiyong Wu

INTERSPEECH 2024

Abstract

Speaker Change Detection (SCD) is an essential problem in speech processing with applications in many fields. Self-supervised models have shown impressive performance on many downstream tasks under the pre-training and fine-tuning paradigm. However, applying a fine-tuned self-supervised pre-trained model to the frame-level SCD task is of limited use in industrial settings, which typically require a smaller model that consumes fewer computational resources. To tackle this issue, we propose using Knowledge Distillation (KD) to leverage the capabilities of the self-supervised model. First, we propose a basic KD method based on the pre-trained model. Then, we propose a weighted-sum KD method that selectively extracts information from the pre-trained model. Experimental results demonstrate the effectiveness of the basic KD method and a further improvement from the weighted-sum KD method. The proposed method is more suitable for industrial applications than fine-tuning.
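
The abstract does not spell out how the weighted sum is formed, so the following is only an illustrative sketch of one common way to realize the idea, not the authors' implementation. It assumes that the teacher exposes frame-level features from several layers, that each layer gets one scalar score normalized by a softmax, that the student's features have the same dimension as the teacher's, and that the distillation loss is a mean squared error. All function and parameter names (`weighted_sum_target`, `layer_scores`, `distillation_loss`) are hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw layer scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum_target(teacher_layers, layer_scores):
    """Combine per-layer teacher features into one frame-level target.

    teacher_layers: list of L layers, each a list of T frames,
                    each frame a list of D floats.
    layer_scores:   L raw scores (assumed to be learnable in practice).
    """
    weights = softmax(layer_scores)
    num_frames = len(teacher_layers[0])
    dim = len(teacher_layers[0][0])
    return [
        [sum(w * layer[t][d] for w, layer in zip(weights, teacher_layers))
         for d in range(dim)]
        for t in range(num_frames)
    ]


def mse(student, target):
    """Mean squared error over all frames and feature dimensions."""
    sq = [(s - g) ** 2
          for s_frame, g_frame in zip(student, target)
          for s, g in zip(s_frame, g_frame)]
    return sum(sq) / len(sq)


def distillation_loss(student, teacher_layers, layer_scores):
    """Assumed KD objective: student features vs. weighted teacher sum."""
    return mse(student, weighted_sum_target(teacher_layers, layer_scores))
```

In this sketch, a basic KD setup would correspond to distilling from a single fixed teacher layer, while the weighted-sum variant lets the scores decide how much each layer contributes. In a real training loop the scores would be updated by gradient descent together with the student, and the distillation term would be combined with the frame-level SCD classification loss; both of those details are assumptions here.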

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Application Areas > Knowledge Distillation

Keywords

model compression, knowledge distillation, speaker change detection, self-supervised pre-trained model, weighted-sum distillation
