INTERSPEECH 2023

Conformer-based Language Embedding with Self-Knowledge Distillation for Spoken Language Identification

Abstract

Conformer-based architectures have recently been shown to improve spoken language identification (LID), owing to the Conformer's strong representational capacity. However, performance often drops significantly on short speech segments. In this paper, we propose a novel method to alleviate this issue by introducing a self-knowledge distillation technique into a Conformer-based LID architecture. During training, for each sample, we distill between the predictive distribution of the original input and that of the same input processed by a double-ended random masking module. Experimental results demonstrate the effectiveness of the proposed method on two datasets: OLR21, sampled at 16,000 Hz, and LRE22, sampled at 8,000 Hz. The proposed method also improves language identification on short-duration speech segments.
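
The training objective described above can be illustrated with a minimal sketch. The abstract does not specify the masking rule or the exact loss, so everything here is an assumption made for illustration: the masking module is taken to drop a random number of frames from both the start and the end of the input, the distillation term is taken to be a KL divergence between the two predictive distributions, and the names `double_ended_random_mask`, `self_distillation_loss`, `max_ratio`, `alpha`, and `temperature` are hypothetical. The logits stand in for the outputs of a Conformer-based LID model, which is not shown.

```python
import math
import random


def double_ended_random_mask(frames, max_ratio=0.3, rng=random):
    """Drop a random number of frames from each end of the sequence.

    Assumed interpretation of "double-ended random masking": at most
    `max_ratio` of the frames is removed from each end independently.
    """
    n = len(frames)
    max_cut = int(n * max_ratio)
    head = rng.randint(0, max_cut)  # frames removed from the start
    tail = rng.randint(0, max_cut)  # frames removed from the end
    return frames[head:n - tail]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true language label."""
    return -math.log(probs[label] + eps)


def self_distillation_loss(logits_full, logits_masked, label,
                           alpha=1.0, temperature=1.0):
    """Classification loss on both views plus a distillation term.

    Assumed form: the prediction on the full input acts as the teacher
    for the prediction on the masked (shortened) input.
    """
    ce = (cross_entropy(softmax(logits_full), label)
          + cross_entropy(softmax(logits_masked), label))
    kd = kl_divergence(softmax(logits_full, temperature),
                       softmax(logits_masked, temperature))
    return ce + alpha * kd


# Usage: one training sample with 200 feature frames and 3 candidate languages.
frames = list(range(200))
short_view = double_ended_random_mask(frames, rng=random.Random(0))
# In practice both logit vectors would come from the same model applied to
# `frames` and `short_view`; fixed values are used here for illustration.
loss = self_distillation_loss([2.0, 0.5, -1.0], [1.2, 0.9, -0.5], label=0)
```

Because the masked view is a shortened version of the same utterance, pulling its prediction toward that of the full utterance is one plausible way such a scheme could help on short segments. The weighting `alpha` and the direction of the KL term are design choices of this sketch, not details taken from the paper.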
