2021 INTERSPEECH INTERSPEECH 2021

An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model

Abstract

In this paper, we propose an end-to-end (E2E) dialect identification system trained using transfer learning from a multilingual automatic speech recognition (ASR) model. This is also an extension of our submitted system to the Oriental Language Recognition Challenge 2020 (AP20-OLR). We verified its applicability using the dialect identification (DID) task of the AP20-OLR. First, we trained a robust conformer-based joint connectionist temporal classification (CTC) /attention multilingual E2E ASR model using the training corpora of eight languages, independent of the target dialects. Second, we initialized the E2E-based classifier with the ASR model’s shared encoder using a transfer learning approach. Finally, we trained the classifier on the target dialect corpus. We obtained the final classifier by selecting the best model from the following: (1) the averaged model in term of the loss values; and (2) the averaged model in term of classification accuracy. Our experiments on the DID test-set of the AP20-OLR demonstrated that significant identification improvements were achieved for three Chinese dialects. The performances of our system outperforms the winning team of the AP20-OLR, with the largest relative reductions of 19.5% in Cavg and 25.2% in EER.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Speech & Audio
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio