Improving Under-Resourced Code-Switched Speech Recognition: Large Pre-trained Models or Architectural Interventions

Joshua Jansen van Vüren; Thomas Niesler

INTERSPEECH 2023

Abstract

We present three approaches to improve language modelling of under-resourced code-switched speech. First, we challenge the practice of fine-tuning large pre-trained language models on small datasets. Second, we investigate the advantages of sub-word encodings for our multilingual code-switched speech. Third, we propose an architectural innovation to the RNN language model that is specifically designed for code-switched text. We show a clear absolute reduction in word error rate of 0.17% for the adapted LSTM language model compared to M-BERT when employed in n-best rescoring experiments. Further, the LSTM models afford a seven-fold reduction in the total number of parameters and reduce runtime during rescoring 100-fold. Contrary to recent research trends, our LSTM models with sub-word vocabularies do not outperform the word-level models. Finally, the new architectural mechanism applied to the LSTM improves language prediction for a span of several words following a code-switch.
