INTERSPEECH 2024

What happens in continued pre-training? Analysis of self-supervised speech models with continued pre-training for colloquial Finnish ASR

Abstract

The advancement of self-supervised learning has enabled the rapid development of highly accurate speech recognition models, such as wav2vec 2.0, for many languages. While high-resourced languages like English benefit from purely monolingual models, other, less-resourced ones must build upon multilingual foundations. In this work, we investigate various strategies to specialize models for the colloquial Finnish language and demonstrate that continued pre-training of available multilingual models is the best solution. Furthermore, we investigate the success of the pre-training procedure by examining the learned quantized representations and show how the continued pre-training improved the discovered latent codeword groups.
