INTERSPEECH 2024
What happens in continued pre-training? Analysis of self-supervised speech models with continued pre-training for colloquial Finnish ASR
Abstract
The advancement of self-supervised learning has enabled the rapid development of highly accurate speech recognition models, such as wav2vec 2.0, for many languages. While high-resourced languages like English benefit from purely monolingual models, other, less-resourced ones must build upon multilingual foundations. In this work, we investigate various strategies to specialize models for the colloquial Finnish language and demonstrate that continued pre-training of available multilingual models is the best solution. Furthermore, we investigate the success of the pre-training procedure by examining the learned quantized representations and show how the continued pre-training improved the discovered latent codeword groups.