INTERSPEECH 2024

Adapter pre-training for improved speech recognition in unseen domains using low resource adapter tuning of self-supervised models

Abstract

Adapter tuning is an approach to fine-tuning large neural network models on new tasks. Such methods can be used to efficiently fine-tune large self-supervised learning (SSL) models for speech recognition. In this work, we aim to improve low-resource adaptation of SSL features from a source domain to a target domain. To this end, we experiment with adapter pre-training for Wav2Vec2-based models over different source and target configurations. We experiment on 3 datasets comprising 14 languages, including very low-resource languages. Further, we show that the method is consistent across different adapter dimensions, and we analyse the feature transformation induced by adapter pre-training. With the proposed method, we obtain relative improvements of 10%-30% in WER and CER with Viterbi decoding in 13 languages. We also obtain consistent performance gains with LM decoding on many of these languages.
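The abstract reports results as relative WER/CER improvements and varies the adapter dimension. The sketch below illustrates those quantities; it is not the authors' code. It assumes a standard bottleneck adapter (down-projection to the adapter dimension, nonlinearity, up-projection, residual connection), since the abstract does not specify the adapter architecture, and it computes CER over raw characters including spaces. All function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def error_rate(refs: Sequence[str], hyps: Sequence[str], unit: str = "word") -> float:
    """Corpus-level WER (unit='word') or CER (unit='char', spaces included)."""
    split = str.split if unit == "word" else list
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = split(ref), split(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total


def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative error reduction, e.g. WER 0.40 -> 0.30 is a 25% relative gain."""
    return (baseline - proposed) / baseline


def bottleneck_adapter_params(d_model: int, d_adapter: int) -> int:
    """Parameters of one assumed bottleneck adapter: W_down, b_down, W_up, b_up."""
    return d_model * d_adapter + d_adapter + d_adapter * d_model + d_model


if __name__ == "__main__":
    wer = error_rate(["the cat sat on the mat"], ["the cat sit on mat"])
    print(round(wer, 3))  # 1 substitution + 1 deletion over 6 words -> 0.333
    print(round(relative_improvement(0.40, 0.30), 2))  # -> 0.25
    print(bottleneck_adapter_params(768, 256))  # -> 394240
```

Because the adapter parameter count grows linearly with the adapter dimension, comparing several dimensions (as the abstract describes) trades adaptation capacity against the small parameter budget that makes adapter tuning attractive in low-resource settings.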
