Adapter pre-training for improved speech recognition in unseen domains using low resource adapter tuning of self-supervised models

Sathvik Udupa; Jesuraj Bandekar; Saurabh Kumar; Deekshitha G; Sandhya B; Abhayjeet S; Savitha Murthy; Priyanka Pai; Srinivasa Raghavan; Raoul Nanavati; Prasanta Kumar Ghosh

INTERSPEECH 2024

Abstract

Adapter tuning is an approach to fine-tuning large neural network models on new tasks. It can be used to efficiently fine-tune large self-supervised learning (SSL) models for speech recognition. In this work, we aim to improve low-resource adaptation of SSL features from a source domain to a target domain. To this end, we experiment with adapter pre-training for Wav2Vec2-based models over different source and target configurations. Our experiments cover 3 datasets comprising 14 languages, including very low-resource languages. Further, we show that the method is consistent across different adapter dimensions and analyse the feature transformation introduced by the adapter pre-training process. With the proposed methods, we obtain relative improvements in the range of 10% to 30% in WER and CER with Viterbi decoding in 13 languages. We also obtain consistent performance gains with LM decoding on many of these languages.
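The relative improvements quoted in the abstract can be read with the standard error-rate arithmetic. The following is a minimal sketch, not the authors' evaluation code: it assumes WER and CER are computed as Levenshtein edit distance divided by reference length, and the helper names (`edit_distance`, `error_rate`, `relative_improvement`) as well as the example numbers are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="word"):
    """WER (unit='word') or CER (unit='char') of a hypothesis string."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    else:
        ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline, proposed):
    """Fractional reduction of an error rate relative to the baseline."""
    return (baseline - proposed) / baseline


# Illustrative values only, not results from the paper.
wer = error_rate("the cat sat on the mat", "the cat sit on mat")
gain = relative_improvement(0.20, 0.15)
```

Under this definition, a drop in WER from 0.20 to 0.15 is a 25% relative improvement, even though the absolute change is only 5 points.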

Topics

Machine Learning > Application Areas > Domain Adaptation
Speech & Audio > Recognition > Automatic Speech Recognition

Keywords

domain adaptation, self-supervised learning, speech recognition, adapter tuning
