2023 INTERSPEECH INTERSPEECH 2023

Re-investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods

Abstract

Speech foundation models, pre-trained on large amounts of unsupervised or supervised audio data, have demonstrated an impressive ability to transfer their learning to specific domains for speech recognition. Parameter-efficient fine-tuning methods offer an efficient paradigm where a small set of parameters are updated to adapt the foundation model to new tasks. However, it is unclear how the intermediate features of the foundation model behave, and how to utilize them in a more efficient way. In this paper, we compare the performance of three speech foundation models for speech recognition. We re-investigate how features from different layers behave and propose a simple and effective feature fusion method for efficient transfer learning. Experimental results demonstrate that the proposed method uses 31.7% fewer trainable encoder parameters, 13.4% less computational memory cost than compared method, and does not compromise quality on the target task.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Speech & Audio
🧭 Keyword Pioneer — speech foundation model
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio