INTERSPEECH 2024

Convolution-Augmented Parameter-Efficient Fine-Tuning for Speech Recognition

Abstract

Parameter-efficient fine-tuning (PEFT) methods, which train only part of a model, yield models that are both efficient and effective. Bottleneck approaches, such as adapters and low-rank adaptation (LoRA), have proven beneficial in numerous studies and are widely used. In this work, we propose and investigate an enhanced PEFT method that adds convolution to bottleneck approaches based on linear projections. We experiment with HuBERT, a representative speech model pre-trained with self-supervised learning, fine-tuning it for automatic speech recognition (ASR) to examine how the proposed method affects training and inference. We demonstrate consistent performance improvements with minimal increases in parameter count and computational complexity.
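
To make the idea concrete, the following is a minimal sketch of a linear bottleneck with a convolution added, not the implementation evaluated in this work. The abstract does not state where the convolution is placed, so the sketch assumes a depthwise 1-D convolution over time applied inside the bottleneck, between the down- and up-projections. All function names, the rank `r`, the kernel size `k`, and the width `d` are hypothetical choices for illustration, and biases are omitted.

```python
def matmul(x, w):
    """Multiply a (T x m) matrix by an (m x n) matrix, both nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]


def depthwise_conv1d(h, kernels):
    """Depthwise 1-D convolution over time with zero 'same' padding.

    h: (T x r) bottleneck features.
    kernels: (r x k), one filter per channel; k is assumed to be odd.
    """
    T, r, k = len(h), len(kernels), len(kernels[0])
    pad = k // 2
    out = [[0.0] * r for _ in range(T)]
    for t in range(T):
        for c in range(r):
            for j in range(k):
                s = t + j - pad  # source frame for kernel tap j
                if 0 <= s < T:
                    out[t][c] += kernels[c][j] * h[s][c]
    return out


def conv_bottleneck_delta(x, down, kernels, up):
    """Assumed update path: down-project, convolve over time, up-project."""
    return matmul(depthwise_conv1d(matmul(x, down), kernels), up)


def param_counts(d, r, k):
    """Return (linear bottleneck parameters, extra depthwise-conv parameters)."""
    return 2 * d * r, r * k


# Illustrative sizes only: d=768 as a model width, rank 32, kernel size 3.
base, extra = param_counts(d=768, r=32, k=3)
print(base, extra, f"{100 * extra / base:.2f}%")  # 49152 96 0.20%
```

Under these assumptions, the convolution adds `r * k` weights on top of the `2 * d * r` weights of the two projections, which is consistent in spirit with the small parameter overhead described above. With a kernel of `[0, 1, 0]` per channel the convolution is an identity, so the sketch reduces to a plain linear bottleneck.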
