Leveraging Adapter for Parameter-Efficient ASR Encoder

Kyuhong Shim; Jinkyu Lee; Hyunjae Kim

INTERSPEECH 2024

Abstract

The growth of speech models emphasizes the importance of parameter efficiency in practical automatic speech recognition (ASR) systems. Parameter sharing, which reuses the same parameters multiple times, has emerged as a promising way to reduce storage requirements. However, previous studies have often struggled to balance the number of parameters against performance. In this paper, we propose a novel architecture that effectively reduces the number of parameters while minimizing performance degradation. The key idea is to insert a lightweight adapter module that adjusts the features generated by shared parameters, thereby enhancing the diversity of representations. We introduce a unique adapter module and parameter-sharing configuration tailored for Conformer-based ASR encoders. Experimental results demonstrate that the proposed architecture reduces the number of parameters by approximately 50% and the amount of computation by approximately 20% without compromising speech recognition performance.
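To make the trade-off described in the abstract concrete, the following is a rough parameter-count sketch, not the authors' configuration. It assumes a generic bottleneck adapter (a down-projection followed by an up-projection) rather than the paper's own adapter design, and it assumes that consecutive pairs of layers share one block. Every size here (`NUM_LAYERS`, `BLOCK_PARAMS`, `d_model`, `d_bottleneck`) is a hypothetical value chosen for illustration.

```python
# Hypothetical sizes for illustration only; none of them come from the paper.


def bottleneck_adapter_params(d_model: int, d_bottleneck: int) -> int:
    """Weights and biases of a down-projection followed by an up-projection."""
    down = d_model * d_bottleneck + d_bottleneck
    up = d_bottleneck * d_model + d_model
    return down + up


def encoder_params(num_layers: int, block_params: int,
                   share_group: int = 1, adapter_params: int = 0) -> int:
    """Total parameters when every `share_group` consecutive layers reuse one
    block and each layer keeps its own (unshared) adapter."""
    if num_layers % share_group != 0:
        raise ValueError("num_layers must be divisible by share_group")
    unique_blocks = num_layers // share_group
    return unique_blocks * block_params + num_layers * adapter_params


NUM_LAYERS = 12           # assumed encoder depth
BLOCK_PARAMS = 2_000_000  # assumed size of one encoder block

adapter = bottleneck_adapter_params(d_model=256, d_bottleneck=32)

# Baseline: every layer owns its block, no adapters.
baseline = encoder_params(NUM_LAYERS, BLOCK_PARAMS)

# Shared: each pair of layers reuses one block, each layer has its own adapter.
shared = encoder_params(NUM_LAYERS, BLOCK_PARAMS,
                        share_group=2, adapter_params=adapter)

reduction = 1 - shared / baseline
print(f"adapter parameters per layer: {adapter}")
print(f"baseline: {baseline}, shared + adapters: {shared}")
print(f"relative reduction: {reduction:.1%}")
```

Under these assumed sizes, the per-layer adapters add back only a small fraction of what sharing removes, which is the intuition behind pairing shared parameters with lightweight adapters. The sketch counts stored parameters only and says nothing about the computation savings reported in the abstract.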

Authors

Kyuhong Shim, Jinkyu Lee, Hyunjae Kim

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning

Machine Learning > Application Areas > Efficient Computing

Deep Learning > Techniques > Model Architecture

Speech & Audio > Recognition > Automatic Speech Recognition

Machine Learning > Learning Paradigms > Transfer Learning

Keywords

model architecture, automatic speech recognition, parameter-efficient learning, parameter-efficient transfer learning, adapter module, Conformer architecture

Related papers

Reshape Dimensions Network for Speaker Recognition 2024

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification 2024

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch 2024

Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 2024

K-means and hierarchical clustering of f0 contours 2024