Parameter-Efficient Adapter Based on Pre-trained Models for Speech Translation

Nan Chen; Yonghe Wang; Feilong Bao

INTERSPEECH 2024

Abstract

The multi-task learning (MTL) approach, which leverages pre-trained speech and machine translation models, has significantly advanced speech-to-text translation. However, it introduces a considerable number of parameters, which increases training costs. Most parameter-efficient fine-tuning (PEFT) methods train only additional modules, which effectively reduces the number of trainable parameters. Nevertheless, the trainable parameters added by PEFT methods remain non-negligible in multilingual speech translation settings. In this paper, we first propose a parameter-sharing adapter, which reduces the number of parameters by 7/8 compared to regular adapters with only an approximately 0.7% decrease in performance. To balance parameter count against performance, we then present a neural architecture search (NAS) based model. Experimental results show that the adapter performs closest to fine-tuning, while LoRA performs worst.
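The 7/8 figure can be illustrated with simple parameter counting. The sketch below is only an assumption about how such a reduction could arise: it supposes a standard bottleneck adapter (down-projection and up-projection with biases) and supposes that every 8 consecutive layers reuse one adapter. The abstract does not state the actual sharing scheme, and the function names, layer count, model width, and bottleneck size are all hypothetical.

```python
def adapter_params(d_model: int, bottleneck: int) -> int:
    """Parameter count of one bottleneck adapter (assumed design):
    a down-projection and an up-projection, each with a bias."""
    down = d_model * bottleneck + bottleneck
    up = bottleneck * d_model + d_model
    return down + up


def total_adapter_params(num_layers: int, d_model: int, bottleneck: int,
                         share_group: int = 1) -> int:
    """Total adapter parameters when every `share_group` consecutive
    layers reuse a single adapter (share_group=1 means no sharing)."""
    if num_layers % share_group != 0:
        raise ValueError("num_layers must be divisible by share_group")
    return (num_layers // share_group) * adapter_params(d_model, bottleneck)


# Hypothetical sizes, not taken from the paper.
regular = total_adapter_params(num_layers=24, d_model=1024, bottleneck=64)
shared = total_adapter_params(num_layers=24, d_model=1024, bottleneck=64,
                              share_group=8)
reduction = 1 - shared / regular  # fraction of adapter parameters removed

print(f"regular: {regular}, shared: {shared}, reduction: {reduction:.3f}")
```

Under these assumptions the reduction depends only on the sharing group size, not on the model width or bottleneck size: a group of 8 always leaves 1/8 of the adapter parameters.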

Topics

Artificial Intelligence > Core AI > Model Compression
Machine Learning > Application Areas > Efficient Computing

Keywords

neural architecture search; parameter sharing; pre-trained model; parameter-efficient fine-tuning; speech translation
