Multilingual Speech Translation from Efficient Finetuning of Pretrained Models

Xian Li; Changhan Wang; Yun Tang; Chau Tran; Yuqing Tang; Juan Pino; Alexei Baevski; Alexis Conneau; Michael Auli

ACL 2021

Abstract

We present a simple yet effective approach to building multilingual speech-to-text (ST) translation through efficient transfer learning from a pretrained speech encoder and text decoder. Our key finding is that a minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot crosslingual and cross-modality transfer ability by finetuning only 10-50% of the pretrained parameters. This effectively leverages large pretrained models, such as wav2vec 2.0 for acoustic modeling and mBART for multilingual text generation, at low training cost. The approach sets a new state of the art for 36 translation directions (surpassing cascaded ST for 26 of them) on the large-scale multilingual ST benchmark CoVoST 2 (+6.4 BLEU on average for En-X directions and +6.7 BLEU for X-En directions). It also demonstrates strong zero-shot performance in a many-to-many multilingual model (+5.6 BLEU on average across 28 non-English directions), making it an appealing way to attain high-quality speech translation with improved parameter and data efficiency.
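To make the idea of LNA finetuning concrete, here is a minimal sketch of how one might pick out the LayerNorm and attention parameters of a pretrained model and leave everything else frozen. It is an illustration only, not the authors' implementation: the name fragments in `LNA_PATTERNS`, the helper functions, and the toy parameter inventory are all hypothetical, and the abstract does not say which attention modules are finetuned in the encoder or the decoder.

```python
from typing import Dict, Iterable, Set

# Hypothetical name fragments that mark LayerNorm and attention parameters.
# Real checkpoints may use different module names.
LNA_PATTERNS = ("layer_norm", "attn")


def select_lna(param_sizes: Dict[str, int],
               patterns: Iterable[str] = LNA_PATTERNS) -> Set[str]:
    """Return the parameter names that stay trainable under an LNA-style rule."""
    patterns = tuple(patterns)
    return {name for name in param_sizes if any(p in name for p in patterns)}


def trainable_fraction(param_sizes: Dict[str, int], trainable: Set[str]) -> float:
    """Share of all parameters (by element count) that remain trainable."""
    total = sum(param_sizes.values())
    kept = sum(size for name, size in param_sizes.items() if name in trainable)
    return kept / total


# Toy inventory: parameter name -> number of elements (made-up sizes).
toy_params = {
    "encoder.layers.0.self_attn.q_proj.weight": 16,
    "encoder.layers.0.self_attn_layer_norm.weight": 4,
    "encoder.layers.0.fc1.weight": 64,
    "decoder.layers.0.encoder_attn.k_proj.weight": 16,
    "decoder.layers.0.final_layer_norm.weight": 4,
    "decoder.layers.0.fc2.weight": 64,
    "decoder.embed_tokens.weight": 32,
}

trainable = select_lna(toy_params)
fraction = trainable_fraction(toy_params, trainable)
print(sorted(trainable))
print(f"trainable fraction: {fraction:.2f}")
```

In a PyTorch model, the same selection could be applied by iterating over `model.named_parameters()` and setting `requires_grad = False` on every parameter whose name is not selected. The fraction printed for the toy inventory reflects only the made-up sizes above and says nothing about the actual models.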

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Learning Paradigms > Transfer Learning
Natural Language Processing > Applications > Machine Translation
Machine Learning > Learning Types > Transfer Learning
Deep Learning > Models > Foundation Models
Deep Learning > Learning Types > Transfer Learning
Speech & Audio > Recognition > Speech Translation

Keywords

zero-shot learning, transfer learning, parameter efficiency, pretrained model, multilingual model, speech translation, efficient finetuning
