Condensing Multilingual Knowledge with Lightweight Language-Specific Modules

Haoran Xu; Weiting Tan; Shuyue Li; Yunmo Chen; Benjamin Van Durme; Philipp Koehn; Kenton Murray

EMNLP 2023

Abstract

Incorporating language-specific (LS) modules or a Mixture-of-Experts (MoE) architecture is a proven way to boost the performance of multilingual models, but scaling these approaches to hundreds of languages or experts tends to be hard to manage. We present Language-specific Matrix Synthesis (LMS), a novel method that addresses this issue. LMS uses parameter-efficient, lightweight modules, reducing the number of parameters while outperforming existing methods, e.g., +1.73 BLEU over Switch Transformer on OPUS-100 multilingual translation. Additionally, we introduce Fuse Distillation (FD) to condense multilingual knowledge from multiple LS modules into a single shared module, improving model inference and storage efficiency. Our approach demonstrates superior scalability and performance compared to state-of-the-art methods.
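The abstract does not spell out how the lightweight modules are built, so the following is only a rough sketch of why such modules can scale to many languages. It assumes, purely for illustration, that each language-specific matrix is synthesized as the product of two low-rank factors rather than stored as a full matrix. The function names and all dimensions (512, 2048, rank 8, 100 languages) are hypothetical and not taken from the paper.

```python
def full_ls_params(d_in: int, d_out: int, num_langs: int) -> int:
    """Parameters needed if every language stores its own full d_in x d_out matrix."""
    return d_in * d_out * num_langs


def low_rank_ls_params(d_in: int, d_out: int, rank: int, num_langs: int) -> int:
    """Parameters needed if every language stores two factors:
    one of shape d_in x rank and one of shape rank x d_out (an assumed design)."""
    return rank * (d_in + d_out) * num_langs


# Hypothetical sizes, chosen only to make the arithmetic concrete.
d_in, d_out, rank, num_langs = 512, 2048, 8, 100

full = full_ls_params(d_in, d_out, num_langs)
low = low_rank_ls_params(d_in, d_out, rank, num_langs)

print(f"full language-specific matrices: {full:,} parameters")
print(f"low-rank factor pairs:           {low:,} parameters")
print(f"reduction factor:                {full / low:.1f}x")
```

Under these assumed sizes the per-language cost grows with rank * (d_in + d_out) instead of d_in * d_out, which is the kind of saving that makes hundreds of language-specific modules manageable.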

Topics

Artificial Intelligence > Core AI > Model Compression
Machine Learning > Application Areas > Knowledge Distillation
Natural Language Processing > Applications > Machine Translation
Machine Learning > Application Areas > Model Compression
Deep Learning > Learning Types > Knowledge Distillation

Keywords

model compression, knowledge distillation, multilingual translation, parameter efficient, mixture of expert, language-specific module, matrix synthesis
