Efficient Inference for Multilingual Neural Machine Translation

Alexandre Berard; Dain Lee; Stéphane Clinchant; Kweonwoo Jung; Vassilina Nikoulina

EMNLP 2021

Abstract

Multilingual NMT has become an attractive solution for MT deployment in production. However, matching bilingual quality comes at the cost of larger and slower models. In this work, we consider several ways to make multilingual NMT faster at inference without degrading its quality. We experiment with several "light decoder" architectures in two 20-language multi-parallel settings: small-scale on TED Talks and large-scale on ParaCrawl. Our experiments show that combining a shallow decoder with vocabulary filtering makes inference almost twice as fast with no loss in translation quality. We validate our findings with BLEU and chrF (on 380 language pairs), a robustness evaluation, and a human evaluation.
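Vocabulary filtering, one of the two ingredients named in the abstract, can be illustrated with a minimal sketch. The idea assumed here is that at each decoding step the output layer scores only the tokens allowed for the target language instead of the whole shared multilingual vocabulary, so the per-step cost of the output projection scales with the size of that subset rather than with |V|. Everything in the snippet is hypothetical: the 6-token vocabulary `VOCAB`, the embeddings `OUT_EMB`, the per-language allow-lists `ALLOWED`, and the helpers `full_argmax` and `filtered_argmax` are toy stand-ins, not the authors' code, and greedy argmax replaces beam search for brevity.

```python
from typing import Dict, List, Sequence

# Hypothetical toy setup: a shared 6-token vocabulary with 3-dim output embeddings.
VOCAB: List[str] = ["<eos>", "the", "le", "der", "cat", "chat"]
OUT_EMB: List[List[float]] = [
    [0.1, 0.0, 0.0],  # <eos>
    [1.0, 0.0, 0.0],  # the
    [0.0, 1.0, 0.0],  # le
    [0.0, 0.0, 1.0],  # der
    [0.5, 0.0, 0.2],  # cat
    [0.0, 0.6, 0.1],  # chat
]
# Hypothetical per-language allow-lists (token ids), assumed to be precomputed offline.
ALLOWED: Dict[str, List[int]] = {"en": [0, 1, 4], "fr": [0, 2, 5]}


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product between a hidden state and one output embedding row."""
    return sum(a * b for a, b in zip(u, v))


def full_argmax(hidden: Sequence[float]) -> int:
    """Greedy step over the whole shared vocabulary: |V| dot products."""
    return max(range(len(OUT_EMB)), key=lambda tok: dot(hidden, OUT_EMB[tok]))


def filtered_argmax(hidden: Sequence[float], tgt_lang: str) -> int:
    """Greedy step over the target-language subset only: |V_lang| dot products."""
    return max(ALLOWED[tgt_lang], key=lambda tok: dot(hidden, OUT_EMB[tok]))


if __name__ == "__main__":
    h = [0.9, 0.8, 0.0]  # hypothetical decoder hidden state
    print(VOCAB[full_argmax(h)])            # the
    print(VOCAB[filtered_argmax(h, "fr")])  # le
```

In a real model the same effect would come from selecting the allowed rows of the output projection matrix (and bias) before the matrix multiplication; with beam search, the softmax would then be normalized over the subset only. In this toy case the filter also blocks the off-target English token that the unfiltered layer picks for a French target.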

Topics

Machine Learning > Application Areas > Efficient Computing
Natural Language Processing > Applications > Machine Translation
Machine Learning > Application Areas > Model Compression
Deep Learning > Optimization & Theory > Efficient Computing

Keywords

model compression, inference efficiency, inference speed, multilingual neural machine translation, shallow decoder, decoder architecture, vocabulary filtering, light decoder
