Findings of the WMT 2021 Shared Task on Large-Scale Multilingual Machine Translation

Guillaume Wenzek; Vishrav Chaudhary; Angela Fan; Sahir Gomez; Naman Goyal; Somya Jain; Douwe Kiela; Tristan Thrush; Francisco Guzmán

2021 EMNLP EMNLP 2021

Findings of the WMT 2021 Shared Task on Large-Scale Multilingual Machine Translation

Abstract

AbstractWe present the results of the first task on Large-Scale Multilingual Machine Translation. The task consists on the many-to-many evaluation of a single model across a variety of source and target languages. This year, the task consisted on three different settings: (i) SMALL-TASK1 (Central/South-Eastern European Languages), (ii) the SMALL-TASK2 (South-East Asian Languages), and (iii) FULL-TASK (all 101 x 100 language pairs). All the tasks used the FLORES-101 dataset as the evaluation benchmark. To ensure the longevity of the dataset, the test sets were not publicly released and the models were evaluated in a controlled environment on Dynabench. There were a total of 10 participating teams for the tasks, with a total of 151 intermediate model submissions and 13 final models. This year’s result show a significant improvement over the known base-lines with +17.8 BLEU for SMALL-TASK2, +10.6 for FULL-TASK and +3.6 for SMALL-TASK1.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — large-scale multilingual model

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio

Authors

Guillaume Wenzek , Vishrav Chaudhary , Angela Fan , Sahir Gomez , Naman Goyal , Somya Jain , Douwe Kiela , Tristan Thrush , Francisco Guzmán

Topics

Natural Language Processing > Applications > Machine Translation Natural Language Processing > Resources & Methods > Multilingual NLP Deep Learning > Models > Transformers Machine Learning > Learning Types > Multi-Lingual Learning

Keywords

multilingual machine translation low-resource language bleu score many-to-many translation large-scale multilingual model flores-101 dataset large-scale translation

Download PDF

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021