Distilling Multiple Domains for Neural Machine Translation

Anna Currey; Prashant Mathur; Georgiana Dinu

2020 EMNLP EMNLP 2020

Distilling Multiple Domains for Neural Machine Translation

Abstract

AbstractNeural machine translation achieves impressive results in high-resource conditions, but performance often suffers when the input domain is low-resource. The standard practice of adapting a separate model for each domain of interest does not scale well in practice from both a quality perspective (brittleness under domain shift) as well as a cost perspective (added maintenance and inference complexity). In this paper, we propose a framework for training a single multi-domain neural machine translation model that is able to translate several domains without increasing inference time or memory usage. We show that this model can improve translation on both high- and low-resource domains over strong multi-domain baselines. In addition, our proposed model is effective when domain labels are unknown during training, as well as robust under noisy data conditions.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — multi-domain model

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Anna Currey , Prashant Mathur , Georgiana Dinu

Topics

Machine Learning > Application Areas > Domain Adaptation Machine Learning > Application Areas > Knowledge Distillation Natural Language Processing > Applications > Machine Translation Deep Learning > Techniques > Knowledge Distillation Deep Learning > Learning Types > Multi-Task Learning

Keywords

model compression domain adaptation knowledge distillation neural machine translation domain shift multi-domain translation multi-domain model

Download PDF

Related papers

Fast semantic parsing with well-typedness guarantees 2020

Detecting Objectifying Language in Online Professor Reviews 2020

Analogous Process Structure Induction for Sub-event Sequence Prediction 2020

Aspect Sentiment Classification with Aspect-Specific Opinion Spans 2020

Robust and Interpretable Grounding of Spatial References with Relation Networks 2020