Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation

Vladimir Karpukhin; Omer Levy; Jacob Eisenstein; Marjan Ghazvininejad

EMNLP 2019

Abstract

Contemporary machine translation systems achieve greater coverage by applying subword models such as BPE and character-level CNNs, but these methods are highly sensitive to orthographical variations such as spelling mistakes. We show how training on a mild amount of random synthetic noise can dramatically improve robustness to these variations, without diminishing performance on clean text. We focus on translation performance on natural typos, and show that robustness to such noise can be achieved using a balanced diet of simple synthetic noises at training time, without access to the natural noise data or distribution.
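As an illustration of what a "balanced diet of simple synthetic noises" could look like, the sketch below perturbs source-side words with one of four character-level edits chosen uniformly at random. The abstract does not specify the operations or rates, so the four edit types, the `noise_prob` value, the lowercase ASCII alphabet, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import random
import string


def delete_char(word, rng):
    """Drop one random character (words shorter than 2 are left alone)."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]


def insert_char(word, rng):
    """Insert one random lowercase letter at a random position."""
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]


def substitute_char(word, rng):
    """Replace one random character with a random lowercase letter."""
    if not word:
        return word
    i = rng.randrange(len(word))
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i + 1:]


def swap_chars(word, rng):
    """Swap two adjacent characters (words shorter than 2 are left alone)."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


# Uniform choice over the operations is one reading of "balanced" (assumption).
NOISE_OPS = (delete_char, insert_char, substitute_char, swap_chars)


def add_synthetic_noise(sentence, noise_prob=0.1, seed=None):
    """Apply one random edit to each whitespace-separated word with
    probability noise_prob; other words pass through unchanged."""
    rng = random.Random(seed)
    noisy = []
    for word in sentence.split():
        if rng.random() < noise_prob:
            word = rng.choice(NOISE_OPS)(word, rng)
        noisy.append(word)
    return " ".join(noisy)


if __name__ == "__main__":
    print(add_synthetic_noise("the quick brown fox jumps", noise_prob=0.5, seed=0))
```

In a training pipeline, a function like this would be applied only to the source side of each sentence pair before subword segmentation, leaving the target translation clean, so the model learns to map noisy input to correct output.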

Authors

Vladimir Karpukhin, Omer Levy, Jacob Eisenstein, Marjan Ghazvininejad

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Applications > Machine Translation

Keywords

neural machine translation, synthetic noise, noise training
