Mask-Predict: Parallel Decoding of Conditional Masked Language Models

Marjan Ghazvininejad; Omer Levy; Yinhan Liu; Luke Zettlemoyer

EMNLP 2019

Abstract

Most machine translation systems generate text autoregressively from left to right. We instead use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding: we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about. By applying this strategy for a constant number of iterations, our model improves on the state-of-the-art performance of non-autoregressive and parallel decoding translation models by over 4 BLEU on average. It also comes within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster.
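The iterative procedure described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. `toy_model` is a hypothetical stand-in for a trained conditional masked language model (source conditioning is assumed to live inside the callable), the target length is passed in rather than predicted, and the number of tokens re-masked after iteration t is assumed to follow a linear decay, n = N * (T - t) / T, rounded down here.

```python
from typing import Callable, List, Sequence, Tuple

MASK = "<mask>"

# Hypothetical model interface: given the current (partially masked) target,
# return a (best token, confidence) pair for every position.
Model = Callable[[Sequence[str]], List[Tuple[str, float]]]


def num_to_mask(length: int, t: int, iterations: int) -> int:
    """Assumed linear-decay schedule: fewer tokens are re-masked each iteration."""
    return (length * (iterations - t)) // iterations


def mask_predict(model: Model, length: int, iterations: int) -> List[str]:
    # First pass: every position is masked, so all tokens are predicted in parallel.
    tokens = [MASK] * length
    probs = [0.0] * length
    masked = list(range(length))
    for t in range(iterations):
        predictions = model(tokens)
        # Only masked positions are overwritten; the others keep their
        # previous tokens and confidence scores.
        for i in masked:
            tokens[i], probs[i] = predictions[i]
        n = num_to_mask(length, t + 1, iterations)
        if n == 0:
            break
        # Re-mask the n least confident positions (ties broken by position,
        # since sorted() is stable).
        masked = sorted(range(length), key=lambda i: probs[i])[:n]
        for i in masked:
            tokens[i] = MASK
    return tokens


# Hypothetical toy "model": it knows a reference sentence but gets the odd
# positions wrong, with low confidence, until at least half of the target
# is visible. This mimics a model that benefits from more context.
TARGET = ["the", "cat", "sat", "on", "the", "mat"]


def toy_model(tokens: Sequence[str]) -> List[Tuple[str, float]]:
    visible = sum(tok != MASK for tok in tokens)
    out = []
    for i, gold in enumerate(TARGET):
        if i % 2 == 1 and visible < len(TARGET) // 2:
            out.append(("???", 0.2))
        else:
            out.append((gold, 0.9))
    return out


if __name__ == "__main__":
    print(" ".join(mask_predict(toy_model, len(TARGET), iterations=4)))
```

With a single iteration the toy decoder behaves like a purely non-autoregressive model and leaves the odd positions as `???`; with four iterations, the low-confidence positions are re-masked and regenerated once more context is visible, and the reference sentence is recovered. The number of model calls is bounded by `iterations`, independent of sentence length, which is what makes the decoding parallel.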

Topics

Natural Language Processing > Generation > Text Generation
Natural Language Processing > Applications > Machine Translation

Keywords

masked language model, conditional generation, parallel decoding, iterative decoding, non-autoregressive translation
