Putting Evaluation in Context: Contextual Embeddings Improve Machine Translation Evaluation

Nitika Mathur; Timothy Baldwin; Trevor Cohn

ACL 2019

Abstract

Accurate, automatic evaluation of machine translation is critical for system tuning and for evaluating progress in the field. We propose a simple unsupervised metric, and additional supervised metrics, which rely on contextual word embeddings to encode the translation and reference sentences. We find that these models rival or surpass all existing metrics in the WMT 2017 sentence-level and system-level tracks, and our trained model has a substantially higher correlation with human judgements than all existing metrics on the WMT 2017 to-English sentence-level dataset.

Topics

Deep Learning > Architectures > Transformers

Deep Learning > Techniques > Pretraining

Natural Language Processing > Applications > Machine Translation

Natural Language Processing > Applications > Quality Estimation

Keywords

BLEU score, contextual embedding, machine translation evaluation, sentence-level translation, human judgement, unsupervised metric, human judgement correlation
