Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

Diptesh Kanojia; Marina Fomicheva; Tharindu Ranasinghe; Frédéric Blain; Constantin Orasan; Lucia Specia

EMNLP 2021

Abstract

Current Machine Translation (MT) systems achieve very good results on a growing variety of language pairs and datasets. However, they are known to produce fluent translation outputs that can contain important meaning errors, thus undermining their reliability in practice. Quality Estimation (QE) is the task of automatically assessing the performance of MT systems at test time. Thus, in order to be useful, QE systems should be able to detect such errors. However, this ability is yet to be tested in the current evaluation practices, where QE systems are assessed only in terms of their correlation with human judgements. In this work, we bridge this gap by proposing a general methodology for adversarial testing of QE for MT. First, we show that despite a high correlation with human judgements achieved by the recent SOTA, certain types of meaning errors are still problematic for QE to detect. Second, we show that on average, the ability of a given model to discriminate between meaning-preserving and meaning-altering perturbations is predictive of its overall performance, thus potentially allowing for comparing QE systems without relying on manual quality annotation.
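The discrimination test described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `QEScorer` signature, the two perturbation functions, the `discrimination_rate` metric, and the toy scorer are hypothetical and are not taken from the paper, whose actual perturbation set and scoring procedure may differ.

```python
from typing import Callable, List, Tuple

# Assumed interface: a QE model maps (source, translation) to a quality score.
QEScorer = Callable[[str, str], float]


def drop_final_punct(translation: str) -> str:
    """Meaning-preserving perturbation (illustrative): strip trailing punctuation."""
    return translation.rstrip(".!?")


def insert_negation(translation: str) -> str:
    """Meaning-altering perturbation (illustrative): negate the first auxiliary verb."""
    auxiliaries = {"is", "are", "was", "were", "will", "can"}
    tokens = translation.split()
    for i, tok in enumerate(tokens):
        if tok.lower() in auxiliaries:
            return " ".join(tokens[: i + 1] + ["not"] + tokens[i + 1:])
    return translation  # no auxiliary found, so the perturbation does not apply


def discrimination_rate(scorer: QEScorer, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of applicable pairs where the meaning-preserving variant
    receives a strictly higher score than the meaning-altering variant."""
    wins, total = 0, 0
    for source, translation in pairs:
        altered = insert_negation(translation)
        if altered == translation:
            continue  # skip sentences the perturbation could not change
        preserved = drop_final_punct(translation)
        total += 1
        if scorer(source, preserved) > scorer(source, altered):
            wins += 1
    return wins / total if total else 0.0


def toy_scorer(source: str, translation: str) -> float:
    """Stand-in scorer, NOT a real QE model: penalises a German/English negation mismatch."""
    src_neg = "nicht" in source.lower().split()
    tgt_neg = "not" in translation.lower().split()
    return 1.0 if src_neg == tgt_neg else 0.0


pairs = [
    ("Das Haus ist groß.", "The house is big."),
    ("Sie wird morgen kommen.", "She will come tomorrow."),
    ("Guten Morgen!", "Good morning!"),  # no auxiliary, skipped
]
rate = discrimination_rate(toy_scorer, pairs)
```

In practice `toy_scorer` would be replaced by a trained QE model. Under the abstract's second finding, a higher rate on such pairs would be expected, on average, to go with better overall QE performance.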

Topics

Artificial Intelligence > Core AI > AI Safety; Machine Learning > Learning Types > Adversarial Learning; Natural Language Processing > Applications > Machine Translation; Machine Learning > Learning Types > Evaluation; Artificial Intelligence > Core AI > Adversarial Learning; Deep Learning > Learning Types > Adversarial Learning; Natural Language Processing > Applications > Quality Estimation

Keywords

machine translation, neural machine translation, evaluation methodology, quality estimation, error detection, adversarial testing, adversarial evaluation, meaning preservation, fluent translation, meaning error detection

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021