Reference-Free Word- and Sentence-Level Translation Evaluation with Token-Matching Metrics

Christoph Wolfgang Leiter

EMNLP 2021

Abstract

Many modern machine translation evaluation metrics, such as BERTScore, BLEURT, COMET, MonoTransQuest, and XMoverScore, are based on black-box language models. Hence, it is difficult to explain why these metrics return certain scores. This year's Eval4NLP shared task tackles this challenge by searching for methods that can extract feature importance scores that correlate well with human word-level error annotations. In this paper, we show that unsupervised metrics based on token matching can intrinsically provide such scores. The submitted system interprets the similarities of the contextualized word embeddings that are used to compute (X)BERTScore as word-level importance scores.
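The idea in the abstract can be illustrated with a minimal sketch. It is not the submitted system: the function names (`cosine`, `token_matching_scores`) are hypothetical, the toy 2-dimensional vectors stand in for contextualized embeddings from a multilingual encoder, and the greedy max-similarity matching follows the general BERTScore recipe rather than any detail confirmed by this page. Under those assumptions, each token's best-match similarity doubles as a word-level score, and the same values aggregate into a sentence-level score.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def token_matching_scores(src_vecs, hyp_vecs):
    """Greedy token matching in the style of BERTScore.

    src_vecs / hyp_vecs: one embedding per source / hypothesis token.
    Assumption: in practice these would come from a multilingual encoder;
    here they are plain lists of floats.
    Returns (per-hypothesis-token scores, per-source-token scores, F1).
    """
    # Similarity matrix: rows are hypothesis tokens, columns are source tokens.
    sim = [[cosine(h, s) for s in src_vecs] for h in hyp_vecs]

    # Word-level scores: each token's similarity to its best match on the other side.
    hyp_best = [max(row) for row in sim]
    src_best = [max(sim[i][j] for i in range(len(hyp_vecs)))
                for j in range(len(src_vecs))]

    # Sentence-level score: harmonic mean of the averaged best-match similarities.
    precision = sum(hyp_best) / len(hyp_best)
    recall = sum(src_best) / len(src_best)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return hyp_best, src_best, f1


# Toy data: the third hypothesis token has no good counterpart in the source.
src = [[1.0, 0.0], [0.0, 1.0]]
hyp = [[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
hyp_scores, src_scores, sentence_score = token_matching_scores(src, hyp)
worst_token = min(range(len(hyp_scores)), key=lambda i: hyp_scores[i])
```

In this toy case the third hypothesis token receives the lowest best-match similarity, so a low score can be read as a hint that the token may be a translation error. Whether to report the similarity itself or an inverted value such as `1 - similarity` as the error score is a design choice that this sketch leaves open.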

Topics

Artificial Intelligence > Core AI > Interpretability

Machine Learning > Learning Types > Unsupervised Learning

Natural Language Processing > Applications > Machine Translation

Natural Language Processing > Applications > Text Classification

Machine Learning > Core Methods > Feature Selection

Machine Learning > Learning Types > Representation Learning

Natural Language Processing > Applications > Quality Estimation

Keywords

machine translation, quality estimation, word embedding, contextualized embedding, translation evaluation, token matching, token-matching metric, word embedding similarity, feature importance score, unsupervised metric, word-level importance

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021