2021
EMNLP
Reference-Free Word- and Sentence-Level Translation Evaluation with Token-Matching Metrics
Abstract
Many modern machine translation evaluation metrics, such as BERTScore, BLEURT, COMET, MonoTransquest and XMoverScore, are based on black-box language models. Hence, it is difficult to explain why these metrics return certain scores. This year's Eval4NLP shared task tackles this challenge by searching for methods that can extract feature importance scores that correlate well with human word-level error annotations. In this paper, we show that unsupervised metrics based on token matching can intrinsically provide such scores. The submitted system interprets the similarities of the contextualized word embeddings that are used to compute (X)BERTScore as word-level importance scores.
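The idea in the abstract can be illustrated with a minimal sketch. It is not the submitted system: it assumes that contextualized token embeddings for the source and the translation are already available as NumPy arrays (toy vectors stand in for them here), and the function name `token_matching_scores` is hypothetical. Each token is greedily matched to its most similar token on the other side by cosine similarity, as in BERTScore; the per-token maxima serve as word-level scores and their averages give a sentence-level score.

```python
import numpy as np

def token_matching_scores(src_emb, hyp_emb):
    """Greedy cosine-similarity token matching in the style of BERTScore.

    src_emb: (n_src, d) array of source token embeddings (assumed given).
    hyp_emb: (n_hyp, d) array of translation token embeddings (assumed given).
    Returns per-token best similarities for both sides and an F1-style
    sentence score.
    """
    # L2-normalise rows so that dot products equal cosine similarities.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    hyp = hyp_emb / np.linalg.norm(hyp_emb, axis=1, keepdims=True)

    sim = hyp @ src.T            # shape: (n_hyp, n_src)
    hyp_best = sim.max(axis=1)   # best match for each translation token
    src_best = sim.max(axis=0)   # best match for each source token

    precision = hyp_best.mean()
    recall = src_best.mean()
    f1 = 2 * precision * recall / (precision + recall)
    return hyp_best, src_best, f1

# Toy 2-dimensional "embeddings"; real ones would come from a language model.
src_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
hyp_emb = np.array([[1.0, 0.0], [1.0, 1.0]])

hyp_best, src_best, f1 = token_matching_scores(src_emb, hyp_emb)

# Assumption for illustration: a low best-match similarity marks a token
# as a likely translation error, so 1 - similarity acts as an error score.
hyp_error_scores = 1.0 - hyp_best
print(hyp_error_scores, f1)
```

In this toy case the first translation token has an exact match and receives an error score of 0, while the second token matches only partially and receives a higher score. Turning similarities into error scores via `1 - similarity` is an assumption of this sketch, not a detail taken from the abstract.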
Topics
Artificial Intelligence > Core AI > Interpretability
Machine Learning > Learning Types > Unsupervised Learning
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Applications > Text Classification
Machine Learning > Core Methods > Feature Selection
Machine Learning > Learning Types > Representation Learning
Natural Language Processing > Applications > Quality Estimation