Evaluating Automatic Metrics with Incremental Machine Translation Systems

Guojun Wu; Shay B Cohen; Rico Sennrich

2024 EMNLP EMNLP 2024

Evaluating Automatic Metrics with Incremental Machine Translation Systems

Abstract

AbstractWe introduce a dataset comprising commercial machine translations, gathered weekly over six years across 12 translation directions. Since human A/B testing is commonly used, we assume commercial systems improve over time, which enables us to evaluate machine translation (MT) metrics based on their preference for more recent translations. Our study not only confirms several prior findings, such as the advantage of neural metrics over non-neural ones, but also explores the debated issue of how MT quality affects metric reliability—an investigation that smaller datasets in previous research could not sufficiently explore. Overall, our research demonstrates the dataset’s value as a testbed for metric evaluation. We release our code.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — incremental system

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Guojun Wu , Shay B Cohen , Rico Sennrich

Topics

Natural Language Processing > Applications > Machine Translation Natural Language Processing > Generation > Machine Translation Machine Learning > Learning Types > Evaluation Deep Learning > Optimization & Theory > Evaluation

Keywords

machine translation a/b testing neural metric translation quality automatic metrics metric evaluation automatic metric automatic metric evaluation neural metrics translation direction incremental system

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024