COMET-poly: Machine Translation Metric Grounded in Other Candidates

Maike Züfle; Vilém Zouhar; Tu Anh Dinh; Felipe Maia Polo; Jan Niehues; Mrinmaya Sachan

EMNLP 2025


Abstract

Automated metrics for machine translation attempt to replicate human judgment. Unlike humans, who often assess a translation in the context of multiple alternatives, these metrics typically consider only the source sentence and a single translation. This discrepancy in the evaluation setup may negatively impact the performance of automated metrics. We propose two automated metrics that incorporate additional information beyond the single translation. COMET-polycand uses alternative translations of the same source sentence to compare and contrast with the translation at hand, thereby providing a more informed assessment of its quality. COMET-polyic, inspired by retrieval-based in-context learning, takes in translations of similar source texts along with their human-labeled quality scores to guide the evaluation. We find that including a single additional translation in COMET-polycand improves segment-level metric performance (Kendall's tau-b correlation rises from 0.079 to 0.118), with further gains when more translations are added. Incorporating retrieved examples in COMET-polyic yields similar improvements (Kendall's tau-b correlation rises from 0.079 to 0.116). We release our models publicly.
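The gains above are reported as segment-level Kendall's tau-b, the tie-corrected rank correlation between metric scores and human scores. The following is a minimal sketch of that statistic, assuming two equal-length lists of scores for the same segments; the function name `kendall_tau_b` is hypothetical and not taken from the released code, and the abstract does not state how segments are grouped when the correlation is computed, so this sketch computes one global value.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(metric_scores, human_scores):
    """Kendall's tau-b between two equal-length score lists (O(n^2) sketch)."""
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    concordant = discordant = 0
    ties_metric_only = ties_human_only = 0
    for (m_i, h_i), (m_j, h_j) in combinations(zip(metric_scores, human_scores), 2):
        dm = m_i - m_j
        dh = h_i - h_j
        if dm == 0 and dh == 0:
            continue  # tied in both lists: counts toward neither term
        if dm == 0:
            ties_metric_only += 1  # tied only in metric scores
        elif dh == 0:
            ties_human_only += 1  # tied only in human scores
        elif (dm > 0) == (dh > 0):
            concordant += 1  # both lists order the pair the same way
        else:
            discordant += 1  # the lists disagree on the pair's order
    # Denominator: sqrt((pairs not tied in metric) * (pairs not tied in human)).
    denom = sqrt(
        (concordant + discordant + ties_human_only)
        * (concordant + discordant + ties_metric_only)
    )
    if denom == 0:
        return float("nan")  # undefined when one list is constant
    return (concordant - discordant) / denom


if __name__ == "__main__":
    metric = [0.1, 0.4, 0.35, 0.8]
    human = [1, 2, 3, 4]
    print(kendall_tau_b(metric, human))
```

In the usage example, five of the six segment pairs are ordered the same way by the metric and the human scores and one pair is swapped, giving (5 − 1) / 6 ≈ 0.667. For real evaluations, SciPy's `scipy.stats.kendalltau` computes the tau-b variant by default and avoids the quadratic loop.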

Topics

Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Applications > Machine Translation

Keywords

machine translation, in-context learning, automated metrics, retrieval-augmented generation, machine translation metric, translation evaluation, alternative translation