CUNI and Phrase at WMT25 MT Evaluation Task

Miroslav Hrabal; Ondrej Glembek; Aleš Tamchyna; Almut Silja Hildebrand; Alan Eckhard; Miroslav Štola; Sergio Penkale; Zuzana Šimečková; Ondřej Bojar; Alon Lavie; Craig Stewart

EMNLP 2025

Abstract

This paper describes the joint effort of Phrase a.s. and Charles University’s Institute of Formal and Applied Linguistics (CUNI/UFAL) on the WMT25 Automated Translation Quality Evaluation Systems Shared Task. The two teams participated both collaboratively and competitively: each submitted a system of its own, as well as a contrastive joint ensemble system. In Task 1, we show that such ensembling, if the ensemble is chosen carefully, can lead to a performance boost. We present an analysis of various kinds of systems, comprising both the “traditional” NN-based approach and different flavours of LLMs: off-the-shelf commercial models, their fine-tuned versions, and in-house, custom-trained alternative models. In Tasks 2 and 3, we show Phrase’s approach to tackling the tasks with various GPT models: Error Span Annotation via the complete MQM solution using non-reasoning models (including fine-tuned versions) in Task 2, and using reasoning models in Task 3.

Topics

Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Resources & Methods > Large Language Models
Machine Learning > Learning Types > Multi-Task Learning
Deep Learning > Models > Large Language Models

Keywords

ensemble learning, machine translation, ensemble method, translation evaluation, large language model, neural network
