Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains

Vilém Zouhar; Shuoyang Ding; Anna Currey; Tatyana Badeka; Jenyuan Wang; Brian Thompson

ACL 2024

Abstract

We introduce a new, extensive multidimensional quality metrics (MQM) annotated dataset covering 11 language pairs in the biomedical domain. We use this dataset to investigate whether machine translation (MT) metrics that are fine-tuned on human-generated MT quality judgments are robust to domain shifts between training and inference. We find that fine-tuned metrics exhibit a substantial performance drop in the unseen-domain scenario relative to both metrics that rely on the surface form and pre-trained metrics that are not fine-tuned on MT quality judgments.
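
Metric meta-evaluation commonly measures how well a metric's scores agree with human MQM judgments. The sketch below illustrates that setup under stated assumptions: it turns per-segment error annotations into an MQM-style score using a simplified weighting of minor = 1 and major = 5 (borrowed from common WMT MQM practice, not necessarily the scheme used in this paper), then measures agreement with a plain Kendall tau (tau-a). The names `SEVERITY_WEIGHTS`, `mqm_score`, and `kendall_tau`, and all example values, are hypothetical.

```python
from itertools import combinations

# ASSUMPTION: simplified MQM severity weights (minor = 1, major = 5), as in
# common WMT MQM practice; the paper's exact weighting may differ.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0}


def mqm_score(errors):
    """Return a segment-level MQM-style score (hypothetical helper).

    `errors` is a list of severity labels, e.g. ["minor", "major"].
    The score is the negated sum of severity weights, so 0 means no
    annotated errors and more negative means worse.
    """
    return -sum(SEVERITY_WEIGHTS[sev] for sev in errors)


def kendall_tau(xs, ys):
    """Plain Kendall tau (tau-a): (concordant - discordant) / all pairs.

    Pairs tied in either list count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical annotations for four translated segments.
annotations = [[], ["minor"], ["major"], ["major", "minor", "minor"]]
human = [mqm_score(e) for e in annotations]  # [0, -1.0, -5.0, -7.0]

# Hypothetical outputs of some MT metric on the same four segments.
metric = [0.91, 0.85, 0.60, 0.62]

tau = kendall_tau(metric, human)
print(round(tau, 3))  # 5 concordant, 1 discordant pair -> 0.667
```

Running the same comparison separately on in-domain and out-of-domain segments is one way to expose the kind of domain-shift drop described above. Published meta-evaluations often use tie-aware Kendall variants rather than the tau-a shown here.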

Topics

Machine Learning > Application Areas > Domain Generalization
Natural Language Processing > Applications > Machine Translation
Machine Learning > Learning Types > Domain Adaptation
Machine Learning > Learning Types > Evaluation

Keywords

domain adaptation, machine translation, biomedical domain, domain shift, multidimensional quality metrics, quality metrics