When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages

Archchana Sindhujan; Diptesh Kanojia; Constantin Orasan; Shenbin Qian

COLING 2025

Abstract

This paper investigates the reference-less evaluation of machine translation for low-resource language pairs, known as quality estimation (QE). Segment-level QE is a challenging cross-lingual language understanding task that assigns a quality score (0-100) to a translated output. We comprehensively evaluate large language models (LLMs) in zero- and few-shot scenarios and perform instruction fine-tuning using a novel prompt based on annotation guidelines. Our results indicate that prompt-based approaches are outperformed by encoder-based fine-tuned QE models. Our error analysis reveals tokenization issues, along with errors due to transliteration and named entities, and argues for refinement in LLM pre-training for cross-lingual tasks. We publicly release the data and trained models for further research.

Topics

Machine Learning > Learning Types > Zero-Shot Learning; Natural Language Processing > Applications > Machine Translation; Machine Learning > Learning Types > Transfer Learning; Artificial Intelligence > Core AI > Large Language Models

Keywords

zero-shot learning, machine translation, quality estimation, low-resource language, instruction fine-tuning, large language model
