How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?

Anushka Singh; Ananya Sai; Raj Dabre; Ratish Puduppully; Anoop Kunchukuttan; Mitesh Khapra

ACL 2024

Abstract

While machine translation evaluation has been studied primarily for high-resource languages, there has been recent interest in evaluation for low-resource languages due to the increasing availability of data and models. In this paper, we study a zero-shot evaluation setting for four low-resource Indian languages: Assamese, Kannada, Maithili, and Punjabi. We collect Multi-Dimensional Quality Metrics (MQM) and Direct Assessment (DA) annotations to create test sets and use them to meta-evaluate a wide range of automatic evaluation metrics. We observe that even for learned metrics, which are known to exhibit zero-shot performance, the Kendall Tau and Pearson correlations with human annotations reach at most 0.32 and 0.45, respectively. Synthetic data approaches show mixed results and, overall, do little to close the gap for these languages. This indicates that there is still a long way to go for low-resource evaluation.
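To make the meta-evaluation step concrete, the following is a minimal sketch of how segment-level metric scores can be correlated with human scores using Pearson's r and Kendall's Tau. It is an illustration only, not the authors' evaluation code: the tau-b variant, the function names, and the toy scores are all assumptions, and the abstract does not state which Kendall variant or aggregation the paper uses.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def kendall_tau_b(x, y):
    """Kendall's tau-b (assumed variant), which corrects for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # pairs tied in both lists drop out of tau-b
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_x = concordant + discordant + tied_y_only  # pairs not tied in x
    untied_y = concordant + discordant + tied_x_only  # pairs not tied in y
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


# Hypothetical segment-level scores, invented for illustration.
human_scores = [1, 2, 3, 4, 5]
metric_scores = [0.1, 0.3, 0.2, 0.8, 0.9]

print("Pearson r:", round(pearson(human_scores, metric_scores), 3))
print("Kendall tau-b:", round(kendall_tau_b(human_scores, metric_scores), 3))
```

In this toy example one of the ten segment pairs is ranked in the opposite order by the metric, so Kendall's tau-b is 0.8. Against that scale, the reported maxima of 0.32 (Kendall Tau) and 0.45 (Pearson) indicate only weak-to-moderate agreement with human judgments.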

Topics

Artificial Intelligence > Learning Paradigms > Few-Shot Learning
Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Applications > Machine Translation

Keywords

machine translation, low-resource language, zero-shot evaluation, Indian language, metric correlation, direct assessment
