Findings of WMT 2025 Shared Task on Low-resource Indic Languages Translation

Partha Pakray; Reddi Krishna; Santanu Pal; Advaitha Vetagiri; Sandeep Dash; Arnab Kumar Maji; Saralin A. Lyngdoh; Lenin Laitonjam; Anupam Jamatia; Koj Sambyo; Ajit Das; Riyanka Manna

EMNLP 2025

Abstract

This paper presents the results of the low-resource Indic language translation task organized in conjunction with the Tenth Conference on Machine Translation (WMT 2025). Participants were required to build machine translation models for seven language pairs, grouped into two categories by the amount of training data available. Category 1 covers pairs with a moderate amount of training data: English–Assamese, English–Mizo, English–Khasi, English–Manipuri, and English–Nyishi. Category 2 covers pairs with very limited training data: English–Bodo and English–Kokborok. The task uses the enriched IndicNE-corp1.0 dataset, an extensive collection of parallel and monolingual corpora for northeastern Indic languages. Submissions were evaluated with automatic machine translation metrics, including BLEU, TER, ROUGE-L, ChrF, and METEOR. In addition to these metrics, this year's evaluation also includes cosine similarity, which compares the semantic representations of sentences to assess how well a system output preserves the meaning of the reference. This work aims to promote innovation and progress in machine translation for low-resource Indic languages.
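
As a rough illustration of the cosine similarity metric mentioned above, the following minimal sketch computes the cosine of the angle between two sentence vectors. The abstract does not say which embedding model produced the sentence representations, so the vectors here are invented toy values and the function name is hypothetical; this is not the organizers' evaluation script.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Convention chosen for this sketch: a zero vector has similarity 0.0.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy stand-ins for sentence embeddings (assumed values, not real model output).
hypothesis_vec = [0.20, 0.70, 0.10]
reference_vec = [0.25, 0.65, 0.05]

score = cosine_similarity(hypothesis_vec, reference_vec)
print(f"cosine similarity: {score:.3f}")
```

In practice the two vectors would come from a sentence encoder applied to the system output and the reference translation. A score close to 1 means the vectors point in nearly the same direction, which is taken as a sign of similar meaning.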

Topics

Natural Language Processing > Applications > Machine Translation
Machine Learning > Learning Types > Transfer Learning
Natural Language Processing > Generation > Machine Translation

Keywords

neural machine translation, parallel corpus, low-resource language, neural metric, Indic language
