Assessing and Improving Punctuation Robustness in English-Marathi Machine Translation

Kaustubh Shivshankar Shejole; Sourabh Deoghare; Pushpak Bhattacharyya

2026 EACL EACL 2026

Assessing and Improving Punctuation Robustness in English-Marathi Machine Translation

Abstract

AbstractNeural Machine Translation (NMT) systems rely heavily on explicit punctuation cues to resolve semantic ambiguities in a source sentence. Inputting user-generated sentences, which are likely to contain missing or incorrect punctuation, results in fluent but semantically disastrous translations. This work attempts to highlight and address the problem of punctuation robustness of NMT systems through an English-to-Marathi translation. First, we introduce Virām, a human-curated diagnostic benchmark of 54 punctuation-ambiguous English-Marathi sentence pairs to stress-test existing NMT systems. Second, we evaluate two simple remediation strategies: cascade-based restore-then-translate and direct fine-tuning. Our experimental results and analysis demonstrate that both strategies yield substantial NMT performance improvements. Furthermore, we find that current Large Language Models (LLMs) exhibit relatively poorer robustness in translating such sentences than these task-specific strategies, thus necessitating further research in this area. The code and dataset are available at https://github.com/KaustubhShejole/Viram_Marathi.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — punctuation robustness

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Speech & Audio

Authors

Kaustubh Shivshankar Shejole , Sourabh Deoghare , Pushpak Bhattacharyya

Topics

Machine Learning > Application Areas > Domain Generalization Natural Language Processing > Applications > Machine Translation

Keywords

neural machine translation domain robustness punctuation robustness

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026