Representation-Aware Prompting for Zero-Shot Marathi Text Classification: IPA, Romanization, Repetition

Van-Hien Tran; Huy Hien Vu; Hideki Tanaka; Masao Utiyama

EACL 2026

Abstract

Large language models (LLMs) often underperform in zero-shot text classification for low-resource, non-Latin-script languages because of script and tokenization mismatches. We propose representation-aware prompting for Marathi, which augments the original-script text with an International Phonetic Alphabet (IPA) transcription, a romanization, or, when external converters are unavailable, a repetition-based fallback. Experiments with two instruction-tuned LLMs on Marathi sentiment analysis and hate speech detection show consistent gains over script-only prompting (up to +2.6 accuracy points). We further find that the most effective augmentation is model-dependent and that combining all variants is not consistently beneficial, suggesting that concise, targeted cues are preferable in zero-shot settings.
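The prompt construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' template: the function name `build_prompt`, the instruction wording, and the field labels are all hypothetical, and the IPA and romanized strings are assumed to come from an external converter that is not shown here. When neither is supplied, the sketch falls back to repeating the original text.

```python
from typing import List, Optional


def build_prompt(
    text: str,
    labels: List[str],
    ipa: Optional[str] = None,
    roman: Optional[str] = None,
) -> str:
    """Assemble a zero-shot classification prompt (hypothetical format).

    The original-script text is always included. An IPA transcription
    and/or a romanization is appended when provided; if neither is
    available, the original text is repeated as a fallback cue.
    """
    lines = [
        f"Classify the following Marathi text as one of: {', '.join(labels)}.",
        f"Text: {text}",
    ]
    if ipa is not None:
        lines.append(f"IPA: {ipa}")
    if roman is not None:
        lines.append(f"Romanization: {roman}")
    if ipa is None and roman is None:
        # Repetition-based fallback: no external converter output available.
        lines.append(f"Text (repeated): {text}")
    lines.append("Label:")
    return "\n".join(lines)


# Illustrative input; the romanization is written by hand for this example.
sample = "हा चित्रपट छान आहे"
labels = ["positive", "negative"]

prompt_roman = build_prompt(sample, labels, roman="ha chitrapat chhan ahe")
prompt_fallback = build_prompt(sample, labels)
```

Keeping each augmentation on its own labeled line makes it easy to compare single-cue variants against one another, which matters given the finding that the best augmentation depends on the model and that stacking all variants does not reliably help.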

Authors

Van-Hien Tran, Huy Hien Vu, Hideki Tanaka, Masao Utiyama

Topics

Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Text Representation
Artificial Intelligence > Learning Paradigms > Zero-Shot Learning

Keywords

zero-shot learning, sentiment analysis, prompt engineering, hate speech detection, phonetic transcription
