Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts

Chunlan Ma; Yihong Liu; Haotian Ye; Hinrich Schuetze

2025 EMNLP EMNLP 2025

Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts

Abstract

AbstractDecoder-only large language models (LLMs) excel in high-resource languages across various tasks through few-shot or even zero-shot in-context learning (ICL). However, their performance often does not transfer well to low-resource languages, especially those written in non-Latin scripts. Inspired by recent work that leverages transliteration in encoder-only models, we investigate whether transliteration is also effective in improving LLMs’ performance for low-resource languages written in non-Latin scripts. To this end, we propose three prompt templates, where the target-language text is represented in (1) its original script, (2) Latin script transliteration, or (3) combined. We apply these methods to several representative LLMs of different sizes on various tasks including text classification and sequential labeling. Our findings show that the effectiveness of transliteration varies by task type and model size. For instance, all models benefit from transliterations for sequential labeling (with increases of up to 25%%).

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Chunlan Ma , Yihong Liu , Haotian Ye , Hinrich Schuetze

Topics

Artificial Intelligence > Learning Paradigms > Few-Shot Learning Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

text classification in-context learning low-resource language sequential labeling non-latin script

Download PDF

Related papers

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing 2025

Model-based Large Language Model Customization as Service 2025

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design 2025