Enhancing Urdu Sentiment Classification through Instruction-Tuned LLMs and Cross-Lingual Transfer

Hasan Faraz Khan; Noor Fatima; Irfan Ahmad

2026 EACL EACL 2026

Enhancing Urdu Sentiment Classification through Instruction-Tuned LLMs and Cross-Lingual Transfer

Abstract

AbstractSentiment analysis in low-resource languages such as Urdu poses unique challenges due to limited annotated data, morphological complexity, and significant class imbalance in most publicly available datasets. This study addresses these issues through two experimental strategies. First, we explore class imbalance mitigation by using instruction-tuned large language models (LLMs) to generate synthetic negative sentiment samples in Urdu. This augmentation strategy results in a more balanced dataset, which significantly improves the recall and F1-score for minority class predictions when fine-tuned using a multilingual BERT model. Second, we investigate the effectiveness of translating Urdu text into English and applying sentiment classification through a pre-trained English language model. Comparative evaluation reveals that the translation-based pipeline, using a RoBERTa model fine-tuned for English sentiment classification, achieves superior performance across major metrics. Our results suggest that LLM-based augmentation and cross-lingual transfer via translation both serve as viable approaches to overcome data scarcity and performance limitations in sentiment analysis for low-resource languages. The findings highlight the potential applicability of these approaches to other under-resourced linguistic domains.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Hasan Faraz Khan , Noor Fatima , Irfan Ahmad

Topics

Machine Learning > Application Areas > Data Augmentation Natural Language Processing > Understanding > Sentiment Analysis Natural Language Processing > Applications > Text Classification

Keywords

sentiment analysis data augmentation cross-lingual transfer class imbalance large language model

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026