MarathiEmoExplain: A Dataset for Sentiment, Emotion, and Explanation in Low-Resource Marathi

Anuj Kumar; Mohammed Faisal Sayed; Satyadev Ahlawat; Yamuna Prasad

EMNLP 2025

Abstract

Marathi, the third most widely spoken language in India with over 83 million native speakers, remains significantly underrepresented in Natural Language Processing (NLP) research. While sentiment analysis has achieved substantial progress in high-resource languages such as English, Chinese, and Hindi, available Marathi datasets are limited to coarse sentiment labels and lack both fine-grained emotion categories and explanations that would make the labels interpretable. To address this gap, we present a new annotated dataset of 10,762 Marathi sentences, each labeled with sentiment (positive, negative, or neutral), emotion (joy, anger, surprise, disgust, sadness, fear, or neutral), and a corresponding natural language justification. Justifications are written in English and generated using GPT-4 under a human-in-the-loop framework to ensure label fidelity and contextual alignment. Extensive experiments with both classical and transformer-based models demonstrate the effectiveness of the dataset for interpretable affective computing in a low-resource language setting, offering a benchmark for future research in multilingual and explainable NLP.
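The annotation scheme described in the abstract can be pictured as one record per sentence. The following is a minimal sketch only: the class name `AnnotatedSentence`, its field names, and the validation logic are assumptions made for illustration and are not taken from the paper or its released files. Only the two label sets come from the abstract, and the example sentence is invented, not drawn from the dataset.

```python
from dataclasses import dataclass

# Label sets as listed in the abstract.
SENTIMENTS = ("positive", "negative", "neutral")
EMOTIONS = ("joy", "anger", "surprise", "disgust", "sadness", "fear", "neutral")


@dataclass(frozen=True)
class AnnotatedSentence:
    """Hypothetical record layout; field names are assumptions."""

    text: str           # Marathi sentence
    sentiment: str      # one of SENTIMENTS
    emotion: str        # one of EMOTIONS
    justification: str  # English explanation of the labels

    def __post_init__(self):
        # Reject labels outside the sets named in the abstract.
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment: {self.sentiment!r}")
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion!r}")
        if not self.justification.strip():
            raise ValueError("justification must not be empty")


# Invented example for illustration; NOT a row from the dataset.
example = AnnotatedSentence(
    text="मला आज खूप आनंद झाला.",
    sentiment="positive",
    emotion="joy",
    justification="The speaker states that they felt very happy today.",
)
```

Keeping the label sets as module-level tuples makes it easy to check incoming rows before training a classifier on them; the actual file format and column names of the released dataset may differ.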

Topics

Machine Learning > Application Areas > Fairness
Natural Language Processing > Understanding > Sentiment Analysis
Natural Language Processing > Resources & Methods > Multilingual NLP
Natural Language Processing > Applications > Sentiment Analysis
Machine Learning > Learning Types > Multi-Modal Learning

Keywords

sentiment analysis; multilingual NLP; emotion recognition; explainable AI; low-resource language; emotion classification; explainable NLP
