2025 NAACL NAACL 2025

Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions

Abstract

AbstractThis study examines sentiment analysis in Tamil-English code-mixed texts using advanced transformer-based architectures. The unique linguistic challenges, including mixed grammar, orthographic variability, and phonetic inconsistencies, are addressed. Data limitations and annotation gaps are discussed, highlighting the need for larger datasets. The performance of models such as XLM-RoBERTa, mT5, IndicBERT, and RemBERT is evaluated, with insights into their optimization for low-resource, code-mixed environments.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio