Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon

Fajri Koto; Tilman Beck; Zeerak Talat; Iryna Gurevych; Timothy Baldwin

2024 EACL EACL 2024

Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon

Abstract

AbstractImproving multilingual language models capabilities in low-resource languages is generally difficult due to the scarcity of large-scale data in those languages. In this paper, we relax the reliance on texts in low-resource languages by using multilingual lexicons in pretraining to enhance multilingual capabilities. Specifically, we focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets. We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves superior zero-shot performance compared to models fine-tuned on English sentiment datasets, and large language models like GPT–3.5, BLOOMZ, and XGLM. These findings are observable for unseen low-resource languages to code-mixed scenarios involving high-resource languages.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing

🧭 Keyword Pioneer — lexicon-based pretraining

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Fajri Koto , Tilman Beck , Zeerak Talat , Iryna Gurevych , Timothy Baldwin

Topics

Natural Language Processing > Resources & Methods > Multilingual NLP Artificial Intelligence > Learning Paradigms > Zero-Shot Learning Natural Language Processing > Applications > Sentiment Analysis

Keywords

zero-shot learning sentiment analysis low-resource language multilingual language model lexicon-based pretraining

Download PDF

Related papers

A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry 2024

PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation 2024

Overview of the Hate Speech Detection in Turkish and Arabic Tweets (HSD-2Lang) Shared Task at CASE 2024 2024

Evaluating In-Context Learning for Computational Literary Studies: A Case Study Based on the Automatic Recognition of Knowledge Transfer in German Drama 2024

Selam@DravidianLangTech 2024:Identifying Hate Speech and Offensive Language 2024