Large Language Models As Annotators: A Preliminary Evaluation For Annotating Low-Resource Language Content

Savita Bhat; Vasudeva Varma

IJCNLP 2023

Abstract

Collecting human-generated annotations is time-consuming and resource-intensive. For low-resource (LR) languages such as Indic languages, these efforts are even more expensive because both data and human experts are scarce. Given the importance of annotations for downstream applications, there have been concerted efforts to explore alternatives to human-generated annotations. To that end, we evaluate multilingual large language models (LLMs) for their potential to substitute for or aid human annotation efforts. We use LLMs to re-label publicly available datasets in LR languages for three tasks: natural language inference, sentiment analysis, and news classification. We compare these annotations with the existing ground-truth labels to analyze the efficacy of using LLMs for annotation. We observe that the performance of these LLMs varies substantially across tasks and languages. The results show that off-the-shelf use of multilingual LLMs is not appropriate and results in poor performance in two of the three tasks.
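As an illustration of the comparison step described in the abstract, the following is a minimal sketch of how LLM-produced labels could be scored against existing ground-truth labels. The function names, the choice of accuracy and Cohen's kappa as agreement measures, and the toy labels are all assumptions made for this example; the abstract does not state which metrics the authors used.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items where the LLM label equals the ground-truth label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cohen_kappa(gold, pred):
    """Chance-corrected agreement between two label sequences."""
    n = len(gold)
    observed = accuracy(gold, pred)
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    # Expected agreement if both label sources were independent.
    expected = sum(gold_counts[label] * pred_counts[label]
                   for label in gold_counts) / (n * n)
    if expected == 1.0:
        # Both sequences use one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy, made-up labels for a hypothetical sentiment task (not data from the paper).
gold_labels = ["pos", "neg", "pos", "neg"]
llm_labels = ["pos", "neg", "neg", "neg"]

print("accuracy:", accuracy(gold_labels, llm_labels))
print("kappa:", cohen_kappa(gold_labels, llm_labels))
```

Running such a comparison separately per task and per language would expose the kind of variation across tasks and languages that the abstract reports.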

Topics

Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Multilingual NLP
Natural Language Processing > Applications > Sentiment Analysis
Artificial Intelligence > Core AI > Large Language Models

Keywords

sentiment analysis, text classification, natural language inference, multilingual processing, low-resource language, multilingual language model, large language model
