The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks

Anders Giovanni Møller; Arianna Pera; Jacob Dalsgaard; Luca Aiello

2024 EACL EACL 2024

The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks

Abstract

AbstractIn the realm of Computational Social Science (CSS), practitioners often navigate complex, low-resource domains and face the costly and time-intensive challenges of acquiring and annotating data. We aim to establish a set of guidelines to address such challenges, comparing the use of human-labeled data with synthetically generated data from GPT-4 and Llama-2 in ten distinct CSS classification tasks of varying complexity. Additionally, we examine the impact of training data sizes on performance. Our findings reveal that models trained on human-labeled data consistently exhibit superior or comparable performance compared to their synthetically augmented counterparts. Nevertheless, synthetic augmentation proves beneficial, particularly in improving performance on rare classes within multi-class tasks. Furthermore, we leverage GPT-4 and Llama-2 for zero-shot classification and find that, while they generally display strong performance, they often fall short when compared to specialized classifiers trained on moderately sized training sets.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Anders Giovanni Møller , Arianna Pera , Jacob Dalsgaard , Luca Aiello

Topics

Machine Learning > Core Methods > Classification Machine Learning > Learning Types > Weakly Supervised Learning Machine Learning > Learning Types > Zero-Shot Learning

Keywords

few-shot learning text classification data augmentation synthetic datum zero-shot classification human-labeled datum

Download PDF

Related papers

A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry 2024

PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation 2024

Overview of the Hate Speech Detection in Turkish and Arabic Tweets (HSD-2Lang) Shared Task at CASE 2024 2024

Evaluating In-Context Learning for Computational Literary Studies: A Case Study Based on the Automatic Recognition of Knowledge Transfer in German Drama 2024

Selam@DravidianLangTech 2024:Identifying Hate Speech and Offensive Language 2024