Dolomites@#SMM4H 2024: Helping LLMs “Know The Drill” in Low-Resource Settings - A Study on Social Media Posts

Giuliano Tortoreto; Seyed Mahed Mousavi

ACL 2024

Abstract

The amount of data available for fine-tuning LLMs plays a crucial role in their performance on downstream tasks. Consequently, deploying these models in low-resource settings is not straightforward. In this work, we investigate two new multi-task learning data augmentation approaches for fine-tuning LLMs when little data is available: “In-domain Augmentation” of the training data, and extracting “Drills” as smaller tasks from the target dataset. We evaluate the proposed approaches in three natural language processing settings drawn from the SMM4H 2024 competition tasks: multi-class classification, entity recognition, and information extraction. The results show that both techniques improve model performance in all three settings, suggesting that the knowledge learned during multi-task training benefits the target task.
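The abstract does not say how drills or in-domain augmentation are constructed. The following is therefore only a minimal sketch of the general idea of a multi-task training mix, written under stated assumptions. It assumes an entity-recognition dataset and two hypothetical drills derived from its labels: a binary "does the post mention an entity?" sub-task and an entity-count sub-task. It also assumes that in-domain augmentation means appending already-formatted instances from other datasets in the same domain. The names `Example`, `Instance`, `drill_detect`, `drill_count`, `build_multitask_set`, and the `[extract]`/`[detect]`/`[count]` prompt prefixes are all illustrative inventions, not the authors' method.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    text: str            # a social media post
    entities: list[str]  # gold entity strings annotated in the post


@dataclass
class Instance:
    prompt: str  # model input, prefixed with a (hypothetical) task tag
    target: str  # expected model output


def target_task(ex: Example) -> Instance:
    # Main task: list every annotated entity, or "none" if there are none.
    answer = "; ".join(ex.entities) if ex.entities else "none"
    return Instance(f"[extract] {ex.text}", answer)


def drill_detect(ex: Example) -> Instance:
    # Hypothetical drill: a simpler binary sub-task derived from the same labels.
    return Instance(f"[detect] {ex.text}", "yes" if ex.entities else "no")


def drill_count(ex: Example) -> Instance:
    # Hypothetical drill: predict how many entities the post contains.
    return Instance(f"[count] {ex.text}", str(len(ex.entities)))


DRILLS = [drill_detect, drill_count]


def build_multitask_set(data: list[Example],
                        extra: tuple[Instance, ...] = (),
                        seed: int = 0) -> list[Instance]:
    """Mix target-task instances, drills, and optional in-domain extras."""
    instances = []
    for ex in data:
        instances.append(target_task(ex))
        for drill in DRILLS:
            instances.append(drill(ex))
    # Assumed form of in-domain augmentation: extra instances from related
    # datasets in the same domain, appended as-is.
    instances.extend(extra)
    # Shuffle so the model sees the tasks interleaved during fine-tuning.
    random.Random(seed).shuffle(instances)
    return instances


posts = [
    Example("took ibuprofen for my back", ["ibuprofen"]),
    Example("great weather today", []),
]
mixed = build_multitask_set(posts)
print(len(mixed))  # 6: two posts x (one target task + two drills)
```

Under these assumptions, the drills multiply each scarce labelled post into several supervised instances. All of them share the same input text but ask progressively easier questions about it, which is one plausible way multi-task training could transfer knowledge to the target task.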

Keywords

multi-task learning, transfer learning, data augmentation, low-resource setting, large language model