UMUTeam and SINAI at SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis using Multilingual Large Language Models and Data Augmentation

José Antonio García-Díaz; Ronghao Pan; Salud María Jiménez Zafra; María-Teresa Martín-Valdivia; L. Alfonso Ureña-López; Rafael Valencia-García

SemEval 2023

Abstract

This work presents the participation of the UMUTeam and SINAI research groups in SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis. The goal of the task is to predict the intimacy of tweets in 10 languages: English, Spanish, Italian, Portuguese, French, Chinese, Hindi, Arabic, Dutch, and Korean, of which the last four do not appear in the training data. Our approach combines data augmentation with an ensemble of three multilingual Large Language Models (multilingual BERT, XLM, and mDeBERTa). Our team ranked 30th out of 45 participants. Our best results were achieved on two unseen languages: Korean (16th) and Hindi (19th).
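The abstract says only that the three models are combined by ensemble learning; it does not state the combination rule. As an illustration, the sketch below assumes the simplest option, averaging each model's per-tweet intimacy score. The function name `ensemble_mean` and all score values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def ensemble_mean(predictions_per_model):
    """Average per-tweet scores across models.

    predictions_per_model: list of equally long lists, one list per model,
    where position i in every list refers to the same tweet.
    NOTE: mean averaging is an assumption; the paper's abstract does not
    specify how the ensemble combines its members.
    """
    lengths = {len(p) for p in predictions_per_model}
    if len(lengths) != 1:
        raise ValueError("every model must score the same set of tweets")
    # zip(*...) groups the scores that different models gave to one tweet.
    return [mean(scores) for scores in zip(*predictions_per_model)]


# Made-up scores for three tweets from three hypothetical fine-tuned models.
mbert_scores = [1.0, 2.0, 4.0]
xlm_scores = [2.0, 2.0, 3.0]
mdeberta_scores = [3.0, 2.0, 5.0]

ensemble_scores = ensemble_mean([mbert_scores, xlm_scores, mdeberta_scores])
print(ensemble_scores)
```

Other combination rules, such as a weighted average or taking the highest score, would fit the same interface; which one the authors used cannot be determined from the abstract alone.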

Authors

José Antonio García-Díaz, Ronghao Pan, Salud María Jiménez Zafra, María-Teresa Martín-Valdivia, L. Alfonso Ureña-López, Rafael Valencia-García

Topics

Machine Learning > Application Areas > Data Augmentation

Natural Language Processing > Resources & Methods > Large Language Models

Machine Learning > Learning Types > Transfer Learning

Natural Language Processing > Applications > Sentiment Analysis

Machine Learning > Learning Types > Ensemble Learning

Deep Learning > Models > Large Language Models

Keywords

ensemble learning, sentiment analysis, multilingual NLP, data augmentation, multilingual BERT, multilingual large language model, large language model

Related papers

Coco at SemEval-2023 Task 10: Explainable Detection of Online Sexism 2023

ZBL2W at SemEval-2023 Task 9: A Multilingual Fine-tuning Model with Data Augmentation for Tweet Intimacy Analysis 2023

MLModeler5 at SemEval-2023 Task 3: Detecting the Category and the Framing Techniques in Online News in a Multi-lingual Setup 2023

OPI at SemEval-2023 Task 9: A Simple But Effective Approach to Multilingual Tweet Intimacy Analysis 2023

NLP-LISAC at SemEval-2023 Task 12: Sentiment Analysis for Tweets expressed in African languages via Transformer-based Models 2023