EMNLP 2023
UoT at NADI 2023 shared task: Automatic Arabic Dialect Identification is Made Possible
Abstract
In this paper, we present our approach to Arabic dialect identification, developed for the Fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023). We tested several techniques for identifying Arabic dialects and obtained the best result by fine-tuning the pre-trained MARBERTv2 model on a modified training dataset. We expanded the training set by sorting tweets by dialect, concatenating every two adjacent tweets, and adding the results to the original dataset as new tweets. We achieved an F1 score of 82.87 and ranked seventh among 16 participants.
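The training-set expansion described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and several details are assumptions the abstract does not settle: the function name `augment_by_concatenation` is hypothetical, "adjacent" is taken to mean overlapping pairs (tweet i with tweet i+1), pairs are formed only within the same dialect, and the two texts are joined with a single space.

```python
from itertools import groupby
from operator import itemgetter


def augment_by_concatenation(examples, sep=" "):
    """Expand a dataset of (text, dialect) pairs with concatenated tweets.

    Assumptions (not confirmed by the paper): overlapping adjacent pairs,
    pairing only within one dialect, and a single-space separator.
    """
    # Sort by dialect label; sorted() is stable, so the original order
    # of tweets within each dialect is preserved.
    ordered = sorted(examples, key=itemgetter(1))

    # Start from a copy of the original data so the new tweets are added
    # to the dataset rather than replacing it.
    augmented = list(examples)

    for dialect, group in groupby(ordered, key=itemgetter(1)):
        texts = [text for text, _ in group]
        # Pair each tweet with the next one in the same dialect.
        for first, second in zip(texts, texts[1:]):
            augmented.append((first + sep + second, dialect))

    return augmented
```

Under these assumptions, a dialect with n tweets contributes n - 1 new examples, so the expanded set is a little under twice the original size. If non-overlapping pairs were intended instead, `zip(texts[0::2], texts[1::2])` would be the corresponding change.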
Authors
Topics
Machine Learning > Core Methods > Classification
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Multilingual NLP
Machine Learning > Learning Types > Transfer Learning
Artificial Intelligence > Core AI > Natural Language Processing
Deep Learning > Learning Types > Fine-Tuning