Arabic dialect identification: An Arabic-BERT model with data augmentation and ensembling strategy

Kamel Gaanoun; Imade Benelallam

COLING 2020

Abstract

This paper presents the ArabicProcessors team's deep learning system for NADI 2020 Subtask 1 (country-level dialect identification) and Subtask 2 (province-level dialect identification). We used Arabic-BERT in combination with data augmentation and ensembling methods. The unlabeled data provided by the task organizers (10 million tweets) was split into multiple subparts, to each of which we applied a semi-supervised learning method; we then ran a specific ensembling process on the resulting models. The system ranked 3rd in Subtask 1 with an F1-score of 23.26% and 2nd in Subtask 2 with an F1-score of 5.75%.
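
The pipeline outlined in the abstract (split the unlabeled pool, apply semi-supervised learning to each part, then ensemble the resulting models) can be illustrated with a minimal sketch. The abstract does not give the details, so everything concrete here is an assumption: the semi-supervised step is taken to be pseudo-labeling with a confidence threshold, the ensembling step is taken to be plain probability averaging, and a tiny naive Bayes classifier stands in for Arabic-BERT so that the example runs without external libraries. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def split_into_parts(items, n_parts):
    """Split a list into at most n_parts contiguous chunks of near-equal size."""
    size = math.ceil(len(items) / n_parts)
    return [items[i:i + size] for i in range(0, len(items), size)]


class UnigramNB:
    """Tiny multinomial naive Bayes over whitespace tokens.

    Assumption: a lightweight stand-in for a fine-tuned Arabic-BERT classifier.
    """

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {y: Counter() for y in self.labels}
        for text, y in zip(texts, labels):
            self.counts[y].update(text.split())
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict_proba(self, text):
        log_p = {}
        for y in self.labels:
            total = sum(self.counts[y].values()) + len(self.vocab)
            score = math.log(self.prior[y])
            for w in text.split():
                score += math.log((self.counts[y][w] + 1) / total)  # add-one smoothing
            log_p[y] = score
        top = max(log_p.values())
        norm = sum(math.exp(v - top) for v in log_p.values())
        return {y: math.exp(v - top) / norm for y, v in log_p.items()}


def pseudo_label(model, texts, threshold):
    """Keep only texts whose top predicted class reaches the threshold."""
    kept = []
    for text in texts:
        probs = model.predict_proba(text)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            kept.append((text, label))
    return kept


def train_semi_supervised(labeled, unlabeled, n_parts, threshold=0.6):
    """Train one model per unlabeled part on labeled + pseudo-labeled data."""
    texts, labels = zip(*labeled)
    base = UnigramNB().fit(texts, labels)
    models = []
    for part in split_into_parts(unlabeled, n_parts):
        augmented = list(labeled) + pseudo_label(base, part, threshold)
        part_texts, part_labels = zip(*augmented)
        models.append(UnigramNB().fit(part_texts, part_labels))
    return models


def ensemble_predict(models, text):
    """Average class probabilities across models (assumed ensembling rule)."""
    avg = Counter()
    for model in models:
        for y, p in model.predict_proba(text).items():
            avg[y] += p / len(models)
    return max(avg, key=avg.get)


# Hypothetical toy data: transliterated tokens, not real NADI tweets.
labeled = [("shlonak zain", "Iraq"), ("izzayak kwayes", "Egypt")]
unlabeled = ["shlonak habibi", "izzayak ya basha", "zain habibi", "kwayes awi"]
models = train_semi_supervised(labeled, unlabeled, n_parts=2)
prediction = ensemble_predict(models, "shlonak")
```

With real data, each per-part model would be a separately fine-tuned transformer, and the threshold and number of parts would need tuning on a development set.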

Topics

Machine Learning > Learning Types > Semi-Supervised Learning
Deep Learning > Architectures > Transformers
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Large Language Models
Machine Learning > Learning Types > Multi-Task Learning
Machine Learning > Learning Types > Ensemble Methods
Deep Learning > Models > Transformers

Keywords

semi-supervised learning, ensemble learning, text classification, data augmentation, Arabic dialect identification, Arabic-BERT
