Arabic Dialect Identification based on a Weighted Concatenation of TF-IDF Features

Mohamed Lichouri; Mourad Abbas; Khaled Lounnas; Besma Benaziz; Aicha Zitouni

EACL 2021

Abstract

In this paper, we analyze the impact of the weighted concatenation of TF-IDF features on the Arabic Dialect Identification task, as part of our participation in the NADI 2021 shared task. The study covers two subtasks: subtask 1.1 (country-level MSA identification) and subtask 1.2 (country-level DA identification). The classifiers in our comparative study are Linear Support Vector Classification (LSVC), Linear Regression (LR), Perceptron, Stochastic Gradient Descent (SGD), Passive Aggressive (PA), Complement Naive Bayes (CNB), Multilayer Perceptron (MLP), and RidgeClassifier. In the evaluation phase, our system achieves F1 scores of 14.87% and 21.49% for country-level MSA and DA identification, respectively, which are close to the average F1 scores of the systems submitted to the two subtasks (18.70% and 24.23%).
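
The abstract does not spell out how the weighted concatenation is built, so the following is only a minimal sketch of one plausible reading: each TF-IDF block (here word unigrams and character trigrams) is L2-normalised, scaled by a scalar weight, and placed side by side in one sparse vector. The n-gram choices, the weights 0.7 and 0.3, the toy texts, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def word_ngrams(text, n=1):
    """Split on whitespace and return word n-grams."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n=3):
    """Return overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_fit(docs_terms):
    """Compute a smoothed IDF value for every term seen in the training docs."""
    n_docs = len(docs_terms)
    df = Counter()
    for terms in docs_terms:
        df.update(set(terms))
    return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_transform(terms, idf):
    """Return an L2-normalised sparse TF-IDF vector; unseen terms are dropped."""
    tf = Counter(t for t in terms if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def weighted_concat(blocks, weights):
    """Scale each block by its weight and merge under (block, term) keys."""
    out = {}
    for name, vec in blocks.items():
        for term, value in vec.items():
            out[(name, term)] = weights[name] * value
    return out


# Toy placeholder texts (hypothetical, not NADI data).
train_texts = ["aaa bbb ccc", "bbb ccc ddd", "ddd eee fff"]

# Hypothetical feature blocks and block weights.
extractors = {"word": lambda t: word_ngrams(t, 1),
              "char": lambda t: char_ngrams(t, 3)}
weights = {"word": 0.7, "char": 0.3}

# One IDF table per feature block, fitted on the training texts.
idfs = {name: tfidf_fit([fn(t) for t in train_texts])
        for name, fn in extractors.items()}


def featurize(text):
    """Build the weighted concatenation of all TF-IDF blocks for one text."""
    blocks = {name: tfidf_transform(fn(text), idfs[name])
              for name, fn in extractors.items()}
    return weighted_concat(blocks, weights)


features = featurize(train_texts[0])
```

The resulting sparse vector could then be passed to any of the linear classifiers listed above. In scikit-learn, a comparable effect can be obtained with `FeatureUnion` and its `transformer_weights` argument over several `TfidfVectorizer` instances; whether the authors used that mechanism is not stated here.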

Topics

Machine Learning > Core Methods > Classification
Natural Language Processing > Applications > Text Classification
Interdisciplinary > Linguistics > Computational Linguistics
Machine Learning > Core Methods > Support Vector Machine

Keywords

feature extraction, text classification, Arabic dialect, support vector machine, feature engineering, dialect identification, TF-IDF feature
