2019 ACL ACL 2019

MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)

Abstract

AbstractWe present MICHAEL, a simple lightweight method for automatic Arabic Dialect Identification on the MADAR travel domain Dialect Identification (DID). MICHAEL uses simple character-level features in order to perform a pre-processing free classification. More precisely, Character N-grams extracted from the original sentences are used to train a Multinomial Naive Bayes classifier. This system achieved an official score (accuracy) of 53.25% with 1<=N<=3 but showed a much better result with character 4-grams (62.17% accuracy).

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio