Automatic Error Type Annotation for Arabic

Riadh Belkebir; Nizar Habash

CoNLL 2021

Abstract

We present ARETA, an automatic error type annotation system for Modern Standard Arabic. We design ARETA to address Arabic’s morphological richness and orthographic ambiguity. We base our error taxonomy on the Arabic Learner Corpus (ALC) Error Tagset with some modifications. ARETA achieves a performance of 85.8% (micro average F1 score) on a manually annotated blind test portion of ALC. We also demonstrate ARETA’s usability by applying it to a number of submissions from the QALB 2014 shared task for Arabic grammatical error correction. The resulting analyses give helpful insights on the strengths and weaknesses of different submissions, which is more useful than the opaque M2 scoring metrics used in the shared task. ARETA employs a large Arabic morphological analyzer, but is completely unsupervised otherwise. We make ARETA publicly available.
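For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a micro-averaged F1 score over error type tags is generally computed: true positives, false positives, and false negatives are pooled across all items before precision and recall are taken. It illustrates the standard definition only and is not ARETA's evaluation code; the function name `micro_f1` and the tag strings are hypothetical placeholders, not tags from the ALC tagset.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over parallel lists of tag sets.

    gold and pred each hold one set of error type tags per annotated
    item. Counts are pooled over all items before computing
    precision and recall, which is what "micro" averaging means.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # tags predicted and present in gold
        fp += len(p - g)   # tags predicted but absent from gold
        fn += len(g - p)   # gold tags that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tags, for illustration only.
gold = [{"ORTH"}, {"SYNT"}, {"MORPH", "ORTH"}, {"PUNC"}]
pred = [{"ORTH"}, {"SEM"}, {"MORPH"}, {"PUNC"}]
score = micro_f1(gold, pred)  # pooled counts: tp=3, fp=1, fn=2
```

Because the counts are pooled, frequent error types weigh more heavily in the score than rare ones; a macro average would instead weight each error type equally.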

Authors

Riadh Belkebir, Nizar Habash

Topics

Natural Language Processing > Applications > Grammatical Error Correction

Keywords

unsupervised learning, grammatical error correction, morphological analysis, Arabic language, error type annotation
