AbjadAuthorID: Authorship Identification for Arabic-Script Languages at AbjadNLP 2026

Shadi Abudalfa; Saad Ezzini; Ahmed Abdelali; Mustafa Jarrar; Mo El-Haj; Nadir Durrani; Hassan Sajjad; Farah Adeeba; Sina Ahmadi

2026 EACL EACL 2026

AbjadAuthorID: Authorship Identification for Arabic-Script Languages at AbjadNLP 2026

Abstract

AbstractAuthorship identification is a core problem in Natural Language Processing and computational linguistics, with applications spanning digital humanities, literary analysis, and forensic linguistics. While substantial progress has been made for English and other high-resource languages, authorship attribution for languages written in the Arabic (Abjad) script remains underexplored. In this paper, we present an overview of AbjadAuthorID, a shared task organised as part of the AbjadNLP workshop at EACL 2026, which focuses on multiclass authorship identification across Arabic-script languages. The shared task covers Modern Standard Arabic, Urdu, and Kurdish, and is formulated as a closed-set multiclass classification problem over literary text spanning multiple authors and historical periods. We describe the task motivation, dataset construction, evaluation protocol, and participation statistics, and report official results for the Arabic track. The findings highlight both the effectiveness of current approaches in controlled settings and the challenges posed by lower participation and resource availability in some language tracks. AbjadAuthorID establishes a new benchmark for multilingual authorship attribution in morphologically rich, underrepresented languages.

🧭 Keyword Pioneer — arabic-script language

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Shadi Abudalfa , Saad Ezzini , Ahmed Abdelali , Mustafa Jarrar , Mo El-Haj , Nadir Durrani , Hassan Sajjad , Farah Adeeba , Sina Ahmadi

Topics

Natural Language Processing > Applications > Text Classification Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

text classification multilingual nlp transformer model authorship identification arabic-script language

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026