Uslub at AbjadAuthorID Shared Task: A Comparative Analysis of Traditional Machine Learning and Transformer-Based Models for Authorship Attribution in Arabic and Urdu

Shahad Alsuhaibani; Mohamed Alkaoud

2026 EACL EACL 2026

Uslub at AbjadAuthorID Shared Task: A Comparative Analysis of Traditional Machine Learning and Transformer-Based Models for Authorship Attribution in Arabic and Urdu

Abstract

AbstractAuthorship attribution is a critical task in natural language processing with applications ranging from forensic linguistics to plagiarism detection. While well-studied in high-resource languages, it remains challenging for low-resource languages like Arabic and Urdu. In this paper, we present our participation in the AbjadNLP shared task, where we systematically evaluate three distinct approaches: traditional machine learning using SVM with TF-IDF features, fine-tuned transformer-based models (AraBERT), and LLMs. We demonstrate that while fine-tuned AraBERT excels in Arabic, traditional lexical models (SVM) prove more robust for Urdu, outperforming both BERT-based and LLM approaches. We also show that few-shot prompting with LLMs, when operated as a reranker over top candidates, significantly outperforms zero-shot baselines. Our final systems achieved competitive performance, ranking 6th and 1st in the Arabic and Urdu tasks respectively.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Shahad Alsuhaibani , Mohamed Alkaoud

Topics

Artificial Intelligence > Learning Paradigms > Few-Shot Learning Machine Learning > Core Methods > Classification Natural Language Processing > Applications > Text Classification

Keywords

few-shot learning text classification authorship attribution support vector machine large language model tf-idf feature

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026