HCMUS_PrisonDilemma at AbjadAuthorID Shared Task: Less is More with Base Models

Trung Kiet Huynh; Duy Minh Dao Sy; Nguyen Chi Tran; Pham Phu Hoa; Nguyen Lam Phu Quy; Truong Bao Tran

EACL 2026

Abstract

We present our approach to the AbjadNLP 2026 Arabic Authorship Identification shared task, where our system placed 4th. Our key finding is that AraBERT-base (110M parameters) outperforms AraBERT-large (340M parameters) on the test set, with a macro F1 of 0.8449 versus 0.8096, even though the base model scored lower on validation. We handle long passages via sliding-window chunking with mean pooling, and use a two-stage classification head with dual dropout for regularization. Per-class analysis reveals that translated works achieve perfect F1, while classical poets remain challenging due to shared formal structures. Our results challenge the "scale is all you need" assumption for stylometric tasks.
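The abstract does not spell out how the sliding-window chunking and mean pooling are wired together, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes the passage is already tokenized into integer IDs, that each chunk is encoded independently into a fixed-size vector, and that the chunk vectors are averaged into one document vector. The names `sliding_windows`, `mean_pool`, `embed_document`, and `encode_chunk`, and the default `window` and `stride` values, are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence


def sliding_windows(token_ids: Sequence[int], window: int, stride: int) -> List[List[int]]:
    """Split a token sequence into overlapping chunks of at most `window` tokens."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + window]))
        # Stop once the current chunk reaches the end of the sequence.
        if start + window >= len(token_ids):
            break
        start += stride
    return chunks


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def embed_document(
    token_ids: Sequence[int],
    encode_chunk: Callable[[List[int]], List[float]],
    window: int = 512,   # assumed value, not stated in the abstract
    stride: int = 256,   # assumed value, not stated in the abstract
) -> List[float]:
    """Encode each chunk separately, then mean-pool the chunk vectors."""
    chunks = sliding_windows(token_ids, window, stride)
    return mean_pool([encode_chunk(chunk) for chunk in chunks])


def toy_encoder(chunk: List[int]) -> List[float]:
    """Stand-in for a real encoder such as AraBERT: returns [length, sum]."""
    return [float(len(chunk)), float(sum(chunk))]


doc_vector = embed_document(list(range(10)), toy_encoder, window=4, stride=2)
```

In a real system `encode_chunk` would wrap the transformer encoder, and the pooled vector would feed the classification head. Overlapping windows (a stride smaller than the window) keep context that would otherwise be cut at chunk boundaries, at the cost of encoding some tokens more than once.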

Topics

Machine Learning > Core Methods > Classification
Natural Language Processing > Applications > Text Classification

Keywords

text classification arabic language model authorship identification
