AlGhafa Evaluation Benchmark for Arabic Language Models

Ebtesam Almazrouei; Ruxandra Cojocaru; Michele Baldo; Quentin Malartic; Hamza Alobeidli; Daniele Mazzotta; Guilherme Penedo; Giulia Campesan; Mugariya Farooq; Maitha Alhammadi; Julien Launay; Badreddine Noune

EMNLP 2023

Abstract

Recent advances in the space of Arabic large language models have opened up a wealth of potential practical applications. From optimal training strategies, large-scale data acquisition and continuously increasing NLP resources, the Arabic LLM landscape has improved in a very short span of time, despite being plagued by training data scarcity and limited evaluation resources compared to English. In line with contributing towards this ever-growing field, we introduce AlGhafa, a new multiple-choice evaluation benchmark for Arabic LLMs. For showcasing purposes, we train a new suite of models, including a 14 billion parameter model, the largest monolingual Arabic decoder-only model to date. We use a collection of publicly available datasets, as well as a newly introduced HandMade dataset consisting of 8 billion tokens. Finally, we explore the quantitative and qualitative toxicity of several Arabic models, comparing our models to existing public Arabic LLMs.

Topics

Machine Learning > Learning Types > Zero-Shot Learning
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Large Language Models
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Models > Large Language Models

Keywords

benchmark evaluation; text classification; natural language inference; question answering; model evaluation; named entity recognition; toxicity detection; multiple choice; Arabic language; large language model
