Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks

Fakhraddin Alwajih; El Moatez Billah Nagoudi; Gagan Bhatia; Abdelrahman Mohamed; Muhammad Abdul-Mageed

ACL 2024

Abstract

Multimodal large language models (MLLMs) have proven effective in a wide range of tasks that require complex reasoning and linguistic comprehension. However, due to a lack of high-quality multimodal resources in languages other than English, the success of MLLMs remains relatively limited to English-based settings. This poses significant challenges in developing comparable models for other languages, even those with large speaker populations, such as Arabic. To alleviate this challenge, we introduce a comprehensive family of Arabic MLLMs, dubbed *Peacock*, with strong vision and language capabilities. Through comprehensive qualitative and quantitative analysis, we demonstrate the solid performance of our models on various visual reasoning tasks and further show their emerging dialectal potential. Additionally, we introduce *Henna*, a new benchmark specifically designed for assessing MLLMs on aspects related to Arabic culture, setting the first stone for culturally-aware Arabic MLLMs. The GitHub repository for the *Peacock* project is available at [https://github.com/UBC-NLP/peacock](https://github.com/UBC-NLP/peacock).

Topics

Artificial Intelligence > Core AI > Multimodal Learning; Natural Language Processing > Resources & Methods > Multilingual NLP; Deep Learning > Models > Large Language Models; Natural Language Processing > Resources & Methods > Multimodal NLP; Deep Learning > Learning Types > Multi-Modal Learning; Deep Learning > Models > Vision-Language Models

Keywords

multimodal learning; visual reasoning; benchmark dataset; multimodal large language model; Arabic language; dialectal Arabic; large language model
