Beyond Understanding: Evaluating the Pragmatic Gap in LLMs’ Cultural Processing of Figurative Language

Mena Attia; Aashiq Muhamed; Mai Alkhamissi; Thamar Solorio; Mona T. Diab

EACL 2026


Abstract

We present a comprehensive evaluation of the ability of large language models (LLMs) to process culturally grounded language, specifically to understand and pragmatically use figurative expressions that encode local knowledge and social nuance. Using figurative language as a proxy for cultural nuance and local knowledge, we design evaluation tasks for contextual understanding, pragmatic use, and connotation interpretation in Arabic and English. We evaluate 22 open- and closed-source LLMs on Egyptian Arabic idioms, multidialectal Arabic proverbs, and English proverbs. Results show a consistent hierarchy: accuracy on Arabic proverbs is 4.29% lower than on English proverbs, and accuracy on Egyptian idioms is 10.28% lower than on Arabic proverbs. On the pragmatic use task, accuracy drops by 14.07% relative to understanding, although providing contextual sentences for the idioms improves accuracy by 10.66%. Models also struggle with connotative meaning, reaching at most 85.58% agreement with human annotators on idioms with full inter-annotator agreement. Figurative language thus serves as an effective diagnostic for cultural reasoning: while LLMs often interpret figurative meaning correctly, they still face major challenges in using it appropriately. To support future research, we release Kinayat, the first dataset of Egyptian Arabic idioms designed for evaluating both figurative understanding and pragmatic use.
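The connotation result above compares model labels with human labels only on idioms where every annotator agreed. The following is a minimal sketch of such a score. It assumes each idiom carries a list of human connotation labels and exactly one model label. The function name `full_agreement_score`, the label values, and the toy data are hypothetical illustrations, not taken from the paper or the Kinayat release.

```python
from typing import Sequence


def full_agreement_score(
    annotator_labels: Sequence[Sequence[str]],
    model_labels: Sequence[str],
) -> float:
    """Fraction of unanimously annotated items where the model matches the consensus label.

    annotator_labels[i] holds every human label for item i;
    model_labels[i] is the model's label for the same item.
    (Hypothetical helper for illustration only.)
    """
    if len(annotator_labels) != len(model_labels):
        raise ValueError("each item needs exactly one model label")
    kept = 0
    matches = 0
    for humans, predicted in zip(annotator_labels, model_labels):
        # Keep only items on which all annotators chose the same label.
        if len(humans) == 0 or len(set(humans)) != 1:
            continue
        kept += 1
        if predicted == humans[0]:
            matches += 1
    if kept == 0:
        raise ValueError("no items with full inter-annotator agreement")
    return matches / kept


# Hypothetical toy data: three annotators per idiom; labels are illustrative only.
humans = [
    ["positive", "positive", "positive"],
    ["negative", "negative", "negative"],
    ["negative", "neutral", "negative"],  # annotators disagree: excluded
    ["neutral", "neutral", "neutral"],
]
model = ["positive", "neutral", "negative", "neutral"]
print(f"{full_agreement_score(humans, model):.2%}")  # prints 66.67%
```

Restricting the score to unanimous items, as the abstract describes, avoids penalizing a model on idioms whose connotation humans themselves dispute. The trade-off is that the score covers fewer items.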

Topics

Artificial Intelligence > Core AI > Foundation Models; Natural Language Processing > Understanding > Semantic Analysis; Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

Arabic language, cultural reasoning, idiom understanding, figurative language, pragmatic understanding, large language model
