2026 EACL EACL 2026

Beyond Understanding: Evaluating the Pragmatic Gap in LLMs’ Cultural Processing of Figurative Language

Abstract

AbstractWe present a comprehensive evaluation of large language models’ (LLMs) ability to process culturally grounded language, specifically to understand and pragmatically use figurative expressions that encode local knowledge and social nuance. Using figurative language as a proxy for cultural nuance and local knowledge, we design evaluation tasks for contextual understanding, pragmatic use, and connotation interpretation across Arabic and English. We evaluate 22 open- and closed-source LLMs on Egyptian Arabic idioms, multidialectal Arabic proverbs, and English proverbs. Results show a consistent hierarchy: accuracy on Arabic proverbs is 4.29% lower than on English proverbs, and performance on Egyptian idioms is 10.28% lower than on Arabic proverbs. On the pragmatic use task, accuracy drops by 14.07% relative to understanding, though providing idioms’ contextual sentences improves accuracy by 10.66%. Models also struggle with connotative meaning, reaching at most 85.58% agreement with human annotators on idioms with full inter-annotator agreement. Figurative language thus serves as an effective diagnostic for cultural reasoning, revealing that while LLMs often interpret figurative meaning, they still face major challenges in using it appropriately. To support future research, we release Kinayat, the first dataset of Egyptian Arabic idioms designed for both figurative understanding and pragmatic use evaluation.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio