CCD-Bench: Probing Cultural Conflict in Large Language Model Decision-Making

Hasibur Rahman; Hanan Salam

AAAI 2026

Abstract

Large language models (LLMs) increasingly shape interpersonal and societal decision-making, yet their ability to navigate explicit conflicts between legitimate cultural values remains underexplored. Existing benchmarks focus on cultural knowledge (CulturalBench), value inference (WorldValuesBench), or single-axis bias (CDEval), but none assess how LLMs adjudicate when multiple cultural frameworks directly clash. We introduce CCD-Bench (Culture-Conflict Decision Benchmark), a benchmark for evaluating LLM decision-making under cross-cultural value conflict. CCD-Bench contains 2,182 open-ended dilemmas across seven domains, each with ten anonymized response options aligned with the ten GLOBE cultural clusters spanning 62 societies. Using a Stratified Latin Square design, we evaluate 17 leading LLMs and find clear biases: models favor Nordic Europe (20.2%) and Germanic Europe (12.4%), while Eastern Europe and Middle East & North Africa responses are least preferred (≈5–6%). Although 87.9% of model rationales reference multiple cultural dimensions, this pluralism is shallow, dominated by Future and Performance Orientation, with limited attention to Assertiveness or Gender Egalitarianism (<3%). Ordering effects are negligible, and model similarity clusters by developer lineage rather than geography. CCD-Bench shifts evaluation from bias detection to pluralistic reasoning, revealing that current LLMs express a Western-centric, consensus-oriented worldview even when confronted with equally valid, culturally diverse alternatives.
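The abstract reports that a Latin square design was used and that ordering effects were negligible. As a rough illustration of why such a design controls for position, the sketch below builds a plain cyclic Latin square over ten option slots and counts how often each option lands in each position. It is not the authors' procedure: the stratification step is not described here and is omitted, and the function names (`cyclic_latin_square`, `option_order`, `position_counts`) are hypothetical. Options are referred to by index rather than by cluster name.

```python
from collections import Counter


def cyclic_latin_square(n):
    """Return an n x n Latin square; row r is the sequence 0..n-1 rotated by r."""
    return [[(r + c) % n for c in range(n)] for r in range(n)]


def option_order(dilemma_index, n_options=10):
    """Presentation order of option indices for one dilemma (hypothetical helper).

    Dilemmas cycle through the rows of the square, so consecutive dilemmas
    see the same options in rotated order.
    """
    square = cyclic_latin_square(n_options)
    return square[dilemma_index % n_options]


def position_counts(n_dilemmas, n_options=10):
    """For each option, count how many times it appears in each position."""
    counts = [Counter() for _ in range(n_options)]
    for d in range(n_dilemmas):
        for pos, opt in enumerate(option_order(d, n_options)):
            counts[opt][pos] += 1
    return counts


if __name__ == "__main__":
    # 2,182 dilemmas and ten options are the figures given in the abstract.
    counts = position_counts(2182)
    per_cell = [counts[o][p] for o in range(10) for p in range(10)]
    print(min(per_cell), max(per_cell))
```

With this rotation every option occupies every position about equally often, so a model that simply favors the first or last slot cannot by itself produce a preference for one cluster. Because 2,182 is not a multiple of ten, the per-position counts differ by at most one.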

Authors

Hasibur Rahman, Hanan Salam

Topics

Artificial Intelligence > Core AI > Responsible AI
Machine Learning > Application Areas > Fairness
Natural Language Processing > Resources & Methods > Large Language Models

Keywords

benchmark evaluation, value alignment, cultural bias, fairness evaluation, large language model
