Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models

Yifan Jia; Yuntao Du; Kailin Jiang; Yuyang Liang; Qihan Ren; Yi Xin; Rui Yang; Fenze Feng; Mingcai Chen; Hengyang Lu; Haozhe Wang; Xiaoye Qu; Qian Li; Dongrui Liu

AAAI 2026

Abstract

Large Multimodal Models (LMMs) face notable challenges when encountering multimodal knowledge conflicts, particularly under retrieval-augmented generation (RAG) frameworks, where the contextual information from external sources may contradict the model’s internal parametric knowledge, leading to unreliable outputs. However, existing benchmarks fail to reflect such realistic conflict scenarios. Most focus solely on intra-memory conflicts, while context-memory and inter-context conflicts remain largely unaddressed. Furthermore, commonly used factual knowledge-based evaluations are often overlooked, and existing datasets lack a thorough investigation into conflict detection capabilities. To bridge this gap, we propose MMKC-Bench, a benchmark designed to evaluate factual knowledge conflicts in both context-memory and inter-context scenarios. MMKC-Bench encompasses four types of multimodal knowledge conflicts and includes 1,881 knowledge instances and 3,997 images across 32 broad types, collected through automated pipelines with human verification. We evaluate four representative series of LMMs on both model behavior analysis and conflict detection tasks. Our findings show that while current LMMs are capable of recognizing knowledge conflicts, they tend to favor internal parametric knowledge over external evidence. We hope MMKC-Bench will foster further research in multimodal knowledge conflict and enhance the development of multimodal RAG systems.

Topics

Artificial Intelligence > Core AI > Interpretability
Artificial Intelligence > Core AI > Multimodal Learning
Natural Language Processing > Applications > Fact-Checking

Keywords

multimodal learning; large multimodal model; retrieval-augmented generation; conflict detection; parametric knowledge; knowledge conflict
