MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

Haochen Xue; Feilong Tang; Ming Hu; Yexin Liu; Qidong Huang; Yulong Li; Chengzhi Liu; Zhongxing Xu; Chong Zhang; Chun-Mei Feng; Yutong Xie; Imran Razzak; Zongyuan Ge; Jionglong Su; Junjun He; Yu Qiao

ACL 2025

Abstract

Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their abilities to memorize, recall, and reason in sustained interactions within real-world scenarios remain underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core open-ended abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer refusal. With data collected from real-world scenarios, MMRC comprises 5,120 conversations and 28,720 corresponding manually labeled questions, posing a significant challenge to existing MLLMs. Evaluations on 20 MLLMs in MMRC indicate an accuracy drop during open-ended interactions. We identify four common failure patterns: long-term memory degradation, inadequacies in updating factual knowledge, accumulated assumption of error propagation, and reluctance to “say no.” To mitigate these issues, we propose a simple yet effective NOTE-TAKING strategy, which can record key information from the conversation and remind the model during its responses, enhancing conversational capabilities. Experiments across six MLLMs demonstrate significant performance improvements.
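The abstract describes NOTE-TAKING only at a high level: record key information from the conversation and remind the model when it responds. The following is a minimal sketch of that general idea, not the authors' implementation. The `NoteBook` class, the `toy_extract` function, the `key = value` turn format, and the reminder prompt layout are all hypothetical and chosen only for illustration; in practice the extraction step would presumably be done by a model rather than by string splitting.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical extractor type: maps one conversation turn to (key, value) facts.
Extractor = Callable[[str], List[Tuple[str, str]]]


@dataclass
class NoteBook:
    """Illustrative note store; not the structure used in the paper."""
    extract: Extractor
    notes: Dict[str, str] = field(default_factory=dict)

    def record(self, turn: str) -> None:
        # Store each extracted fact; a later value for the same key
        # overwrites the earlier one (a simple form of information update).
        for key, value in self.extract(turn):
            self.notes[key] = value

    def remind(self, question: str) -> str:
        # Prepend the current notes to the question before it is sent
        # to the model. With no notes, the question is passed unchanged.
        if not self.notes:
            return question
        lines = [f"- {k}: {v}" for k, v in self.notes.items()]
        return "Notes so far:\n" + "\n".join(lines) + "\n\nQuestion: " + question


def toy_extract(turn: str) -> List[Tuple[str, str]]:
    # Assumed toy format: facts written as "key = value", separated by ';'.
    facts = []
    for part in turn.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            facts.append((key.strip(), value.strip()))
    return facts


if __name__ == "__main__":
    nb = NoteBook(extract=toy_extract)
    nb.record("city = Paris; pet = cat")
    nb.record("city = Rome")  # updates the earlier fact
    print(nb.remind("Where do I live?"))
```

Keeping the notes outside the model's context and re-injecting them at answer time is one plausible way to address the long-term memory degradation and fact-update failures the abstract lists; the actual mechanism in the paper may differ.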

Authors

Haochen Xue, Feilong Tang, Ming Hu, Yexin Liu, Qidong Huang, Yulong Li, Chengzhi Liu, Zhongxing Xu, Chong Zhang, Chun-Mei Feng, Yutong Xie, Imran Razzak, Zongyuan Ge, Jionglong Su, Junjun He, Yu Qiao

Topics

Artificial Intelligence > Core AI > Memory
Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Learning Types > Continual Learning
Natural Language Processing > Applications > Question Answering
Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Multi-Modal Learning
Deep Learning > Models > Vision-Language Models
Artificial Intelligence > Core AI > Dialogue Systems

Keywords

benchmark evaluation, memory recall, information extraction, multimodal large language model, multi-turn reasoning, conversation benchmark, real-world conversation, information update
