CHROMIC: Chronological Reasoning Across Multi-Panel Comics

Bingxuan Hou; Jiayi Lin; Chenyang Zhang; Dapeng Yin; Shuyue Zhu; Qingqing Hong; Mengna Gao; Junli Wang

EACL 2026

Abstract

Large-scale vision–language models (LVLMs) have achieved remarkable progress on various reasoning tasks. However, most studies focus on natural photographic images and pay limited attention to multi-panel visual narratives such as comics. This leaves a clear gap in our understanding of how well LVLMs perform chronological reasoning across comic panels. To address this, we introduce **ChrOMIC**, a new benchmark dataset for **chro**nological reasoning in multi-panel **comic**s. It covers six types of reasoning questions and spans both Western and Japanese comic styles. To ensure high-quality annotations, we designed a human–AI collaborative annotation process tailored to the characteristics of the two comic styles. We further introduce three core tasks: Description Reordering and Panel Reordering, which jointly assess models’ ability to understand chronological order in panel sequences, and Multiple-Choice Question Answering (MCQA), which evaluates narrative-level reasoning. We evaluate a range of open-source and commercial LVLMs on ChrOMIC and find that even the leading models struggle with panel-based chronological reasoning. Further analysis reveals key limitations, including weak understanding of visual actions and frequent hallucinations in fine-grained visual interpretation.

Topics

Artificial Intelligence > Core AI > Multimodal Learning; Machine Learning > Application Areas > Domain Adaptation; Natural Language Processing > Applications > Question Answering

Keywords

visual question answering; benchmark dataset; large-scale vision-language model; chronological reasoning; multi-panel comics; narrative-level reasoning
