MovieCORE: COgnitive REasoning in Movies

Gueter Josmy Faure; Min-Hung Chen; Jia-Fong Yeh; Ying Cheng; Hung-Ting Su; Yung-Hao Tang; Shang-Hong Lai; Winston H. Hsu

EMNLP 2025

Abstract

This paper introduces MovieCORE, a novel video question answering (VQA) dataset designed to probe deeper cognitive understanding of movie content. Unlike existing datasets that focus on surface-level comprehension, MovieCORE emphasizes questions that engage System-2 thinking while remaining specific to the video material. We present an innovative agentic brainstorming approach that uses multiple large language models (LLMs) as thought agents to generate and refine high-quality question-answer pairs. To evaluate dataset quality, we develop a set of cognitive tests assessing depth, thought-provocation potential, and syntactic complexity. We also propose a comprehensive evaluation scheme for assessing VQA model performance on deeper cognitive tasks. To address the limitations of existing video-language models (VLMs), we introduce an agentic enhancement module, Agentic Choice Enhancement (ACE), which improves model reasoning capabilities post-training by 25%. Our work contributes to advancing movie understanding in AI systems and provides insights into the capabilities and limitations of current VQA models when faced with more challenging, nuanced questions about cinematic content. Our project page, dataset, and code are available at https://joslefaure.github.io/assets/html/moviecore.html.
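The agentic brainstorming idea summarized in the abstract (several LLM "thought agents" that draft and then refine question-answer pairs) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `brainstorm` function, the `QAPair` type, the fixed number of critique-and-refine rounds, and the toy stub agents (which stand in for real LLM calls) are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: a critic "thought agent" maps a prompt to free-text feedback.
# In a real system each agent would wrap a different LLM; here they are plain functions.
Critic = Callable[[str], str]


@dataclass
class QAPair:
    question: str
    answer: str


def brainstorm(context: str,
               generator: Callable[[str], QAPair],
               critics: List[Critic],
               refiner: Callable[[QAPair, List[str]], QAPair],
               rounds: int = 2) -> QAPair:
    """Draft a QA pair, then run `rounds` of critique followed by refinement."""
    qa = generator(context)
    for _ in range(rounds):
        prompt = f"Context: {context}\nQ: {qa.question}\nA: {qa.answer}"
        feedback = [critic(prompt) for critic in critics]  # every critic sees the same draft
        qa = refiner(qa, feedback)
    return qa


# Toy stand-ins for LLM agents (assumptions, for demonstration only).
def toy_generator(context: str) -> QAPair:
    return QAPair(question=f"What happens in {context}?",
                  answer="She leaves to protect her family.")


def depth_critic(prompt: str) -> str:
    # Pushes surface-level "what" questions toward "why" questions.
    return "Ask why, not what." if "Q: What happens" in prompt else "OK"


def toy_refiner(qa: QAPair, feedback: List[str]) -> QAPair:
    # Rewrites the question only when a critic asked for a deeper ("why") framing.
    question = qa.question
    if "Ask why, not what." in feedback:
        question = question.replace("What happens in", "Why does the character leave in", 1)
    return QAPair(question=question, answer=qa.answer)


if __name__ == "__main__":
    result = brainstorm("the final scene", toy_generator, [depth_critic], toy_refiner)
    print(result.question)  # Why does the character leave in the final scene?
```

Keeping the generator, critics, and refiner as separate callables makes it easy to assign a different model to each role. The fixed round count is a simplification; a real pipeline might instead stop once no critic raises an objection.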

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Natural Language Processing > Applications > Question Answering
Natural Language Processing > Resources & Methods > Large Language Models
Computer Vision > Generation > Visual Question Answering
Deep Learning > Learning Types > Chain-of-Thought

Keywords

visual question answering, chain-of-thought reasoning, video understanding, video question answering, cognitive reasoning, large language model, agentic reasoning, agentic enhancement
