2024 CVPR CVPR 2024

GlitchBench: Can Large Multimodal Models Detect Video Game Glitches?

Abstract

Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities such as visual inputs. This integration augments the capacity of LLMs for tasks requiring visual comprehension and reasoning. However the extent and limitations of their enhanced abilities are not fully understood especially when it comes to real-world tasks. To address this gap we introduce GlitchBench a novel benchmark derived from video game quality assurance tasks to test and evaluate the reasoning capabilities of LMMs. Our benchmark is curated from a variety of unusual and glitched scenarios from video games and aims to challenge both the visual and linguistic reasoning powers of LMMs in detecting and interpreting out-of-the-ordinary events. We evaluate multiple state-of-the-art LMMs and we show that GlitchBench presents a new challenge for these models. Code and data are available at: https://glitchbench.github.io/

The Questioner
🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning
🧭 Keyword Pioneer — glitch detection
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio