FOCUS: Evaluating Pre-trained Vision-Language Models on Underspecification Reasoning

Kankan Zhou; Eason Lai; Kyriakos Mouratidis; Jing Jiang

ACL 2025

Abstract

Humans possess a remarkable ability to interpret underspecified, ambiguous statements by inferring their meanings from context, such as visual input. This ability, however, may be less developed in recent pre-trained vision-language models (VLMs). In this paper, we introduce FOCUS, a novel probing dataset for evaluating whether state-of-the-art VLMs have this ability. FOCUS consists of underspecified sentences paired with image contexts and carefully designed probing questions. Our experiments reveal that VLMs still fall short in handling underspecification, even when visual inputs that could help resolve the ambiguities are available. To support further research on underspecification, FOCUS will be released for public use. We hope this dataset will inspire further research on the reasoning and contextual understanding capabilities of VLMs.
