Mind the Gap: Quantifying and Aligning Human-AI Visual Attention for Accident Anticipation

Hoe Sung Ryu; Christian Wallraven

AAAI 2026

Abstract

Quantifying and understanding human-AI alignment in high-risk tasks such as traffic accident prediction is crucial for the deployment of AI systems. Existing alignment studies, however, focus mostly on the static domain and neglect the importance of attentional processing. Here, we present Attention-DADA, a dataset of accident and non-accident traffic situations that contains detailed human prediction and frame-level eye gaze annotations. Using this benchmark, we evaluate open- and closed-source, state-of-the-art large vision-language models (VLMs) in terms of their alignment in accident prediction performance and attentional processing in both zero-shot and attention-guided settings. Our results show that human prediction performance and consistency improve as the event time approaches. Similarly, human attentional patterns show dynamic updating throughout event progression. Conversely, while attention guidance improves VLM prediction performance, both performance and attentional alignment stay significantly below human levels as the event approaches, with the performance gap becoming significant 3.5 seconds prior to the event. These results provide the first quantitative evidence of misalignment both in terms of performance and attentional processing during analysis of time-critical, dynamic events, highlighting the need for future improvements in this area.

Topics

Artificial Intelligence > Core AI > Human-AI Interaction
Artificial Intelligence > Core AI > Interpretability
Computer Vision > Domain-Specific > Autonomous Driving

Keywords

eye tracking, visual attention, vision-language model, human-AI alignment, accident anticipation
