Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability

Prajneya Kumar; Eshika Khandelwal; Makarand Tapaswi; Vishnu Sreekumar

WACV 2025

Abstract

Understanding what makes a video memorable has important applications in advertising and education technology. Towards this goal, we investigate the spatio-temporal attention mechanisms underlying video memorability. Unlike previous works that fuse multiple features, we adopt a simple CNN+Transformer architecture that enables analysis of spatio-temporal attention while matching state-of-the-art (SoTA) performance on video memorability prediction. We compare model attention against human gaze fixations collected through a small-scale eye-tracking study in which humans perform the video memory task. We uncover the following insights: (i) Quantitative saliency metrics show that our model, trained only to predict a memorability score, exhibits spatial attention patterns similar to human gaze, especially for more memorable videos. (ii) The model assigns greater importance to the initial frames of a video, mimicking human attention patterns. (iii) Panoptic segmentation reveals that both the model and humans assign a greater share of attention to things and a smaller share to stuff than their occurrence probability would suggest.
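The abstract does not name the saliency metrics or the exact definition of "occurrence probability". As an illustration only, the sketch below assumes two common saliency metrics, Pearson correlation (CC) and Normalized Scanpath Saliency (NSS), for comparing a per-frame model attention map with human gaze. It also assumes a hypothetical "attention share ratio" that treats occurrence probability as the pixel-area fraction of "thing" segments. All function names are hypothetical, and maps are assumed to be same-resolution 2-D arrays for a single frame; this is not the authors' implementation.

```python
import numpy as np


def normalize_map(m):
    """Scale a non-negative map so it sums to 1 (a distribution over pixels)."""
    m = np.asarray(m, dtype=float)
    total = m.sum()
    if total <= 0:
        raise ValueError("map must have positive total mass")
    return m / total


def _zscore(m):
    """Standardize a map to zero mean and unit (population) standard deviation."""
    m = np.asarray(m, dtype=float)
    std = m.std()
    if std == 0:
        raise ValueError("map is constant; z-score is undefined")
    return (m - m.mean()) / std


def cc(model_map, gaze_map):
    """Pearson correlation between a model attention map and a human gaze density map."""
    # Mean of the product of z-scores equals Pearson's r when both use ddof=0.
    return float(np.mean(_zscore(model_map).ravel() * _zscore(gaze_map).ravel()))


def nss(model_map, fixations):
    """Normalized Scanpath Saliency: mean z-scored model saliency at fixated pixels."""
    z = _zscore(model_map)
    fix = np.asarray(fixations, dtype=bool)
    if not fix.any():
        raise ValueError("at least one fixation is required")
    return float(z[fix].mean())


def attention_share_ratio(attn_map, is_thing):
    """Hypothetical ratio: share of attention on 'thing' pixels / fraction of pixels that are things.

    A value above 1 means things receive more attention than their area alone predicts.
    """
    p = normalize_map(attn_map)
    mask = np.asarray(is_thing, dtype=bool)
    share = p[mask].sum()      # attention mass falling on thing pixels
    occurrence = mask.mean()   # assumed occurrence probability: area fraction
    return float(share / occurrence)
```

For example, `attention_share_ratio` returns 2.0 for an attention map that places half of its mass on a "thing" region covering a quarter of the frame. A ratio above 1 corresponds to the kind of over-allocation to things described in finding (iii). Per-video scores could be obtained by averaging per-frame values, though how the paper aggregates over frames is not stated here.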

Keywords

spatio-temporal attention; video memorability; eye-tracking study; human gaze fixation
