Multi-view Gaze Target Estimation

Qiaomu Miao; Vivek Raju Golani; Jingyi Xu; Progga Paromita Dutta; Minh Hoai; Dimitris Samaras

ICCV 2025

Abstract

This paper presents a method that uses multiple camera views for the gaze target estimation (GTE) task. The approach integrates information from different camera views to improve accuracy and broaden applicability, addressing limitations of existing single-view methods, which struggle with face occlusion, target ambiguity, and out-of-view targets. Our method takes a pair of camera views as input and has three components: a Head Information Aggregation (HIA) module that leverages head information from both views for more accurate gaze estimation, Uncertainty-based Gaze Selection (UGS), which identifies the most reliable gaze output, and an Epipolar-based Scene Attention (ESA) module that shares background information across views. This approach significantly outperforms single-view baselines, especially when the second camera provides a clear view of the person's face. Additionally, our method can estimate the gaze target in the first view using only the image of the person in the second view, a capability that single-view GTE methods lack. Furthermore, the paper introduces a multi-view dataset for developing and evaluating multi-view GTE methods. Data and code are available.
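The abstract does not describe how the modules are implemented. As a rough illustration only, the sketch below shows two standard ingredients that modules like ESA and UGS could plausibly build on, under our own assumptions. First, the epipolar constraint: a pixel x in view 1 can only appear in view 2 on the line l' = F x, where F is the fundamental matrix between the two cameras, so cross-view attention can be restricted to features sampled along that line. Second, picking among per-view gaze predictions the one with the lowest predicted uncertainty. The function names (`epipolar_line`, `sample_along_line`, `attend`, `select_gaze`) are hypothetical, and the lowest-variance selection rule is an assumption, not the paper's UGS formulation.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def epipolar_line(F: Sequence[Sequence[float]], x: float, y: float) -> Vec3:
    """Epipolar line l' = F @ (x, y, 1) in view 2, returned as (a, b, c)
    such that points (u, v) on the line satisfy a*u + b*v + c = 0."""
    p = (x, y, 1.0)
    a, b, c = (sum(F[r][k] * p[k] for k in range(3)) for r in range(3))
    return (a, b, c)


def sample_along_line(line: Vec3, width: int, height: int,
                      num: int) -> List[Tuple[float, float]]:
    """Sample up to `num` pixel positions on the line inside a
    width x height image; positions outside the image are dropped."""
    a, b, c = line
    if a == 0 and b == 0:
        raise ValueError("degenerate line")
    pts = []
    for i in range(num):
        t = i / max(num - 1, 1)  # fraction in [0, 1]
        if abs(b) >= abs(a):
            # Closer to horizontal: step over columns u, solve for row v.
            u = t * (width - 1)
            v = -(a * u + c) / b
        else:
            # Closer to vertical: step over rows v, solve for column u.
            v = t * (height - 1)
            u = -(b * v + c) / a
        if 0 <= u <= width - 1 and 0 <= v <= height - 1:
            pts.append((u, v))
    return pts


def attend(query: Sequence[float], keys: Sequence[Sequence[float]],
           values: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention of one query over features sampled
    along the epipolar line (one key/value vector per sample)."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / z
            for d in range(dim)]


def select_gaze(candidates: Sequence[Tuple[Vec3, float]]) -> Vec3:
    """Return the gaze vector with the lowest predicted variance
    (assumed selection rule; one (gaze, variance) candidate per view)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    gaze, _variance = min(candidates, key=lambda c: c[1])
    return gaze


if __name__ == "__main__":
    # Rectified pair with a pure horizontal baseline: F = [t]_x for t = (1, 0, 0),
    # so the epipolar line of a pixel is the same image row in the other view.
    F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
    line = epipolar_line(F, 10.0, 5.0)  # (0.0, -1.0, 5.0), i.e. v = 5
    print(line, sample_along_line(line, 100, 50, 5))
```

Restricting attention to the epipolar line keeps the cost linear in image width rather than quadratic in the number of pixels, which is one reason epipolar sampling is a common choice for cross-view feature sharing; whether ESA works this way is not stated in the abstract.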

Topics

Computer Vision > Analysis > 3D Vision
Computer Vision > Analysis > Human Pose Estimation
Computer Vision > Analysis > Object Detection

Keywords

cross-view matching; uncertainty estimation; head pose estimation; multi-view gaze estimation; gaze target estimation; multi-view dataset
