SANPO: A Scene Understanding Accessibility and Human Navigation Dataset

Sagar M. Waghmare; Kimberly Wilber; Dave Hawkey; Xuan Yang; Matthew Wilson; Stephanie Debats; Cattalyya Nuengsigkapian; Astuti Sharma; Lars Pandikow; Huisheng Wang; Hartwig Adam; Mikhail Sirotenko

WACV 2025

Abstract

Vision is essential for human navigation. The World Health Organization (WHO) estimates that 43.3 million people were blind in 2020, and this number is projected to reach 61 million by 2050. Modern scene understanding models could empower these people by assisting them with navigation, obstacle avoidance, and visual recognition. To build these systems, the research community needs high-quality datasets for both training and evaluation. While datasets for autonomous vehicles are abundant, there is a critical gap in datasets tailored for outdoor human navigation. This gap poses a major obstacle to the development of computer-vision-based assistive technologies. To overcome it, we present SANPO, a large-scale egocentric video dataset designed for dense prediction in outdoor human navigation environments. SANPO contains 701 stereo videos, each 30+ seconds long, captured in diverse real-world outdoor environments across four geographic locations in the USA. Every frame has a high-resolution depth map, and 112K frames are annotated with temporally consistent dense video panoptic segmentation labels. The dataset also includes 1,961 high-quality synthetic videos with pixel-accurate depth and panoptic segmentation annotations, which balance the noisy real-world annotations with high-precision synthetic ones. SANPO is already publicly available and is being used by applications such as Project Guideline to train mobile models that help low-vision users run independently. To preserve anonymity during peer review, a link to the dataset will be provided upon acceptance.
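The abstract mentions temporally consistent video panoptic segmentation labels, meaning a given object keeps the same ID across frames. The sketch below illustrates what that enables: recovering the frames in which each tracked object appears. It assumes a common panoptic encoding (panoptic id = semantic id × label divisor + instance id, with instance id 0 for "stuff" or unlabeled pixels) and a hypothetical divisor of 1000; SANPO's actual file format and constants may differ, and the function names are illustrative only.

```python
from collections import defaultdict

import numpy as np

# Assumption: panoptic_id = semantic_id * LABEL_DIVISOR + instance_id.
# The value 1000 is hypothetical, not taken from the SANPO release.
LABEL_DIVISOR = 1000


def decode_panoptic(panoptic: np.ndarray, label_divisor: int = LABEL_DIVISOR):
    """Split a panoptic id map into semantic-class and instance-id maps."""
    semantic = panoptic // label_divisor
    instance = panoptic % label_divisor
    return semantic, instance


def track_frames(frames, label_divisor: int = LABEL_DIVISOR):
    """Map each thing-instance panoptic id to the frame indices where it appears.

    Because the labels are assumed temporally consistent, the same panoptic id
    in different frames refers to the same physical object (a "track").
    """
    tracks = defaultdict(list)
    for t, panoptic in enumerate(frames):
        for pid in np.unique(panoptic):
            if pid % label_divisor == 0:  # instance id 0: stuff or unlabeled
                continue
            tracks[int(pid)].append(t)
    return dict(tracks)
```

For example, with two 2×2 frames where id 1001 (class 1, instance 1) appears in both, `track_frames` reports that object in frames 0 and 1, while the stuff region 2000 is skipped.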

Topics

Computer Vision > Analysis > Depth Estimation
Computer Vision > Analysis > Scene Understanding
Computer Vision > Domain-Specific > Egocentric Vision
Computer Vision > Processing > Semantic Segmentation
Artificial Intelligence > Core AI > Computer Vision

Keywords

scene understanding, depth estimation, egocentric vision, dense prediction, egocentric video, panoptic segmentation, assistive technology, human navigation
