WACV 2025

SANPO: A Scene Understanding Accessibility and Human Navigation Dataset

Abstract

Vision is essential for human navigation. The World Health Organization (WHO) estimates that 43.3 million people were blind in 2020, and this number is projected to reach 61 million by 2050. Modern scene understanding models could empower these people by assisting them with navigation, obstacle avoidance, and visual recognition. To build these systems, the research community needs high-quality datasets for both training and evaluation. While datasets for autonomous vehicles are abundant, there is a critical gap in datasets tailored for outdoor human navigation. This gap poses a major obstacle to the development of computer-vision-based assistive technologies. To overcome this obstacle, we present SANPO, a large-scale egocentric video dataset designed for dense prediction in outdoor human navigation environments. SANPO contains 701 stereo videos of 30+ seconds each, captured in diverse real-world outdoor environments across four geographic locations in the USA. Every frame has a high-resolution depth map, and 112K frames were annotated with temporally consistent dense video panoptic segmentation labels. The dataset also includes 1961 high-quality synthetic videos with pixel-accurate depth and panoptic segmentation annotations, to balance noisy real-world annotations with high-precision synthetic ones. SANPO is already publicly available and is being used by applications like Project Guideline to train mobile models that help low-vision users run independently. To preserve anonymity during peer review, a link to the dataset will be provided upon acceptance.
