Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation

Mukul Khanna; Yongsen Mao; Hanxiao Jiang; Sanjay Haresh; Brennan Shacklett; Dhruv Batra; Alexander Clegg; Eric Undersander; Angel X. Chang; Manolis Savva

CVPR 2024

Abstract

We contribute the Habitat Synthetic Scene Dataset (HSSD-200), a dataset of 211 high-quality 3D scenes, and use it to test how well navigation agents generalize to realistic 3D environments. Our dataset represents real interiors and contains a diverse set of 18,656 models of real-world objects. We investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find and navigate to objects (ObjectGoal navigation). By comparing to synthetic 3D scene datasets from prior work, we find that scale helps generalization, but the benefits quickly saturate, making visual fidelity and correlation to real-world scenes more important. Our experiments show that agents trained on our smaller-scale dataset can match or outperform agents trained on much larger datasets. Surprisingly, we observe that agents trained on just 122 scenes from our dataset outperform agents trained on 10,000 scenes from the ProcTHOR-10K dataset in terms of zero-shot generalization in real-world scanned environments.

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Robotics
Machine Learning > Application Areas > Domain Adaptation
Machine Learning > Application Areas > Domain Generalization
Computer Vision > Analysis > 3D Vision
Computer Vision > Domain-Specific > Autonomous Driving
Robotics > Capabilities > Navigation

Keywords

object goal navigation, 3D scene understanding, zero-shot generalization, embodied agent, synthetic dataset
