Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

Haonan Chang; Kowndinya Boyalakuntla; Shiyang Lu; Siwei Cai; Eric Pu Jing; Shreesh Keskar; Shijie Geng; Adeeb Abbas; Lifeng Zhou; Kostas Bekris; Abdeslam Boularias

CoRL 2023

Abstract

We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as “pick up a cup on a kitchen table” or “navigate to a sofa on which someone is sitting”. In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments. The code and dataset used for evaluation will be made available upon publication.
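The abstract contrasts semantic-only localization (find anything labeled "cup") with context-aware grounding (find the cup that is on a kitchen table). The toy sketch below illustrates only that distinction, under loudly stated assumptions: the `Node` and `SceneGraph` classes, the `ground` function, and the word-overlap `text_sim` are hypothetical stand-ins invented for illustration. A real open-vocabulary system would compare learned text or visual embeddings, and the multiplicative scoring here is not the OVSG matching procedure described in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # One entity in a toy scene graph (hypothetical schema, not the paper's).
    name: str
    kind: str   # "object", "agent", or "region"
    label: str  # free-form text description of the entity


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)  # name -> Node
    edges: list = field(default_factory=list)  # (subject, relation, object) triples

    def add(self, node):
        self.nodes[node.name] = node

    def relate(self, subj, rel, obj):
        self.edges.append((subj, rel, obj))


def text_sim(a, b):
    # Hypothetical stand-in for an open-vocabulary text encoder:
    # Jaccard overlap of lowercase word sets, in [0, 1].
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def ground(graph, target, relation=None, anchor=None):
    """Score every node for a query like ('cup', 'on', 'kitchen table').

    Without relation/anchor this reduces to semantic-only localization.
    """
    scores = {}
    for n in graph.nodes.values():
        s = text_sim(target, n.label)
        if relation and anchor:
            # Context term: best-matching outgoing edge of this node.
            ctx = max(
                (text_sim(relation, r) * text_sim(anchor, graph.nodes[o].label)
                 for subj, r, o in graph.edges if subj == n.name),
                default=0.0,
            )
            s *= ctx
        scores[n.name] = s
    return max(scores, key=scores.get), scores


# Two identical cups that differ only in their context.
g = SceneGraph()
g.add(Node("cup_1", "object", "cup"))
g.add(Node("cup_2", "object", "cup"))
g.add(Node("table_1", "object", "kitchen table"))
g.add(Node("desk_1", "object", "office desk"))
g.relate("cup_1", "on", "table_1")
g.relate("cup_2", "on", "desk_1")

_, semantic_only = ground(g, "cup")                        # both cups tie
best, context_aware = ground(g, "cup", "on", "kitchen table")
print(best)  # cup_1
```

Semantic-only scoring cannot separate the two cups, while the context term does. Multiplying the target score by the best context score also means an entity with no matching relation drops to zero; a practical system would likely need a softer combination, which is one more reason not to read this sketch as the paper's scoring rule.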

Topics

Machine Learning > Core Methods > Representation Learning; Computer Vision > Analysis > Scene Understanding

Keywords

scene understanding, visual grounding, object localization, scene graph, language grounding
