Waypoint Models for Instruction-Guided Navigation in Continuous Environments

Jacob Krantz; Aaron Gokaslan; Dhruv Batra; Stefan Lee; Oleksandr Maksymets

2021 ICCV ICCV 2021

Waypoint Models for Instruction-Guided Navigation in Continuous Environments

Abstract

Little inquiry has explicitly addressed the role of action spaces in language-guided visual navigation -- either in terms of its effect on navigation success or the efficiency with which a robotic agent could execute the resulting trajectory. Building on the recently released VLN-CE setting for instruction following in continuous environments, we develop a class of language-conditioned waypoint prediction networks to examine this question. We vary the expressivity of these models to explore a spectrum between low-level actions and continuous waypoint prediction. We measure task performance and estimated execution time on a profiled LoCoBot robot. We find more expressive models result in simpler, faster to execute trajectories, but lower-level actions can achieve better navigation metrics by approximating shortest paths better. Further, our models outperform prior work in VLN-CE and set a new state-of-the-art on the public leaderboard -- increasing success rate by 4% with our best model on this challenging task.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐣 Hot Topic Early Bird — instruction following

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jacob Krantz , Aaron Gokaslan , Dhruv Batra , Stefan Lee , Oleksandr Maksymets

Topics

Artificial Intelligence > Core AI > Autonomous Vehicles Artificial Intelligence > Core AI > Planning Machine Learning > Learning Types > Reinforcement Learning

Keywords

visual navigation instruction following continuous environment waypoint prediction language-guided navigation

Download PDF

Related papers

Spatial-Temporal Transformer for Dynamic Scene Graph Generation 2021

ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators 2021

A Broad Study on the Transferability of Visual Representations With Contrastive Learning 2021

Query Adaptive Few-Shot Object Detection With Heterogeneous Graph Convolutional Networks 2021

Self-Supervised Neural Networks for Spectral Snapshot Compressive Imaging 2021