2021
CVPR
Topological Planning With Transformers for Vision-and-Language Navigation
Abstract
Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g. forward, rotate) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
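The modular pipeline described in the abstract (a plan over a topological map, then execution with low-level actions) can be illustrated with a minimal sketch. This is not the authors' implementation: the node names, coordinates, edge set, and the functions `is_valid_plan` and `plan_to_actions` are all hypothetical. The attention-based planner is not modeled; the plan is simply given as a list of node ids. The simple geometric controller below stands in for the robust controller mentioned in the abstract.

```python
import math

# Hypothetical topological map: node id -> (x, y) position in metres.
NODES = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (1.0, 1.0)}
# Hypothetical undirected edges between traversable neighbours.
EDGES = {("a", "b"), ("b", "c")}


def is_valid_plan(plan):
    """Return True if every consecutive pair of nodes in the plan shares an edge."""
    return all((u, v) in EDGES or (v, u) in EDGES for u, v in zip(plan, plan[1:]))


def plan_to_actions(plan, heading_deg=0.0):
    """Expand a node-level plan into (action, amount) pairs.

    Assumption: the agent rotates in place to face the next node, then
    moves straight to it. Rotation is in degrees (counter-clockwise
    positive), forward distance is in metres.
    """
    actions = []
    for u, v in zip(plan, plan[1:]):
        (x0, y0), (x1, y1) = NODES[u], NODES[v]
        target = math.degrees(math.atan2(y1 - y0, x1 - x0))
        # Wrap the turn into the range [-180, 180).
        turn = (target - heading_deg + 180.0) % 360.0 - 180.0
        if abs(turn) > 1e-9:
            actions.append(("rotate", round(turn, 1)))
        actions.append(("forward", round(math.hypot(x1 - x0, y1 - y0), 2)))
        heading_deg = target
    return actions


# A plan that revisits node "b" corresponds to backtracking in the map.
plan = ["a", "b", "c", "b"]
actions = plan_to_actions(plan)
```

Keeping the plan at the node level and the actions at the controller level is what makes the plan inspectable on its own: a plan such as `["a", "b", "c", "b"]` shows the backtracking step directly, before any low-level action is issued.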
Interdisciplinary Bridge: Artificial Intelligence, Computer Vision, Deep Learning, Machine Learning, Robotics
Keyword Pioneer: navigation planning
Cross-Pollinator: Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
Authors
Topics
Artificial Intelligence > Core AI > Planning
Machine Learning > Application Areas > Efficient Computing
Deep Learning > Architectures > Transformers
Computer Vision > Analysis > Scene Understanding
Robotics > Capabilities > Navigation
Artificial Intelligence > Core AI > Robotics
Deep Learning > Models > Vision-Language Models
Artificial Intelligence > Core AI > Vision-Language Models