CVPR 2019
TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
Abstract
We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a Street View environment to a goal position, and then guess a location in its observed environment described in natural language to find a hidden object. The data contains 9326 examples of English instructions and spatial descriptions paired with demonstrations. We perform qualitative linguistic analysis, and show that the data displays a rich use of spatial reasoning. Empirical analysis shows the data presents an open challenge to existing methods.
Topics
Artificial Intelligence > Core AI > Autonomous Vehicles
Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Core AI > Planning
Artificial Intelligence > Core AI > Reasoning
Natural Language Processing > Applications > Dialogue Systems
Computer Vision > Core AI > Computer Vision
Artificial Intelligence > Core AI > Language
Computer Vision > Analysis > Computer Vision