TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

Howard Chen; Alane Suhr; Dipendra Misra; Noah Snavely; Yoav Artzi

CVPR 2019

Abstract

We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a Street View environment to a goal position, and then guess a location in its observed environment described in natural language to find a hidden object. The data contains 9326 examples of English instructions and spatial descriptions paired with demonstrations. We perform qualitative linguistic analysis, and show that the data displays a rich use of spatial reasoning. Empirical analysis shows the data presents an open challenge to existing methods.

Authors

Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi

Topics

Artificial Intelligence > Core AI > Autonomous Vehicles
Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Core AI > Planning
Artificial Intelligence > Core AI > Reasoning
Artificial Intelligence > Core AI > Language
Natural Language Processing > Applications > Dialogue Systems
Computer Vision > Core AI > Computer Vision
Computer Vision > Analysis > Computer Vision

Keywords

multi-modal learning, visual reasoning, visual navigation, language grounding, natural language, spatial reasoning, street view, natural language navigation, visual street environment
