Visual Navigation With Spatial Attention

Bar Mayo; Tamir Hazan; Ayellet Tal

CVPR 2021

Abstract

This work focuses on object-goal visual navigation, which aims to find the location of an object from a given class, where at each step the agent is provided with an egocentric RGB image of the scene. We propose to learn the agent's policy using a reinforcement learning algorithm. Our key contribution is a novel attention probability model for visual navigation tasks. This attention encodes semantic information about observed objects, as well as spatial information about their location. This combination of the "what" and the "where" allows the agent to navigate effectively toward the sought-after object. The attention model is shown to improve the agent's policy and to achieve state-of-the-art results on commonly used datasets.
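As an illustration only, the combination of "what" and "where" can be sketched as a generic scaled dot-product attention over the objects visible in the current frame. This is not the authors' attention probability model: the goal-class embedding, the per-object semantic embeddings, the normalized (x, y) image positions, and the function name `spatial_attention` are all hypothetical assumptions. A softmax turns the scores into a probability distribution, and that same distribution weights both the semantic embeddings ("what") and the object positions ("where").

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(
    goal: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    positions: Sequence[Tuple[float, float]],
) -> Tuple[List[float], List[float], Tuple[float, float]]:
    """Hypothetical attention over observed objects (not the paper's model).

    Returns (probabilities, attended "what" vector, attended "where" location).
    """
    d = len(goal)
    # Scaled dot-product score between the goal class and each observed object.
    scores = [sum(g * e for g, e in zip(goal, emb)) / math.sqrt(d) for emb in embeddings]
    probs = softmax(scores)
    # "What": probability-weighted average of the semantic embeddings.
    what = [sum(p * emb[i] for p, emb in zip(probs, embeddings)) for i in range(d)]
    # "Where": expected image location under the attention distribution.
    where = (
        sum(p * x for p, (x, _) in zip(probs, positions)),
        sum(p * y for p, (_, y) in zip(probs, positions)),
    )
    return probs, what, where


# Toy usage: the goal embedding matches the first object, so it gets more weight.
goal = [1.0, 0.0]
embeddings = [[1.0, 0.0], [0.0, 1.0]]
positions = [(0.2, 0.5), (0.8, 0.5)]
probs, what, where = spatial_attention(goal, embeddings, positions)
print(probs, what, where)
```

In a complete agent, a feature such as the concatenation of `what` and `where` would presumably be passed to the policy network trained with reinforcement learning; that wiring is deliberately left out of this sketch.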

Keywords

reinforcement learning, attention mechanism, object goal navigation, visual navigation, spatial attention, reinforcement learning algorithm, attention probability model
