Embodied Amodal Recognition: Learning to Move to Perceive Objects

Jianwei Yang; Zhile Ren; Mingze Xu; Xinlei Chen; David J. Crandall; Devi Parikh; Dhruv Batra

ICCV 2019

Abstract

Passive visual systems typically fail to recognize objects in the amodal setting, where they are heavily occluded. In contrast, humans and other embodied agents can move in the environment and actively control the viewing angle to better understand object shapes and semantics. In this work, we introduce the task of Embodied Amodal Recognition (EAR): an agent is instantiated in a 3D environment close to an occluded target object, and is free to move in the environment to perform object classification, amodal object localization, and amodal object segmentation. To address this problem, we develop a new model called Embodied Mask R-CNN, with which agents learn to move strategically to improve their visual recognition abilities. We conduct experiments using a simulator for indoor environments. Experimental results show that: 1) agents with embodiment (movement) achieve better visual recognition performance than passive ones, and 2) in order to improve visual recognition abilities, agents can learn strategic paths that differ from shortest paths.

Topics

Artificial Intelligence > Core AI > Agent Systems

Computer Vision > Analysis > Object Detection

Keywords

active perception, object segmentation, agent system, occluded object recognition, embodied recognition, amodal object detection
