Visually-Grounded Library of Behaviors for Manipulating Diverse Objects across Diverse Configurations and Views

Jingyun Yang; Hsiao-Yu Tung; Yunchu Zhang; Gaurav Pathak; Ashwini Pokle; Katerina Fragkiadaki; Christopher G. Atkeson

CoRL 2021

Abstract

We propose a visually-grounded library of behaviors approach for learning to manipulate diverse objects across varying initial and goal configurations and camera placements. Our key innovation is to disentangle the standard image-to-action mapping into two separate modules that use different types of perceptual input: (1) a behavior selector, which conditions on intrinsic and semantically rich object appearance features to select the behaviors that can successfully perform the desired task on the object in hand, and (2) a library of behaviors, each of which conditions on extrinsic and abstract object properties, such as object location and pose, to predict actions to execute over time. The selector uses a semantically rich 3D object feature representation extracted from images in a differentiable, end-to-end manner. This representation is trained to be view-invariant and affordance-aware using self-supervision, by predicting varying views and successful object manipulations. We test our framework on pushing and grasping diverse objects in simulation, as well as on transporting rigid, granular, and liquid food ingredients in a real robot setup. Our model outperforms image-to-action mappings that do not factorize static and dynamic object properties. We further ablate the contribution of the selector's input and show the benefits of the proposed view-predictive, affordance-aware 3D visual object representations.
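The selector/library factorization described in the abstract can be illustrated with a minimal Python sketch. It rests on hypothetical simplifications: the appearance features are a fixed vector rather than the learned view-invariant 3D representation, each behavior carries a hand-set `key` vector that the selector scores by dot product, and the two behaviors (`straight_push`, `top_grasp`) are toy waypoint generators. None of these names or scoring rules come from the paper. The sketch only shows the division of inputs: the selector reads intrinsic appearance, while each behavior reads only the extrinsic pose and goal.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Pose = Tuple[float, float, float]   # (x, y, yaw): extrinsic object state
Waypoint = Tuple[float, float]      # hypothetical end-effector target (x, y)


@dataclass
class Behavior:
    name: str
    # Hypothetical selector-side embedding, scored against appearance features.
    key: Sequence[float]
    # Controller that sees only extrinsic properties (pose, goal), never appearance.
    policy: Callable[[Pose, Pose], List[Waypoint]]


def straight_push(pose: Pose, goal: Pose) -> List[Waypoint]:
    """Toy behavior: interpolate from the object position to the goal in equal steps."""
    steps = 4
    dx, dy = goal[0] - pose[0], goal[1] - pose[1]
    return [(pose[0] + dx * i / steps, pose[1] + dy * i / steps)
            for i in range(1, steps + 1)]


def top_grasp(pose: Pose, goal: Pose) -> List[Waypoint]:
    """Toy behavior: reach the object, then carry it straight to the goal."""
    return [(pose[0], pose[1]), (goal[0], goal[1])]


def select_behavior(features: Sequence[float], library: List[Behavior]) -> Behavior:
    """Pick the behavior whose key has the largest dot product with the features."""
    def score(b: Behavior) -> float:
        return sum(f * k for f, k in zip(features, b.key))
    return max(library, key=score)


def act(features: Sequence[float], pose: Pose, goal: Pose,
        library: List[Behavior]) -> Tuple[str, List[Waypoint]]:
    # Intrinsic features decide *which* behavior; extrinsic state decides *how* it moves.
    behavior = select_behavior(features, library)
    return behavior.name, behavior.policy(pose, goal)


library = [
    Behavior("push", (1.0, 0.0), straight_push),
    Behavior("grasp", (0.0, 1.0), top_grasp),
]
name, plan = act((0.2, 0.9), (0.0, 0.0, 0.0), (1.0, 2.0, 0.0), library)
print(name, plan)  # grasp [(0.0, 0.0), (1.0, 2.0)]
```

Because the behaviors never see the appearance features, moving the object or its goal changes the waypoints but not which behavior is chosen. In the paper both modules are learned; here they are fixed by hand purely for illustration.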

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Deep Learning > Techniques > Pretraining
Computer Vision > Analysis > Object Detection

Keywords

self-supervised learning, behavior cloning, robot manipulation, view invariance, affordance recognition
