End-to-End Learning of Semantic Grasping

Eric Jang; Sudheendra Vijayanarasimhan; Peter Pastor; Julian Ibarz; Sergey Levine

CoRL 2017

Abstract

We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images. Inspired by the two-stream hypothesis of visual reasoning, we present a semantic grasping framework that learns object detection, classification, and grasp planning in an end-to-end fashion. A “ventral stream” recognizes object class while a “dorsal stream” simultaneously interprets the geometric relationships necessary to execute successful grasps. We leverage the autonomous data collection capabilities of robots to obtain a large self-supervised dataset for training the dorsal stream, and use semi-supervised label propagation to train the ventral stream with only a modest amount of human supervision. We experimentally show that our approach improves upon grasping systems whose components are not learned end-to-end, including a baseline method that uses bounding box detection. Furthermore, we show that jointly training our model with auxiliary data consisting of non-semantic grasping data, as well as semantically labeled images without grasp actions, has the potential to substantially improve semantic grasping performance.
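
As a rough illustration of how the two streams described above could be combined when choosing an action, the following minimal sketch scores each candidate grasp by multiplying a dorsal-stream grasp-success probability with a ventral-stream class probability. The product rule, the `CandidateGrasp` container, and the function names are assumptions made for illustration; the abstract does not specify how the authors combine the two outputs, and the numbers below are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CandidateGrasp:
    """Hypothetical container for one candidate action and its network outputs."""
    name: str
    grasp_success: float           # assumed dorsal-stream output: P(grasp succeeds)
    class_probs: Dict[str, float]  # assumed ventral-stream output: P(class of grasped object)


def semantic_score(candidate: CandidateGrasp, target_class: str) -> float:
    """Assumed combination rule: P(grasp succeeds) * P(grasped object is the target class)."""
    return candidate.grasp_success * candidate.class_probs.get(target_class, 0.0)


def select_grasp(candidates: List[CandidateGrasp], target_class: str) -> CandidateGrasp:
    """Return the candidate with the highest semantic score for the requested class."""
    return max(candidates, key=lambda c: semantic_score(c, target_class))


# Illustrative values only, not results from the paper.
candidates = [
    CandidateGrasp("grasp_a", 0.9, {"cup": 0.1, "pen": 0.9}),
    CandidateGrasp("grasp_b", 0.6, {"cup": 0.8, "pen": 0.2}),
]
best_for_cup = select_grasp(candidates, "cup")
best_for_pen = select_grasp(candidates, "pen")
```

Under this assumed rule, a grasp that is very likely to succeed is still rejected when it would probably pick up the wrong class, which is the behavior a user-specified semantic grasp requires.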

Topics

Machine Learning > Learning Types > Self-Supervised Learning; Computer Vision > Analysis > Object Detection; Robotics > Capabilities > Manipulation; Machine Learning > Learning Types > Multi-Task Learning; Computer Vision > Core AI > Computer Vision

Keywords

semi-supervised learning; grasp planning; object detection; end-to-end learning; semantic grasping
