R3M: A Universal Visual Representation for Robot Manipulation

Suraj Nair; Aravind Rajeswaran; Vikash Kumar; Chelsea Finn; Abhinav Gupta

CoRL 2022


Abstract

We study how visual representations pre-trained on diverse human video data can enable data-efficient learning of downstream robotic manipulation tasks. Concretely, we pre-train a visual representation on the Ego4D human video dataset using a combination of time-contrastive learning, video-language alignment, and an L1 penalty to encourage sparse and compact representations. The resulting representation, R3M, can be used as a frozen perception module for downstream policy learning. Across a suite of 12 simulated robot manipulation tasks, we find that R3M improves task success by over 20% compared to training from scratch and by over 10% compared to state-of-the-art visual representations like CLIP and MoCo. Furthermore, R3M enables a Franka Emika Panda arm to learn a range of manipulation tasks in a real, cluttered apartment given just 20 demonstrations.
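The abstract names three pre-training signals. The following is a minimal pure-Python sketch of how such terms could be computed and summed for one training example. It is not the R3M implementation, and several choices are our assumptions: embeddings are plain lists of floats, similarity is negative Euclidean distance, the video-language term is reduced to a pairwise logistic loss on scores from a hypothetical scorer G(z_0, z_t, instruction), and the equal weighting and the L1 weight are arbitrary.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def similarity(a: Vector, b: Vector) -> float:
    """Negative Euclidean distance between two embeddings (assumed similarity)."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def time_contrastive_loss(anchor: Vector, positive: Vector,
                          negatives: Sequence[Vector]) -> float:
    """InfoNCE loss: the temporally closer frame (positive) should be more
    similar to the anchor than distant frames or frames from other videos."""
    logits = [similarity(anchor, positive)]
    logits += [similarity(anchor, n) for n in negatives]
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_sum_exp - logits[0]  # equals -log softmax(logits)[0]


def progress_loss(score_later: float, score_earlier: float) -> float:
    """Pairwise logistic loss -log sigmoid(later - earlier), computed as a
    numerically stable softplus(earlier - later). Scores are assumed to come
    from a hypothetical video-language scorer G(z_0, z_t, instruction) that
    should rate later frames as further along the described task."""
    x = score_earlier - score_later
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def l1_penalty(z: Vector, weight: float) -> float:
    """Sparsity term: weight * ||z||_1."""
    return weight * sum(abs(v) for v in z)


if __name__ == "__main__":
    # Toy 2-D embeddings: `near` is temporally close to the anchor,
    # `far` is later in the same video, `other` is from another video.
    anchor, near, far, other = [0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [0.0, 4.0]
    loss = (time_contrastive_loss(anchor, near, [far, other])
            + progress_loss(score_later=2.0, score_earlier=0.5)
            + l1_penalty(near, weight=1e-3))  # weights chosen arbitrarily
    print(f"combined loss: {loss:.4f}")
```

In a real setup the embeddings would come from a learned image encoder applied to frames sampled from the same and different videos, and the loss would be averaged over large batches. After pre-training, the encoder would be frozen and its outputs passed to a downstream policy, as the abstract describes.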

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Learning Types > Self-Supervised Learning
Reinforcement Learning > Applications > Robotics

Keywords

representation learning, visual representation, robot manipulation, video-language alignment, time-contrastive learning
