Watch and Match: Supercharging Imitation with Regularized Optimal Transport

Siddhant Haldar; Vaibhav Mathur; Denis Yarats; Lerrel Pinto

CoRL 2022

Abstract

Imitation learning holds tremendous promise in learning policies efficiently for complex decision-making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where, given a set of expert demonstrations, an agent alternately infers a reward function and the associated optimal policy. However, such IRL approaches often require substantial online interactions for complex control problems. In this work, we present Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal-transport-based trajectory matching. Our key technical insight is that adaptively combining trajectory-matching rewards with behavior cloning can significantly accelerate imitation even with only a few demonstrations. Our experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark demonstrate an average of 7.8x faster imitation to reach 90% of expert performance compared to prior state-of-the-art methods. On real-world robotic manipulation, with just one demonstration and an hour of online training, ROT achieves an average success rate of 90.1% across 14 tasks.
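
The abstract names two ingredients, an optimal-transport trajectory-matching reward and a behavior cloning term, without giving their exact form. The following is a minimal sketch of how such ingredients could look, not the authors' implementation. It assumes a cosine cost between observation features, uniform weights over time steps, a plain Sinkhorn solver, and a scalar `bc_weight` that stands in for whatever adaptive schedule the paper uses; all function names and parameters here are hypothetical.

```python
import numpy as np

def cosine_cost(agent_obs, expert_obs):
    """Pairwise cosine distance between agent and expert observation features."""
    a = agent_obs / (np.linalg.norm(agent_obs, axis=1, keepdims=True) + 1e-8)
    e = expert_obs / (np.linalg.norm(expert_obs, axis=1, keepdims=True) + 1e-8)
    return 1.0 - a @ e.T

def sinkhorn_plan(cost, epsilon=0.1, n_iters=200):
    """Entropy-regularized transport plan between two uniformly weighted trajectories."""
    n, m = cost.shape
    mu = np.full(n, 1.0 / n)   # uniform mass on agent time steps (assumption)
    nu = np.full(m, 1.0 / m)   # uniform mass on expert time steps (assumption)
    K = np.exp(-cost / epsilon)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    return u[:, None] * K * v[None, :]

def ot_rewards(agent_obs, expert_obs, epsilon=0.1):
    """Per-step reward: negative transport cost charged to each agent time step."""
    cost = cosine_cost(agent_obs, expert_obs)
    plan = sinkhorn_plan(cost, epsilon)
    return -np.sum(plan * cost, axis=1)

def regularized_actor_loss(rl_loss, bc_loss, bc_weight):
    """Convex combination of an RL objective and a behavior cloning loss.

    bc_weight in [0, 1] is a placeholder for an adaptive weight (assumption).
    """
    return (1.0 - bc_weight) * rl_loss + bc_weight * bc_loss

if __name__ == "__main__":
    expert = np.eye(4)                      # toy 4-step expert trajectory
    print(ot_rewards(np.eye(4), expert))    # agent matches the expert
    print(ot_rewards(-np.eye(4), expert))   # agent is far from the expert
```

In this toy setup, a trajectory that matches the expert receives rewards close to zero, while a mismatched one receives clearly negative rewards. Raising `bc_weight` makes the policy update lean more on cloning the demonstrations and less on the trajectory-matching reward.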



Topics

Artificial Intelligence > Core AI > Multi-Agent Systems
Artificial Intelligence > Learning Paradigms > Transfer Learning
Reinforcement Learning > Applications > Robotics

Keywords

imitation learning, optimal transport, inverse reinforcement learning, behavior cloning, robot manipulation, trajectory matching

Related papers

One-Shot Transfer of Affordance Regions? AffCorrs! (2022)

RoboTube: Learning Household Manipulation from Human Videos with Simulated Twin Environments (2022)

Training Robots to Evaluate Robots: Example-Based Interactive Reward Functions for Policy Learning (2022)

Offline Reinforcement Learning for Visual Navigation (2022)

Inferring Smooth Control: Monte Carlo Posterior Policy Iteration with Gaussian Processes (2022)