Towards Learning to Imitate from a Single Video Demonstration

Glen Berseth; Florian Golemo; Christopher Pal

2023 JMLR JMLR 2023

Towards Learning to Imitate from a Single Video Demonstration

Abstract

Agents that can learn to imitate behaviours observed in video -- without having direct access to internal state or action information of the observed agent -- are more suitable for learning in the natural world. However, formulating a reinforcement learning (RL) agent that facilitates this goal remains a significant challenge. We approach this challenge using contrastive training to learn a reward function by comparing an agent's behaviour with a single demonstration. We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips while training an RL policy to minimize this distance. Through experimentation, we also find that the inclusion of multi-task data and additional image encoding losses improve the temporal consistency of the learned rewards and, as a result, significantly improve policy learning. We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and quadruped and humanoid agents in 3D. We show that our method outperforms current state-of-the-art techniques and can learn to imitate behaviours from a single video demonstration. [abs] [ pdf ][ bib ] © JMLR 2023. (edit, beta)

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning and Robotics

🧭 Keyword Pioneer — siames recurrent neural network

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Glen Berseth , Florian Golemo , Christopher Pal

Topics

Machine Learning > Learning Types > Contrastive Learning Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Robotics > Capabilities > Manipulation

Keywords

contrastive learning reinforcement learning imitation learning policy learning reward function temporal consistency video demonstration siames recurrent neural network

Download PDF

Related papers

Flexible Model Aggregation for Quantile Regression 2023

Efficient Computation of Rankings from Pairwise Comparisons 2023

Efficient Structure-preserving Support Tensor Train Machine 2023

Attacks against Federated Learning Defense Systems and their Mitigation 2023

How Do You Want Your Greedy: Simultaneous or Repeated? 2023