Imitating Past Successes can be Very Suboptimal

Benjamin Eysenbach; Soumith Udatha; Ruslan Salakhutdinov; Sergey Levine

NeurIPS 2022

Abstract

Prior work has proposed a simple strategy for reinforcement learning (RL): label experience with the outcomes achieved in that experience, and then imitate the relabeled experience. These outcome-conditioned imitation learning methods are appealing because of their simplicity, strong performance, and close ties with supervised learning. However, it remains unclear how these methods relate to the standard RL objective, reward maximization. In this paper, we prove that existing outcome-conditioned imitation learning methods do not necessarily improve the policy. However, we show that a simple modification results in a method that does guarantee policy improvement. Our aim is not to develop an entirely new method, but rather to explain how a variant of outcome-conditioned imitation learning can be used to maximize rewards.
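The basic strategy described in the abstract (relabel experience with the outcome it reached, then imitate it) can be illustrated with a minimal tabular sketch. This is not the authors' implementation, and it does not include the modification the paper proposes; the trajectory format, the choice of the final state as the "outcome", and the names `relabel` and `fit_policy` are all assumptions made for illustration. As the abstract notes, this plain form of the procedure is not guaranteed to improve the policy.

```python
from collections import Counter, defaultdict


def relabel(trajectory):
    """Label each (state, action) pair with the outcome the trajectory reached.

    A trajectory is a tuple (steps, final_state), where steps is a list of
    (state, action) pairs. Treating the final state as the outcome is an
    assumption of this sketch.
    """
    steps, final_state = trajectory
    return [(state, final_state, action) for state, action in steps]


def fit_policy(trajectories):
    """Tabular imitation: choose the most frequent action per (state, outcome)."""
    counts = defaultdict(Counter)
    for trajectory in trajectories:
        for state, outcome, action in relabel(trajectory):
            counts[(state, outcome)][action] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


# Two toy trajectories on a 1-D chain: states are integers, actions are -1 or +1.
data = [
    ([(0, +1), (1, +1)], 2),     # moved right twice, ended at state 2
    ([(0, -1), (-1, -1)], -2),   # moved left twice, ended at state -2
]
policy = fit_policy(data)
```

Here `policy[(0, 2)]` selects the action the data took from state 0 when the trajectory ended at state 2. The learned mapping only reproduces what past experience did for each outcome, which is why its relationship to reward maximization needs separate analysis.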

Authors

Benjamin Eysenbach, Soumith Udatha, Ruslan Salakhutdinov, Sergey Levine

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Reinforcement Learning > Methods > Policy Learning
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Imitation Learning

Keywords

reinforcement learning, imitation learning, policy improvement, reward maximization, outcome-conditioned imitation learning
