Off-Policy Actor-Critic with Shared Experience Replay

Simon Schmitt; Matteo Hessel; Karen Simonyan

2020 ICML ICML 2020

Off-Policy Actor-Critic with Shared Experience Replay

Abstract

We investigate the combination of actor-critic reinforcement learning algorithms with a uniform large-scale experience replay and propose solutions for two ensuing challenges: (a) efficient actor-critic learning with experience replay (b) the stability of off-policy learning where agents learn from other agents behaviour. To this end we analyze the bias-variance tradeoffs in V-trace, a form of importance sampling for actor-critic methods. Based on our analysis, we then argue for mixing experience sampled from replay with on-policy experience, and propose a new trust region scheme that scales effectively to data distributions where V-trace becomes unstable. We provide extensive empirical validation of the proposed solutions on DMLab-30 and further show the benefits of this setup in two training regimes for Atari: (1) a single agent is trained up until 200M environment frames per game (2) a population of agents is trained up until 200M environment frames each and may share experience. We demonstrate state-of-the-art data efficiency among model-free agents in both regimes.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

Authors

Simon Schmitt , Matteo Hessel , Karen Simonyan

Topics

Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Offline RL Reinforcement Learning > Methods > Policy Learning Reinforcement Learning > Applications > Game AI Machine Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning importance sampling off-policy learning trust region experience replay

Download PDF

Related papers

Correlation Clustering with Asymmetric Classification Errors 2020

Learning Portable Representations for High-Level Planning 2020

Proving the Lottery Ticket Hypothesis: Pruning is All You Need 2020

Minimax Pareto Fairness: A Multi Objective Perspective 2020

DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training 2020