Policy Consolidation for Continual Reinforcement Learning

Christos Kaplanis; Murray Shanahan; Claudia Clopath

2019 ICML ICML 2019

Policy Consolidation for Continual Reinforcement Learning

Abstract

We propose a method for tackling catastrophic forgetting in deep reinforcement learning that is agnostic to the timescale of changes in the distribution of experiences, does not require knowledge of task boundaries and can adapt in continuously changing environments. In our policy consolidation model, the policy network interacts with a cascade of hidden networks that simultaneously remember the agent’s policy at a range of timescales and regularise the current policy by its own history, thereby improving its ability to learn without forgetting. We find that the model improves continual learning relative to baselines on a number of continuous control tasks in single-task, alternating two-task, and multi-agent competitive self-play settings.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — policy consolidation

🐣 Hot Topic Early Bird — continual learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Christos Kaplanis , Murray Shanahan , Claudia Clopath

Topics

Machine Learning > Learning Types > Continual Learning Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Applications > Robotics

Keywords

deep reinforcement learning continual learning catastrophic forgetting continuous control policy consolidation hidden network

Download PDF

Related papers

Bayesian leave-one-out cross-validation for large data 2019

A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation 2019

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks 2019

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously 2019

Improved Convergence for $\ell_1$ and $\ell_∞$ Regression via Iteratively Reweighted Least Squares 2019