Policy and Value Transfer in Lifelong Reinforcement Learning

David Abel; Yuu Jinnai; Sophie Yue Guo; George Konidaris; Michael Littman

ICML 2018

Abstract

We consider the problem of how best to use prior experience to bootstrap lifelong learning, where an agent faces a series of task instances drawn from some task distribution. First, we identify the initial policy that optimizes expected performance over the distribution of tasks for increasingly complex classes of policy and task distributions. We empirically demonstrate the relative performance of each policy class’ optimal element in a variety of simple task distributions. We then consider value-function initialization methods that preserve PAC guarantees while simultaneously minimizing the learning required in two learning algorithms, yielding MaxQInit, a practical new method for value-function-based transfer. We show that MaxQInit performs well in simple lifelong RL experiments.
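The abstract does not spell out the MaxQInit rule, so the following is only a minimal sketch of one plausible reading of value-function-based transfer: initialize each Q-value for a new task to the largest value seen for that state-action pair across previously solved tasks, capped at an optimistic upper bound. The function name `max_q_init`, the tabular NumPy representation, and the `v_max` parameter are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def max_q_init(prior_q_tables, n_states, n_actions, v_max):
    """Hypothetical helper: build an initial Q-table for a new task.

    prior_q_tables: list of arrays of shape (n_states, n_actions), one per
                    previously solved task (assumed tabular representation).
    v_max:          assumed upper bound on any Q-value.
    """
    # With no prior tasks, fall back to a uniformly optimistic table.
    if len(prior_q_tables) == 0:
        return np.full((n_states, n_actions), v_max, dtype=float)

    # Stack into shape (n_tasks, n_states, n_actions).
    stacked = np.stack(prior_q_tables)

    # Elementwise max across tasks, capped so no entry exceeds v_max.
    return np.minimum(stacked.max(axis=0), v_max)


# Toy example with two prior tasks, two states, and two actions.
q_task_a = np.array([[0.2, 0.9], [0.5, 0.1]])
q_task_b = np.array([[0.7, 0.3], [0.4, 0.6]])
q_init = max_q_init([q_task_a, q_task_b], n_states=2, n_actions=2, v_max=1.0)
```

Taking the maximum rather than the mean keeps the initial values optimistic with respect to the tasks seen so far. Whether that optimism carries over to unseen tasks depends on how many tasks have been sampled, a condition the abstract alludes to through its PAC guarantees but does not quantify.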

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Learning Types > Continual Learning
Reinforcement Learning > Applications > Value Iteration
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Transfer Learning
Machine Learning > Learning Paradigms > Continual Learning
Reinforcement Learning > Methods > Value Iteration

Keywords

policy optimization, transfer learning, PAC learning, value function, lifelong learning, task distribution, lifelong reinforcement learning, policy transfer, bootstrap learning, value function initialization
