Exploiting Best-Match Equations for Efficient Reinforcement Learning

Harm Van Seijen; Shimon Whiteson; Hado van Hasselt; Marco Wiering

JMLR 2011

Abstract

This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods against the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations, which combine a sparse model with a model-free Q-value function constructed from samples not used by the model. We prove that, unlike regular sparse model-based methods, best-match learning is guaranteed to converge to the optimal Q-values in the tabular case. Empirical results demonstrate that best-match learning can substantially outperform regular sparse model-based methods, as well as several model-free methods that strive to improve the sample efficiency of temporal-difference methods. In addition, we demonstrate that best-match learning can be successfully combined with function approximation. © JMLR 2011.
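To make the idea of combining a sparse model with a model-free Q-value function concrete, here is a minimal tabular sketch. It is an illustration under stated assumptions, not the authors' algorithm: the function name `solve_best_match`, the deterministic one-sample model format, and the fixed number of sweeps are all hypothetical choices made for this example. State-action pairs covered by the sparse model get a model-based backup, and all other pairs fall back on a model-free estimate.

```python
def solve_best_match(n_states, n_actions, model, q_mf, gamma=0.9, sweeps=200):
    """Approximate the fixed point of a set of illustrative equations.

    Assumptions (not taken from the article):
      model: dict mapping (s, a) -> (reward, next_state) for the pairs the
             sparse model covers, one deterministic sample per pair.
      q_mf:  dict mapping (s, a) -> model-free Q-value estimate for pairs
             the model does not cover (missing entries default to 0.0).
    """
    # Start from the model-free estimates everywhere.
    q = [[q_mf.get((s, a), 0.0) for a in range(n_actions)]
         for s in range(n_states)]

    # Repeatedly apply a model-based backup to the covered pairs only;
    # uncovered pairs keep their model-free values.
    for _ in range(sweeps):
        for (s, a), (reward, s_next) in model.items():
            q[s][a] = reward + gamma * max(q[s_next])
    return q


# Hypothetical two-state, two-action example.
model = {(0, 0): (1.0, 1)}                       # only one pair is modelled
q_mf = {(0, 1): 0.5, (1, 0): 2.0, (1, 1): 3.0}   # model-free values elsewhere
q = solve_best_match(2, 2, model, q_mf, gamma=0.5)
```

In this sketch the modelled pair `(0, 0)` is backed up through the model-free values of state 1, which is the sense in which the two sources of information are combined. The memory cost grows with the number of modelled pairs rather than with a full transition model.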

Topics

Reinforcement Learning > Methods > Policy Learning
Reinforcement Learning > Applications > Value Iteration

Keywords

reinforcement learning, function approximation, model-based learning, temporal-difference learning, Q-value function, model-free learning
