Regularized Policy Iteration

Amir M. Farahmand; Mohammad Ghavamzadeh; Shie Mannor; Csaba Szepesvári

NIPS (NeurIPS) 2008

Abstract

In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. To implement a flexible function approximation scheme, we propose the use of non-parametric methods with regularization, which provides a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by adding L2-regularization to two widely used policy evaluation methods: Bellman residual minimization (BRM) and least-squares temporal difference learning (LSTD). We derive efficient implementations of our algorithms for the case where the approximate value functions belong to a reproducing kernel Hilbert space. We also provide finite-sample performance bounds for our algorithms and show that they achieve optimal rates of convergence under the studied conditions.
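To make the idea of L2-regularized policy evaluation concrete, the following is a minimal sketch of ridge-regularized LSTD with finite-dimensional linear features. It is a simplified, commonly used variant and not the formulation derived in the paper, which works in a reproducing kernel Hilbert space. The function name `ridge_lstd`, the toy two-state chain, and the choice of adding `lam * I` to the LSTD matrix are all assumptions made for illustration.

```python
import numpy as np

def ridge_lstd(phi, phi_next, rewards, gamma, lam):
    """Solve (Phi^T (Phi - gamma * Phi') + lam * I) theta = Phi^T r.

    Hypothetical helper for illustration: phi holds features of visited
    states, phi_next holds features of the successor states under the
    policy being evaluated, and lam is the L2 (ridge) coefficient.
    """
    phi = np.asarray(phi, dtype=float)
    phi_next = np.asarray(phi_next, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    a_matrix = phi.T @ (phi - gamma * phi_next)
    b_vector = phi.T @ rewards
    return np.linalg.solve(a_matrix + lam * np.eye(phi.shape[1]), b_vector)

# Toy chain (an assumption, not from the paper): state 0 moves to state 1
# with reward 1, and state 1 loops on itself with reward 0.
phi = np.eye(2)                                  # one-hot features
phi_next = np.array([[0.0, 1.0], [0.0, 1.0]])    # both transitions land in state 1
rewards = np.array([1.0, 0.0])

theta_small = ridge_lstd(phi, phi_next, rewards, gamma=0.9, lam=1e-8)
theta_large = ridge_lstd(phi, phi_next, rewards, gamma=0.9, lam=1.0)
```

With a negligible `lam` the estimate recovers the true values of this toy chain, while a larger `lam` shrinks the weight vector, which is the complexity control that the abstract refers to.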

Authors

Amir M. Farahmand, Mohammad Ghavamzadeh, Shie Mannor, Csaba Szepesvári

Topics

Machine Learning > Optimization & Theory > Optimization

Machine Learning > Optimization & Theory > Statistical Learning

Reinforcement Learning > Methods > Policy Learning

Machine Learning > Learning Types > Reinforcement Learning

Machine Learning > Optimization & Theory > Stochastic Methods

Deep Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning, temporal difference learning, regularization, function approximation, reproducing kernel Hilbert space, policy iteration, Bellman residual minimization, L2 regularization, Bellman residual, least-squares temporal difference, kernel methods, least-squares temporal difference learning

Related papers

On the Efficient Minimization of Classification Calibrated Surrogates 2008

Hebbian Learning of Bayes Optimal Decisions 2008

Biasing Approximate Dynamic Programming with a Lower Discount Factor 2008

Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation 2008

Domain Adaptation with Multiple Sources 2008