A Fast and Reliable Policy Improvement Algorithm

Yasin Abbasi-Yadkori; Peter L. Bartlett; Stephen J. Wright

AISTATS 2016

Abstract

We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.

Authors

Yasin Abbasi-Yadkori, Peter L. Bartlett, Stephen J. Wright

Topics

Reinforcement Learning > Methods > Policy Learning
Machine Learning > Learning Types > Reinforcement Learning
Reinforcement Learning > Methods > Value Iteration

Keywords

Markov decision process, variance reduction, policy improvement, value estimation, stochastic policy
