2006 NIPS NeurIPS 2006

Bayesian Policy Gradient Algorithms

Abstract

Policy gradient methods are reinforcement learning algorithms that adapt a param- eterized policy by following a performance gradient estimate. Conventional pol- icy gradient methods use Monte-Carlo techniques to estimate this gradient. Since Monte Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost.

🚀 Conference Pioneer — NIPS 2006
🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning
📈 Trend Setter — Stochastic Processes
🧭 Keyword Pioneer — policy gradient
🐣 Hot Topic Early Bird — gaussian process
🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio