2006
NIPS
NeurIPS 2006
Bayesian Policy Gradient Algorithms
Abstract
Policy gradient methods are reinforcement learning algorithms that adapt a param- eterized policy by following a performance gradient estimate. Conventional pol- icy gradient methods use Monte-Carlo techniques to estimate this gradient. Since Monte Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost.
🚀
Conference Pioneer
— NIPS 2006
🌉
Interdisciplinary Bridge
— Artificial Intelligence and Machine Learning and Reinforcement Learning
📈
Trend Setter
— Stochastic Processes
🧭
Keyword Pioneer
— policy gradient
🐣
Hot Topic Early Bird
— gaussian process
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio
Authors
Topics
Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning
Machine Learning > Optimization & Theory > Stochastic Processes
Reinforcement Learning > Methods > Policy Learning
Machine Learning > Bayesian & Probabilistic > Bayesian Learning
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Bayesian & Probabilistic > Gaussian Processes