Bayesian Policy Gradient Algorithms

Mohammad Ghavamzadeh; Yaakov Engel

NIPS 2006 (NeurIPS 2006)

Abstract

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte Carlo techniques to estimate this gradient. Since Monte Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost.
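As a rough illustration of the conventional Monte Carlo baseline the abstract refers to (not the Bayesian method proposed in the paper), the sketch below estimates a policy gradient with the score-function estimator on a toy one-step problem. Everything here is an assumption made for illustration: the Gaussian policy a ~ N(theta, 1), the reward -(a - 3)^2, and the helper names `mc_policy_gradient`, `true_gradient`, and `estimator_std` do not come from the paper.

```python
import numpy as np

def mc_policy_gradient(theta, n_samples, rng):
    """Score-function Monte Carlo estimate of dJ/dtheta for a toy one-step task."""
    actions = rng.normal(loc=theta, scale=1.0, size=n_samples)  # a ~ N(theta, 1) (assumed policy)
    rewards = -(actions - 3.0) ** 2                             # assumed toy reward
    scores = actions - theta                                    # d/dtheta log N(a; theta, 1)
    return float(np.mean(rewards * scores))

def true_gradient(theta):
    # J(theta) = E[-(a - 3)^2] = -((theta - 3)^2 + 1), so dJ/dtheta = -2 (theta - 3).
    return -2.0 * (theta - 3.0)

def estimator_std(theta, n_samples, n_repeats=2000, seed=0):
    """Empirical standard deviation of the estimator over repeated runs."""
    rng = np.random.default_rng(seed)
    estimates = [mc_policy_gradient(theta, n_samples, rng) for _ in range(n_repeats)]
    return float(np.std(estimates))

if __name__ == "__main__":
    print("true gradient at theta=0:", true_gradient(0.0))
    for n in (10, 100, 1000):
        print(n, "samples -> estimator std:", estimator_std(0.0, n))
```

In this toy setting the spread of the estimate shrinks only as the square root of the sample count, which is the slow convergence the abstract attributes to Monte Carlo estimation.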

Topics

Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning
Machine Learning > Optimization & Theory > Stochastic Processes
Reinforcement Learning > Methods > Policy Learning
Machine Learning > Bayesian & Probabilistic > Bayesian Learning
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Bayesian & Probabilistic > Gaussian Processes

Keywords

reinforcement learning, Bayesian inference, policy gradient, Bayesian reinforcement learning, uncertainty quantification, Gaussian process, natural gradient, Monte Carlo estimation, variance reduction, actor-critic method

Related papers

Temporal Coding using the Response Properties of Spiking Neurons (2006)

Parameter Expanded Variational Bayesian Methods (2006)

Effects of Stress and Genotype on Meta-parameter Dynamics in Reinforcement Learning (2006)

Ordinal Regression by Extended Binary Classification (2006)

Blind source separation for over-determined delayed mixtures (2006)