Variational methods for Reinforcement Learning

Thomas Furmston; David Barber

2010 AISTATS AISTATS 2010

Variational methods for Reinforcement Learning

Abstract

We consider reinforcement learning as solving a Markov decision process with unknown transition distribution. Based on interaction with the environment, an estimate of the transition matrix is obtained from which the optimal decision policy is formed. The classical maximum likelihood point estimate of the transition model does not reflect the uncertainty in the estimate of the transition model and the resulting policies may consequently lack a sufficient degree of exploration. We consider a Bayesian alternative that maintains a distribution over the transition so that the resulting policy takes into account the limited experience of the environment. The resulting algorithm is formally intractable and we discuss two approximate solution methods, Variational Bayes and Expectation Propagation.

🚀 Conference Pioneer — AISTATS 2010

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🐣 Hot Topic Early Bird — reinforcement learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Thomas Furmston , David Barber

Topics

Machine Learning > Optimization & Theory > Bayesian Inference Reinforcement Learning > Methods > Deep RL Machine Learning > Bayesian & Probabilistic > Variational Inference

Keywords

reinforcement learning variational inference bayesian learning markov decision process expectation propagation variational baye

Download PDF

Related papers

Towards Understanding Situated Natural Language 2010

Mass Fatality Incident Identification based on nuclear DNA evidence 2010

Locally Linear Denoising on Image Manifolds 2010

Negative Results for Active Learning with Convex Losses 2010

Collaborative Filtering on a Budget 2010