Particle Filter-based Policy Gradient in POMDPs

Pierre-Arnaud Coquelin; Romain Deguest; Rémi Munos

NeurIPS (NIPS) 2008

Abstract

Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces. Decisions are based on a Particle Filter for estimating the belief state given past observations. We consider a policy gradient approach for parameterized policy optimization. For that purpose, we investigate sensitivity analysis of the performance measure with respect to the parameters of the policy, focusing on Finite Difference (FD) techniques. We show that the naive FD is subject to variance explosion because of the non-smoothness of the resampling procedure. We propose a more sophisticated FD method which overcomes this problem and establish its consistency.
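To make the naive finite-difference idea concrete, here is a minimal sketch of a central FD gradient estimate computed through a bootstrap particle filter. It is not the method proposed in the paper. The 1-D linear-Gaussian model, the linear policy `a = -theta * belief_mean`, the quadratic reward, and all names (`rollout_return`, `naive_fd_gradient`, `n_particles`, `h`) are assumptions chosen for illustration. Both perturbed rollouts share the same random seed, so any jump in their difference comes from the multinomial resampling step selecting different particle indices.

```python
import numpy as np

def rollout_return(theta, seed, n_particles=200, horizon=20):
    """Return of one episode under the (assumed) policy a = -theta * belief_mean."""
    rng = np.random.default_rng(int(seed))
    x = rng.normal(0.0, 1.0)                        # hidden state
    particles = rng.normal(0.0, 1.0, n_particles)   # samples from the prior belief
    total = 0.0
    for _ in range(horizon):
        # Observe the state with Gaussian noise (std 0.5, an assumed value).
        y = x + rng.normal(0.0, 0.5)
        # Weight particles by the observation likelihood (log-space for stability).
        logw = -0.5 * ((y - particles) / 0.5) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Multinomial resampling: selected indices change discontinuously with theta.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
        # The policy acts on the particle estimate of the belief mean.
        a = -theta * particles.mean()
        total += -(x ** 2) - 0.1 * a ** 2           # assumed quadratic reward
        # Propagate the true state and the particles through the same dynamics.
        x = x + a + rng.normal(0.0, 0.3)
        particles = particles + a + rng.normal(0.0, 0.3, n_particles)
    return total

def naive_fd_gradient(theta, h=1e-2, n_episodes=50, seed=0):
    """Central finite difference with common random numbers per episode.

    Returns the mean and the standard deviation of the per-episode estimates.
    """
    seeds = np.random.SeedSequence(seed).generate_state(n_episodes)
    diffs = [
        (rollout_return(theta + h, s) - rollout_return(theta - h, s)) / (2.0 * h)
        for s in seeds
    ]
    return float(np.mean(diffs)), float(np.std(diffs))

if __name__ == "__main__":
    for h in (1e-1, 1e-2, 1e-3):
        mean, std = naive_fd_gradient(theta=0.5, h=h)
        print(f"h={h:g}  mean={mean:.3f}  std={std:.3f}")
```

Comparing the reported standard deviation across step sizes `h` is one way to probe the variance behaviour the abstract attributes to resampling. The numbers depend entirely on the assumed model and carry no claim about the paper's experiments.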

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Planning
Machine Learning > Optimization & Theory > Stochastic Processes
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning
Mathematics & Optimization > Optimization > Stochastic Methods
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Optimization & Theory > Stochastic Methods
Machine Learning > Learning Types > Exploration

Keywords

reinforcement learning, policy gradient, belief state, partially observable markov decision process, particle filter, belief state estimation, finite difference, variance reduction, partially observable decision process, finite difference method
