Bayes-Adaptive POMDPs

Stephane Ross; Brahim Chaib-draa; Joelle Pineau

NIPS (NeurIPS) 2007

Abstract

Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date focus on standard Markov Decision Processes (MDPs). Our goal is to extend these ideas to the more general Partially Observable MDP (POMDP) framework, where the state is a hidden variable. To address this problem, we introduce a new mathematical model, the Bayes-Adaptive POMDP. This new model allows us to (1) improve knowledge of the POMDP domain through interaction with the environment, and (2) plan optimal sequences of actions that trade off between improving the model, identifying the state, and gathering reward. We show how the model can be finitely approximated while preserving the value function. We describe approximations for belief tracking and planning in this model. Empirical results on two domains show that the model estimate and the agent's return improve over time as the agent learns a better model.
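To make the idea of jointly "improving the model" and "identifying the state" concrete, the following is a minimal sketch of one exact belief update over hyperstates. It assumes the unknown transition and observation probabilities are summarized by Dirichlet counts, a common choice in Bayesian reinforcement learning that the abstract itself does not spell out. The names `make_counts`, `increment`, `belief_update`, `phi`, and `psi` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def make_counts(n_states, n_actions, n_outcomes, prior=1):
    """Hashable count table: one Dirichlet count vector per (state, action)."""
    return tuple(
        ((s, a), (prior,) * n_outcomes)
        for s in range(n_states)
        for a in range(n_actions)
    )


def increment(table, key, idx):
    """Return a copy of the count table with table[key][idx] increased by 1."""
    rows = dict(table)
    row = list(rows[key])
    row[idx] += 1
    rows[key] = tuple(row)
    return tuple(sorted(rows.items()))


def expected_prob(counts, idx):
    """Mean of a Dirichlet distribution with these counts, at index idx."""
    return counts[idx] / sum(counts)


def belief_update(belief, action, obs, n_states):
    """One exact Bayes update over hyperstates (state, phi, psi).

    Assumption: phi holds transition counts keyed by (s, a) and psi holds
    observation counts keyed by (s', a). Every hyperstate branches into one
    successor per next state, so the support can grow by a factor of n_states.
    """
    new_belief = defaultdict(float)
    for (s, phi, psi), w in belief.items():
        phi_d, psi_d = dict(phi), dict(psi)
        for s2 in range(n_states):
            p_t = expected_prob(phi_d[(s, action)], s2)    # expected T(s, a, s')
            p_o = expected_prob(psi_d[(s2, action)], obs)  # expected O(s', a, z)
            weight = w * p_t * p_o
            if weight > 0.0:
                key = (
                    s2,
                    increment(phi, (s, action), s2),
                    increment(psi, (s2, action), obs),
                )
                new_belief[key] += weight
    total = sum(new_belief.values())
    return {k: v / total for k, v in new_belief.items()}


# Usage: 2 states, 1 action, 2 observations, uniform prior counts.
phi0 = make_counts(2, 1, 2)  # transition counts
psi0 = make_counts(2, 1, 2)  # observation counts
posterior = belief_update({(0, phi0, psi0): 1.0}, action=0, obs=1, n_states=2)
```

Because the number of hyperstates with nonzero probability can multiply at every step under this exact update, some approximation to belief tracking is needed in practice, which is the motivation the abstract gives for its approximate methods.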

Topics

Artificial Intelligence > Core AI > Planning
Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning
Reinforcement Learning > Methods > Deep RL
Machine Learning > Bayesian & Probabilistic > Bayesian Learning
Machine Learning > Learning Types > Reinforcement Learning
Artificial Intelligence > Core AI > Reinforcement Learning
Machine Learning > Learning Types > Bayesian Optimization
Artificial Intelligence > Core AI > Decision Making

Keywords

bayesian inference, bayesian reinforcement learning, exploration-exploitation, policy search, partially observable markov decision process, pomdp, planning, exploration, exploitation, belief tracking, exploration-exploitation trade-off, partially observable mdp, bayes-adaptive model
