2009 NIPS NeurIPS 2009

The Infinite Partially Observable Markov Decision Process

Abstract

The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning domains that require balancing actions that increase an agents knowledge and actions that increase an agents reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many realworld problems, both the structure and the parameters are difficult to specify from domain knowledge alone. Recent work in Bayesian reinforcement learning has made headway in learning POMDP models; however, this work has largely focused on learning the parameters of the POMDP model. We define an infinite POMDP (iPOMDP) model that does not require knowledge of the size of the state space; instead, it assumes that the number of visited states will grow as the agent explores its world and explicitly models only visited states. We demonstrate the iPOMDP utility on several standard problems.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning
📈 Trend Setter — Offline RL
🧭 Keyword Pioneer — infinite state space
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics
🐣 Hot Topic Early Bird — sequential decision making