What makes some POMDP problems easy to approximate?

Wee S. Lee; Nan Rong; David Hsu

NIPS (NeurIPS) 2007

Abstract

Point-based algorithms have been surprisingly successful in computing approximately optimal solutions for partially observable Markov decision processes (POMDPs) in high dimensional belief spaces. In this work, we seek to understand the belief-space properties that allow some POMDP problems to be approximated efficiently and thus help to explain the point-based algorithms’ success often observed in the experiments. We show that an approximately optimal POMDP solution can be computed in time polynomial in the covering number of a reachable belief space, which is the subset of the belief space reachable from a given belief point. We also show that under the weaker condition of having a small covering number for an optimal reachable space, which is the subset of the belief space reachable under an optimal policy, computing an approximately optimal solution is NP-hard. However, given a suitable set of points that “cover” an optimal reachable space well, an approximate solution can be computed in polynomial time. The covering number highlights several interesting properties that reduce the complexity of POMDP planning in practice, e.g., fully observed state variables, beliefs with sparse support, smooth beliefs, and circulant state-transition matrices.
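
To make the idea of a covering number of a reachable belief space concrete, the following is a minimal sketch and not the authors' algorithm. It enumerates the beliefs reachable from an initial belief in a toy two-state POMDP and builds a greedy cover of them. Everything here is an assumption made for illustration: the function names, the toy model with perfectly observed states, the L1 distance, and the radius `delta`. The size of the greedy cover is only an upper bound on the covering number of the sampled beliefs.

```python
from itertools import product


def belief_update(b, a, o, T, Z):
    """Bayes filter step: b'(s') is proportional to Z[a][s'][o] * sum_s T[a][s][s'] * b(s).

    Returns None when observation o has zero probability under (b, a).
    """
    n = len(b)
    unnorm = [
        Z[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnorm)
    if total == 0.0:
        return None
    return [x / total for x in unnorm]


def reachable_beliefs(b0, T, Z, depth):
    """Enumerate beliefs reachable from b0 within `depth` action-observation steps."""
    n_actions, n_obs = len(T), len(Z[0][0])
    frontier, seen = [b0], [b0]
    for _ in range(depth):
        nxt = []
        for b, a, o in product(frontier, range(n_actions), range(n_obs)):
            b2 = belief_update(b, a, o, T, Z)
            if b2 is not None:
                nxt.append(b2)
        seen.extend(nxt)
        frontier = nxt
    return seen


def l1(p, q):
    """L1 distance between two beliefs (assumed metric for this sketch)."""
    return sum(abs(x - y) for x, y in zip(p, q))


def greedy_cover(points, delta):
    """Pick centers so that every point lies within L1 distance < delta of one."""
    centers = []
    for p in points:
        if all(l1(p, c) >= delta for c in centers):
            centers.append(p)
    return centers


# Hypothetical toy model: 2 states, actions "stay" and "flip",
# and an observation that reveals the state exactly.
T = [[[1.0, 0.0], [0.0, 1.0]],   # stay: T[a][s][s']
     [[0.0, 1.0], [1.0, 0.0]]]   # flip
Z = [[[1.0, 0.0], [0.0, 1.0]],   # Z[a][s'][o]
     [[1.0, 0.0], [0.0, 1.0]]]

beliefs = reachable_beliefs([0.5, 0.5], T, Z, depth=3)
cover = greedy_cover(beliefs, delta=0.1)
print(len(beliefs), "reachable beliefs,", len(cover), "cover centers")
```

In this toy model the state is fully observed, so every reachable belief after the first step sits at a corner of the simplex and the cover stays small no matter how deep the enumeration goes. This mirrors the abstract's remark that fully observed state variables reduce planning complexity.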

Topics

Artificial Intelligence > Core AI > Planning
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Theory
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Offline RL
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Model-Based RL

Keywords

temporal difference learning; partially observable Markov decision processes; partially observable Markov decision process; belief space; approximate planning; approximate dynamic programming; complexity analysis; approximate solution; covering number; point-based algorithm
