DESPOT: Online POMDP Planning with Regularization

Adhiraj Somani; Nan Ye; David Hsu; Wee Sun Lee

2013 NIPS NeurIPS 2013

DESPOT: Online POMDP Planning with Regularization

Abstract

POMDPs provide a principled framework for planning under uncertainty, but are computationally intractable, due to the “curse of dimensionality” and the “curse of history”. This paper presents an online lookahead search algorithm that alleviates these difficulties by limiting the search to a set of sampled scenarios. The execution of all policies on the sampled scenarios is summarized using a Determinized Sparse Partially Observable Tree (DESPOT), which is a sparsely sampled belief tree. Our algorithm, named Regularized DESPOT (R-DESPOT), searches the DESPOT for a policy that optimally balances the size of the policy and the accuracy on its value estimate obtained through sampling. We give an output-sensitive performance bound for all policies derived from the DESPOT, and show that R-DESPOT works well if a small optimal policy exists. We also give an anytime approximation to R-DESPOT. Experiments show strong results, compared with two of the fastest online POMDP algorithms.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Reinforcement Learning

📈 Trend Setter — Agent Systems

🧭 Keyword Pioneer — online pomdp planning

🐣 Hot Topic Early Bird — policy optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

Authors

Adhiraj Somani , Nan Ye , David Hsu , Wee Sun Lee

Topics

Artificial Intelligence > Core AI > Agent Systems Artificial Intelligence > Core AI > Planning Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Reinforcement Learning > Applications > Value Iteration Knowledge & Reasoning > Reasoning > Automated Planning Machine Learning > Learning Types > Reinforcement Learning

Keywords

policy optimization belief state online planning partially observable markov decision process pomdp planning regularized optimization online pomdp planning partially observable planning sparse belief tree lookahead search belief tree tree search sparse sampling

Download PDF

Related papers

Latent Structured Active Learning 2013

On Flat versus Hierarchical Classification in Large-Scale Taxonomies 2013

Generalized Method-of-Moments for Rank Aggregation 2013

Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections 2013

Accelerated Mini-Batch Stochastic Dual Coordinate Ascent 2013