Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs

Karel Horák; Branislav Bosansky; Krishnendu Chatterjee

2018 IJCAI IJCAI 2018

Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs

Abstract

Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizon. Besides the well-known discounted-sum objective, indefinite-horizon objective (aka Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel or heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We give the following contributions: (1) We discuss the challenges introduced in Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.

🧭 Keyword Pioneer — goal pomdp

🐝 Cross-Pollinator — Artificial Intelligence, Deep Learning, Machine Learning, Mathematics & Optimization, Reinforcement Learning

🌉 Interdisciplinary Bridge — Artificial Intelligence and Reinforcement Learning

🐣 Hot Topic Early Bird — partially observable markov decision process