PAC Optimal MDP Planning with Application to Invasive Species Management

Majid Alkaee Taleghan; Thomas G. Dietterich; Mark Crowley; Kim Hall; H. Jo Albers

JMLR 2015

Abstract

In a simulator-defined MDP, the Markovian dynamics and rewards are provided in the form of a simulator from which samples can be drawn. This paper studies MDP planning algorithms that attempt to minimize the number of simulator calls before terminating and outputting a policy that is approximately optimal with high probability. The paper introduces two heuristics for efficient exploration and an improved confidence interval that enables earlier termination with probabilistic guarantees. We prove that the heuristics and the confidence interval are sound and, with high probability, produce an approximately optimal policy in polynomial time. Experiments on two benchmark problems and two instances of an invasive species management problem show that the improved confidence intervals and the new search heuristics yield reductions of between 8% and 47% in the number of simulator calls required to reach near-optimal policies.

© JMLR 2015.
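The abstract does not spell out the two exploration heuristics or the improved confidence interval, so the block below is only a generic sketch of the surrounding idea, not the authors' algorithm: draw samples from a simulator, put a confidence interval on each transition distribution, and stop once optimistic and pessimistic values of the start state are within ε. It assumes a hypothetical three-state, two-action simulator `simulate` whose reward depends only on the current state (so only transition uncertainty is bounded). It also samples every state-action pair uniformly instead of using any exploration heuristic, and it swaps in the L1 deviation bound of Weissman et al. (2003) as used in MBIE-style methods. All names and parameters (`pac_plan`, `eps`, `delta`, `n0`, `max_rounds`) are illustrative.

```python
import math
import random

# Hypothetical toy MDP: 3 states, 2 actions, discount factor GAMMA.
N_STATES, N_ACTIONS, GAMMA = 3, 2, 0.9


def simulate(s, a, rng):
    """Hypothetical black-box simulator: one call returns (next_state, reward)."""
    if a == 0:  # mostly stay, sometimes advance
        s2 = s if rng.random() < 0.8 else (s + 1) % N_STATES
    else:       # advance with prob. 0.6, otherwise reset to state 0
        s2 = (s + 1) % N_STATES if rng.random() < 0.6 else 0
    reward = 1.0 if s == N_STATES - 1 else 0.0  # assumption: deterministic given s
    return s2, reward


def sample_model(n, rng):
    """Call the simulator n times per (s, a); return empirical P-hat and mean rewards."""
    P = [[None] * N_ACTIONS for _ in range(N_STATES)]
    R = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            counts, total_r = [0] * N_STATES, 0.0
            for _ in range(n):
                s2, r = simulate(s, a, rng)
                counts[s2] += 1
                total_r += r
            P[s][a] = [c / n for c in counts]
            R[s][a] = total_r / n
    return P, R


def extreme_expectation(p_hat, v, omega, optimistic):
    """Max (optimistic) or min (pessimistic) of p.v over p with ||p - p_hat||_1 <= omega."""
    p = list(p_hat)
    # order[0] is the most favorable next state for the chosen direction.
    order = sorted(range(len(v)), key=lambda j: v[j], reverse=optimistic)
    best = order[0]
    add = min(omega / 2.0, 1.0 - p[best])  # moving mass `add` costs 2*add in L1
    p[best] += add
    excess = add
    for j in reversed(order):  # remove the same mass from the least favorable states
        if excess <= 0.0:
            break
        if j == best:
            continue
        take = min(p[j], excess)
        p[j] -= take
        excess -= take
    return sum(pj * vj for pj, vj in zip(p, v))


def bound_values(P, R, omega, optimistic, iters=500):
    """Value iteration with the optimistic (upper) or pessimistic (lower) Bellman operator."""
    v, policy = [0.0] * N_STATES, [0] * N_STATES
    for _ in range(iters):
        q = [[R[s][a] + GAMMA * extreme_expectation(P[s][a], v, omega, optimistic)
              for a in range(N_ACTIONS)] for s in range(N_STATES)]
        v = [max(row) for row in q]
        policy = [row.index(max(row)) for row in q]
    return v, policy


def pac_plan(s0=0, eps=1.0, delta=0.05, n0=100, max_rounds=10, seed=0):
    """Double the samples per (s, a) each round until the value gap at s0 is <= eps."""
    rng = random.Random(seed)
    pairs = N_STATES * N_ACTIONS
    calls = 0
    for k in range(1, max_rounds + 1):
        n = n0 * 2 ** (k - 1)
        P, R = sample_model(n, rng)  # fresh samples each round, for simplicity
        calls += n * pairs
        d = delta / (pairs * 2 ** k)  # union bound: sums to <= delta over pairs and rounds
        # Weissman et al. (2003): ||P_hat - P||_1 <= omega with prob. >= 1 - d.
        omega = math.sqrt(2.0 / n * math.log((2 ** N_STATES - 2) / d))
        upper, _ = bound_values(P, R, omega, optimistic=True)
        lower, policy = bound_values(P, R, omega, optimistic=False)
        if upper[s0] - lower[s0] <= eps:
            return policy, lower[s0], upper[s0], calls, True
    return policy, lower[s0], upper[s0], calls, False
```

`pac_plan()` returns the greedy policy for the pessimistic bound, the lower and upper value bounds at the start state, the number of simulator calls, and whether the gap reached `eps`. If every true transition distribution lies inside its confidence ball (probability at least 1 − δ under the union bound above), that policy's value at the start state is within the final gap of optimal. Doubling the sample count uniformly is a crude stand-in for targeted exploration; simulator calls spent this way are the cost the abstract's heuristics aim to reduce.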

Keywords

PAC learning, Markov decision process, MDP planning, invasive species management