Single-Agent Policy Tree Search With Guarantees

Laurent Orseau; Levi Lelis; Tor Lattimore; Theophane Weber

NeurIPS 2018

Abstract

We introduce two novel tree search algorithms that use a policy to guide search. The first algorithm is a best-first enumeration that uses a cost function that allows us to provide an upper bound on the number of nodes to be expanded before reaching a goal state. We show that this best-first algorithm is particularly well suited for "needle-in-a-haystack" problems. The second algorithm, which is based on sampling, provides an upper bound on the expected number of nodes to be expanded before reaching a set of goal states. We show that this algorithm is better suited for problems where many paths lead to a goal. We validate these tree search algorithms on 1,000 computer-generated levels of Sokoban, where the policy used to guide search comes from a neural network trained using A3C. Our results show that the policy tree search algorithms we introduce are competitive with a state-of-the-art domain-independent planner that uses heuristic search.
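
To make the idea of a policy-guided best-first enumeration concrete, here is a minimal Python sketch. The abstract does not state the cost function, so the sketch assumes one plausible choice: the depth of a node (counting the root as 1) divided by the probability the policy assigns to the path reaching it. The function name `policy_guided_best_first`, the toy bit-string problem, and the hand-written `toy_policy` are all hypothetical and chosen only for illustration; they are not the authors' implementation or experimental setup.

```python
import heapq
from itertools import count


def policy_guided_best_first(root, children, policy, is_goal, max_expansions=10_000):
    """Expand nodes in increasing order of depth / path probability.

    ASSUMPTION: this cost function is an illustrative choice, not taken
    from the abstract. The root has depth 1 and path probability 1.
    Returns (goal_node, number_of_expansions, cost_of_goal_node).
    """
    tie = count()  # tie-breaker so the heap never compares nodes directly
    frontier = [(1.0, next(tie), root, 1, 1.0)]  # (cost, tie, node, depth, prob)
    expansions = 0
    while frontier and expansions < max_expansions:
        cost, _, node, depth, prob = heapq.heappop(frontier)
        expansions += 1
        if is_goal(node):
            return node, expansions, cost
        kids = children(node)
        for kid, p in zip(kids, policy(node, kids)):
            if p > 0:
                kid_prob = prob * p  # probability of the whole path to kid
                kid_cost = (depth + 1) / kid_prob
                heapq.heappush(frontier, (kid_cost, next(tie), kid, depth + 1, kid_prob))
    return None, expansions, None


# Hypothetical toy problem: find one specific bit string among all strings
# of length 6 (a small "needle-in-a-haystack" tree).
GOAL = "101101"


def toy_children(node):
    # Each node is a bit-string prefix; leaves have the full length.
    return [] if len(node) == len(GOAL) else [node + "0", node + "1"]


def toy_policy(node, kids):
    # Hand-written stand-in for a learned policy: puts probability 0.7 on
    # the bit that matches GOAL at this position and 0.3 on the other bit.
    return [0.7 if kid[-1] == GOAL[len(kid) - 1] else 0.3 for kid in kids]


found, expansions, cost = policy_guided_best_first(
    "", toy_children, toy_policy, lambda n: n == GOAL
)
print(found, expansions, round(cost, 2))
```

Under this assumed cost, the costs never decrease along a path, and the nodes expanded before the goal all have cost no larger than the goal's cost. In the toy run, the number of expansions therefore stays below the goal's depth divided by its path probability, which illustrates the kind of upper bound the abstract refers to.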

Topics

Artificial Intelligence > Core AI > Planning
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Reinforcement Learning
Artificial Intelligence > Core AI > Reasoning

Keywords

neural network training, path planning, policy gradient, heuristic search, best-first search, game playing, monte-carlo tree search, tree search, neural network, policy guidance, goal state, policy tree search
