Coherent Inference on Optimal Play in Game Trees

Philipp Hennig; David Stern; Thore Graepel

AISTATS 2010

Abstract

Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.
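To make the quantities in the abstract concrete, here is a minimal sketch of a smooth AND/OR tree and its optimal values. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian increment between parent and child scores, the dictionary node layout, and the names `sample_tree`, `optimal_value`, and `step_std` are all hypothetical. The backup below is an exact, exhaustive min-max recursion, not the Expectation Propagation scheme the abstract describes.

```python
import random


def sample_tree(depth, branching, rng, score=0.0, step_std=1.0):
    """Sample latent scores for a full tree.

    Assumption (not from the source): a child's score equals its parent's
    score plus a zero-mean Gaussian increment, which makes the tree "smooth".
    """
    node = {"score": score, "children": []}
    if depth > 0:
        for _ in range(branching):
            child_score = score + rng.gauss(0.0, step_std)
            node["children"].append(
                sample_tree(depth - 1, branching, rng, child_score, step_std)
            )
    return node


def optimal_value(node, maximizing=True):
    """Exact min-max backup over the whole subtree.

    Leaves return their own score; inner nodes alternate between max
    (our move) and min (the opponent's move).
    """
    if not node["children"]:
        return node["score"]
    values = [optimal_value(child, not maximizing) for child in node["children"]]
    return max(values) if maximizing else min(values)


rng = random.Random(0)
tree = sample_tree(depth=4, branching=3, rng=rng)
root_value = optimal_value(tree)
print(f"optimal value at the root: {root_value:.3f}")
```

The recursion visits every leaf, so its cost grows exponentially with depth. That is the cost the abstract's approximate message passing scheme is meant to avoid when it infers off-policy values in linear time.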

Authors

Philipp Hennig, David Stern, Thore Graepel

Topics

Artificial Intelligence > Core AI > Game AI

Artificial Intelligence > Bayesian & Probabilistic > Probabilistic Modeling

Machine Learning > Bayesian & Probabilistic > Bayesian Inference

Deep Learning > Learning Types > Reinforcement Learning

Keywords

message passing, value iteration, expectation propagation, optimal policy, generative model, game tree, off-policy value, and/or tree, off-policy inference
