Monte-Carlo Tree Search as Regularized Policy Optimization

Jean-Bastien Grill; Florent Altché; Yunhao Tang; Thomas Hubert; Michal Valko; Ioannis Antonoglou; Rémi Munos

2020 ICML ICML 2020

Monte-Carlo Tree Search as Regularized Policy Optimization

Abstract

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to groundbreaking results in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero’s search heuristic, along with other common ones, can be interpreted as an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — alpha zero

🐣 Hot Topic Early Bird — policy optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

Authors

Jean-Bastien Grill , Florent Altché , Yunhao Tang , Thomas Hubert , Michal Valko , Ioannis Antonoglou , Rémi Munos

Topics

Machine Learning > Optimization & Theory > Optimization Reinforcement Learning > Methods > Policy Learning Reinforcement Learning > Applications > Game AI

Keywords

deep reinforcement learning policy optimization monte-carlo tree search alpha zero

Download PDF

Related papers

Correlation Clustering with Asymmetric Classification Errors 2020

Learning Portable Representations for High-Level Planning 2020

Proving the Lottery Ticket Hypothesis: Pruning is All You Need 2020

Minimax Pareto Fairness: A Multi Objective Perspective 2020

DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training 2020