Monte Carlo Tree Search for Policy Optimization

Xiaobai Ma; Katherine Driggs-Campbell; Zongzhang Zhang; Mykel J. Kochenderfer

2019 IJCAI IJCAI 2019

Monte Carlo Tree Search for Policy Optimization

Abstract

Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.

🐣 Hot Topic Early Bird — policy optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

Authors

Xiaobai Ma , Katherine Driggs-Campbell , Zongzhang Zhang , Mykel J. Kochenderfer

Topics

Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning

Keywords

policy optimization monte carlo tree search upper confidence bound gradient-free optimization exploration-exploitation trade-off

Download PDF

Related papers

Causal Embeddings for Recommendation: An Extended Abstract 2019

Pivotal Relationship Identification: The K-Truss Minimization Problem 2019

Portioning Using Ordinal Preferences: Fairness and Efficiency 2019

Probabilistic Strategy Logic 2019

Multi-Agent Pathfinding with Continuous Time 2019