Continuous Rapid Action Value Estimates

Adrien Couëtoux; Mario Milone; Mátyás Brendel; Hassan Doghmen; Michele Sebag; Olivier Teytaud

ACML 2011

Abstract

In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain of large-scale Markov Decision Process problems. MCTS most often uses the Upper Confidence Tree algorithm to handle the exploration versus exploitation trade-off, while a few heuristics are used to guide the exploration in large search spaces. Among these heuristics is Rapid Action Value Estimate (RAVE). This paper is concerned with extending the RAVE heuristic to continuous action and state spaces. The approach is experimentally validated on two benchmark problems: the artificial treasure hunt game and a real-world energy management problem.

Topics

Artificial Intelligence > Core AI > Planning; Reinforcement Learning; Reinforcement Learning > Methods > Deep RL; Reinforcement Learning > Methods > Value Iteration; Mathematics & Optimization > Optimization > Optimal Control

Keywords

markov decision process; exploration; exploitation; monte carlo tree search; upper confidence tree; rapid action value estimate
