Maximin Action Identification: A New Bandit Framework for Games

Aurélien Garivier; Emilie Kaufmann; Wouter M. Koolen

COLT 2016

Abstract

We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. The task is to identify the best action in a game when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower and upper confidence bounds, and Maximin-Racing, which operates by successively eliminating sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower-bound analysis and possible connections to an optimal algorithm.
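To make the identification target concrete, the sketch below illustrates the maximin objective on a toy game. It is not Maximin-LUCB or Maximin-Racing. It is a naive uniform-sampling baseline, and the Bernoulli outcomes, the payoff matrix `TRUE_MEANS`, the sample budget, and all function names are assumptions made purely for illustration.

```python
import random

def maximin_action(means):
    """Return the row index i that maximizes min_j means[i][j]."""
    worst_case = [min(row) for row in means]
    return max(range(len(means)), key=lambda i: worst_case[i])

def estimate_by_uniform_sampling(true_means, samples_per_pair, seed=0):
    """Estimate each pair's mean outcome by sampling every pair equally often.

    Outcomes are assumed to be Bernoulli with the given means (an
    illustrative assumption, not something stated in the abstract).
    """
    rng = random.Random(seed)
    estimates = []
    for row in true_means:
        est_row = []
        for mu in row:
            wins = sum(rng.random() < mu for _ in range(samples_per_pair))
            est_row.append(wins / samples_per_pair)
        estimates.append(est_row)
    return estimates

# Hypothetical 3x2 game: rows are the player's actions, columns the
# opponent's replies, entries the mean outcome for the player.
TRUE_MEANS = [
    [0.9, 0.2],  # strong against reply 0, weak against reply 1
    [0.6, 0.5],  # best worst case
    [0.3, 0.4],
]

estimates = estimate_by_uniform_sampling(TRUE_MEANS, samples_per_pair=2000)
guess = maximin_action(estimates)
```

Uniform sampling spends the same budget on every pair of actions. The adaptive strategies named in the abstract are aimed at using fewer samples by concentrating on the pairs that matter for the decision.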

Authors

Aurélien Garivier, Emilie Kaufmann, Wouter M. Koolen

Topics

Artificial Intelligence > Core AI > Game AI
Machine Learning > Optimization & Theory > Online Algorithms

Keywords

Monte Carlo tree search, multi-armed bandit, strategic interaction, pure exploration, confidence bound

Related papers

Open Problem: Approximate Planning of POMDPs in the class of Memoryless Policies 2016

Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture 2016

Open Problem: Kernel methods on manifolds and metric spaces. What is the probability of a positive definite geodesic exponential kernel? 2016

Learning and Testing Junta Distributions 2016

Monte Carlo Markov Chain Algorithms for Sampling Strongly Rayleigh Distributions and Determinantal Point Processes 2016