Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments

Gabor Bartok; Dávid Pál; Csaba Szepesvári

COLT 2011

Abstract

In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. Assuming that the outcomes are generated in an i.i.d. fashion from an arbitrary and unknown probability distribution, we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, $\widetilde{\Theta}(\sqrt{T})$, $\Theta(T^{2/3})$, or $\Theta(T)$. We provide a computationally efficient learning algorithm that achieves the minimax regret within a logarithmic factor for any game.
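To make the protocol and the regret definition above concrete, here is a minimal Python sketch under stated assumptions. The two-action, two-outcome game and the names `LOSS`, `FEEDBACK`, `run_game`, and `always_first` are hypothetical illustrations, not taken from the paper, and the sketch does not implement the paper's algorithm. The policy sees only past (action, feedback) pairs; the loss is accumulated but never shown to it, and regret is measured against the best fixed action on the realized i.i.d. outcome sequence.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy game, for illustration only (not taken from the paper):
# 2 actions x 2 outcomes. LOSS[i][j] is the loss of action i under outcome j.
LOSS = [[0.0, 1.0],
        [1.0, 0.0]]
# FEEDBACK[i][j] is the signal shown after playing action i when outcome j occurs.
# Action 0 always shows "a" (uninformative); action 1 reveals the outcome.
FEEDBACK = [["a", "a"],
            ["b", "c"]]

History = List[Tuple[int, str]]  # (action, feedback) pairs seen by the learner


def run_game(policy: Callable[[History], int],
             outcome_probs: Sequence[float],
             T: int,
             seed: int = 0) -> Tuple[float, float, float]:
    """Play T rounds with i.i.d. outcomes; return (learner loss, best fixed loss, regret)."""
    rng = random.Random(seed)
    history: History = []
    outcomes: List[int] = []
    learner_loss = 0.0
    for _ in range(T):
        i = policy(history)  # the policy sees only past actions and feedback
        j = rng.choices(range(len(outcome_probs)), weights=outcome_probs)[0]
        learner_loss += LOSS[i][j]  # suffered, but never shown to the learner
        history.append((i, FEEDBACK[i][j]))
        outcomes.append(j)
    # Total loss of the best single action in hindsight over the realized outcomes.
    best_fixed = min(sum(LOSS[a][j] for j in outcomes) for a in range(len(LOSS)))
    return learner_loss, best_fixed, learner_loss - best_fixed


if __name__ == "__main__":
    always_first = lambda history: 0  # hypothetical fixed policy
    print(run_game(always_first, outcome_probs=[0.3, 0.7], T=1000))
```

With `outcome_probs=[0.3, 0.7]`, always playing action 0 incurs a loss on roughly 70% of rounds, while action 1 would incur one on roughly 30%, so the regret of this fixed policy grows linearly in `T`. Since action 0 reveals nothing about the outcome, a learner that does better in this toy game has to play action 1 at least occasionally to learn anything about the outcome distribution.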

Topics

Machine Learning > Learning Types > Online Learning
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Optimization & Theory > Online Algorithms
Mathematics & Optimization > Optimization > Game Theory
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

regret analysis, online learning, sequential decision, minimax regret, learning algorithm, partial monitoring, stochastic environment