Exploration by Optimisation in Partial Monitoring

Tor Lattimore; Csaba Szepesvári

COLT 2020


Abstract

We provide a novel algorithm for adversarial k-action d-outcome partial monitoring that is adaptive, intuitive and efficient. The highlight is that for the non-degenerate locally observable games, the n-round minimax regret is bounded by 6m k^(3/2) sqrt(n log(k)), where m is the number of signals. This matches the best known information-theoretic upper bound derived via Bayesian minimax duality. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games. High probability bounds and simple experiments are also provided.
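To give a feel for how the stated bound scales, the following minimal sketch evaluates 6m k^(3/2) sqrt(n log(k)) numerically. The function name `regret_bound` and the example values of m, k and n are hypothetical, and the logarithm is assumed to be natural; the abstract does not state the base.

```python
import math


def regret_bound(m: int, k: int, n: int) -> float:
    """Evaluate 6 * m * k^(3/2) * sqrt(n * log(k)) from the abstract.

    m: number of signals, k: number of actions, n: number of rounds.
    Assumption: log is the natural logarithm.
    """
    if m < 1 or k < 1 or n < 1:
        raise ValueError("m, k and n must be positive integers")
    return 6.0 * m * k ** 1.5 * math.sqrt(n * math.log(k))


if __name__ == "__main__":
    # Illustrative values only: m = 2 signals, k = 3 actions.
    for n in (10**3, 10**4, 10**5, 10**6):
        bound = regret_bound(m=2, k=3, n=n)
        # The per-round value bound / n shrinks like 1 / sqrt(n).
        print(f"n={n:>8d}  bound={bound:12.1f}  bound/n={bound / n:.4f}")
```

Because the bound grows as sqrt(n), quadrupling the horizon only doubles it, so the bound divided by n tends to zero as n grows.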



Topics

Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Optimization > Online Algorithms

Keywords

minimax regret, online algorithm, partial monitoring, adversarial bandit

