Asymptotically Optimal Information-Directed Sampling

Johannes Kirschner; Tor Lattimore; Claire Vernade; Csaba Szepesvári

COLT 2021

Abstract

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite time and can be significantly better in the asymptotic regime.

Authors

Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

Topics

Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Optimization

Keywords

sampling strategy; regret bound; information gain; stochastic bandit; optimal algorithm

Related papers

SGD Generalizes Better Than GD (And Regularization Doesn’t Help) 2021

Learning in Matrix Games can be Arbitrarily Complex 2021

Reconstructing weighted voting schemes from partial information about their power indices 2021

Online Learning from Optimal Actions 2021

Robust learning under clean-label attack 2021