Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

Julian Zimmert; Tor Lattimore

NeurIPS 2019

Abstract

The information-theoretic analysis by Russo and Van Roy [2014] in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make a formal connection, showing that the information-theoretic bounds in most applications can be derived from existing techniques for online convex optimisation. Besides this, we improve the best known regret guarantees for $k$-armed adversarial bandits, online linear optimisation on $\ell_p$-balls and bandits with graph feedback.
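For readers unfamiliar with the mirror descent side of this connection, the following is a minimal sketch of a standard textbook instance: mirror descent on the probability simplex with the negative-entropy regulariser and importance-weighted loss estimates, commonly known as Exp3, applied to a $k$-armed adversarial bandit. It is not the algorithm analysed in the paper and does not reproduce its improved bounds. The function name `exp3`, its parameters, and the toy loss sequence in the usage note are assumptions made for illustration only.

```python
import math
import random


def exp3(loss_matrix, learning_rate, seed=0):
    """Exponential weights with importance-weighted loss estimates (illustrative).

    loss_matrix[t][a] is the loss in [0, 1] of arm a in round t.
    Returns (total loss suffered, distribution played in the final round).
    """
    rng = random.Random(seed)
    k = len(loss_matrix[0])
    estimates = [0.0] * k  # estimated cumulative loss of each arm
    probs = [1.0 / k] * k
    total_loss = 0.0
    for losses in loss_matrix:
        # Mirror descent step with negative entropy:
        # p_a is proportional to exp(-learning_rate * estimates[a]).
        # Subtracting the minimum keeps the exponentials numerically stable.
        shift = min(estimates)
        weights = [math.exp(-learning_rate * (e - shift)) for e in estimates]
        norm = sum(weights)
        probs = [w / norm for w in weights]

        # Bandit feedback: only the loss of the sampled arm is observed.
        arm = rng.choices(range(k), weights=probs)[0]
        loss = losses[arm]
        total_loss += loss

        # Unbiased importance-weighted estimate of the full loss vector.
        estimates[arm] += loss / probs[arm]
    return total_loss, probs
```

As a usage sketch, with a hypothetical sequence of 2000 rounds in which arm 0 always has loss 0.1 and the other two arms always have loss 0.9, calling `exp3(losses, math.sqrt(2 * math.log(3) / (2000 * 3)))` should leave most of the probability mass on arm 0. The learning rate used here is the usual textbook tuning for this method, not a value taken from the paper.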

Topics

Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Optimization & Theory > Online Algorithms
Mathematics & Optimization > Optimization > Optimization
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

information theory, online learning, online convex optimization, mirror descent, Thompson sampling, multi-armed bandit, regret bound, bandit algorithm, adversarial bandit, information ratio
