Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits

Yingkai Li; Yining Wang; Yuan Zhou

COLT 2019

Abstract

We study the linear contextual bandit problem with finite action sets. When the problem dimension is $d$, the time horizon is $T$, and there are $n \leq 2^{d/2}$ candidate actions per time period, we (1) show that the minimax expected regret is $\Omega(\sqrt{dT \log T \log n})$, i.e., every algorithm incurs at least this much expected regret on some problem instance, and (2) introduce a Variable-Confidence-Level (VCL) SupLinUCB algorithm whose regret matches this lower bound up to iterated logarithmic factors. Our algorithmic result saves two $\sqrt{\log T}$ factors relative to previous analyses, and our information-theoretic lower bound improves on previous results by one $\sqrt{\log T}$ factor, revealing a regret scaling quite different from that of classical multi-armed bandits, whose minimax regret contains no $\log T$ term. Our proof techniques include variable confidence levels and a careful analysis of the layer sizes of SupLinUCB on the upper-bound side, and delicately constructed adversarial sequences showing the tightness of elliptical potential lemmas on the lower-bound side.
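For readers unfamiliar with the elliptical potential lemma mentioned above, the following is a minimal sketch (not the authors' code) that numerically checks its commonly stated form: with $V_0 = \lambda I$ and $V_t = V_{t-1} + x_t x_t^\top$, $\|x_t\|_2 \leq L$, one has $\sum_{t=1}^{T} \min(1, \|x_t\|^2_{V_{t-1}^{-1}}) \leq 2\log(\det V_T / \det V_0) \leq 2d\log(1 + TL^2/(d\lambda))$. The helper names `elliptical_potential` and `potential_bound`, the ridge parameter `lam`, and the use of random unit-norm actions (rather than the adversarial sequences constructed in the paper) are illustrative assumptions.

```python
import math
import random


def elliptical_potential(xs, lam=1.0):
    """Hypothetical helper: return (potential, log_det_ratio) for actions xs.

    potential     = sum_t min(1, ||x_t||^2 in the V_{t-1}^{-1} norm)
    log_det_ratio = log(det V_T / det V_0), where V_0 = lam * I and
                    V_t = V_{t-1} + x_t x_t^T.
    V_t^{-1} is maintained with the Sherman-Morrison formula, and the
    log-determinant with the matrix determinant lemma
    det(V + x x^T) = det(V) * (1 + x^T V^{-1} x).
    """
    d = len(xs[0])
    # V_0^{-1} = (1 / lam) * I
    v_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    potential = 0.0
    log_det_ratio = 0.0
    for x in xs:
        # y = V^{-1} x (V^{-1} is symmetric, so x^T V^{-1} = y^T)
        y = [sum(v_inv[i][k] * x[k] for k in range(d)) for i in range(d)]
        u = sum(x[i] * y[i] for i in range(d))  # ||x||^2 in the V^{-1} norm
        potential += min(1.0, u)
        log_det_ratio += math.log1p(u)
        # Sherman-Morrison: (V + x x^T)^{-1} = V^{-1} - y y^T / (1 + u)
        for i in range(d):
            for j in range(d):
                v_inv[i][j] -= y[i] * y[j] / (1.0 + u)
    return potential, log_det_ratio


def potential_bound(d, T, L=1.0, lam=1.0):
    """Hypothetical helper: the upper bound 2 d log(1 + T L^2 / (d lam))."""
    return 2.0 * d * math.log(1.0 + T * L * L / (d * lam))


if __name__ == "__main__":
    random.seed(0)
    d, T = 5, 2000
    xs = []
    for _ in range(T):
        v = [random.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        xs.append([c / norm for c in v])  # unit vectors, so L = 1
    pot, ld = elliptical_potential(xs)
    print(f"potential = {pot:.2f}, 2*log det ratio = {2 * ld:.2f}, "
          f"bound = {potential_bound(d, T):.2f}")
```

The first inequality follows from $\min(1, u) \leq 2\log(1 + u)$ for $u \geq 0$, and the second from bounding $\det V_T$ by its trace via the AM-GM inequality. Random actions typically sit well below the bound; the paper's lower-bound argument instead relies on carefully constructed adversarial sequences to show that such lemmas are tight.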

Keywords

minimax regret, regret bound, linear contextual bandit, SupLinUCB algorithm
