Linear Bandits on Uniformly Convex Sets

Thomas Kerdreux; Christophe Roux; Alexandre d'Aspremont; Sebastian Pokutta

JMLR 2021

Abstract

Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets $\mathcal{K}\subset\mathbb{R}^n$, and two types of structural assumptions lead to better pseudo-regret bounds. When $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in]1,2]$, there exist bandit algorithms with $\tilde{\mathcal{O}}(\sqrt{nT})$ pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond $\ell_p$ balls that enjoy pseudo-regret bounds of $\tilde{\mathcal{O}}(\sqrt{nT})$. This result provides new elements for the open question in (Bubeck and Cesa-Bianchi, 2012). When the action set is $q$-uniformly convex but not necessarily strongly convex ($q>2$), we obtain pseudo-regret bounds of $\tilde{\mathcal{O}}(n^{1/q}T^{1/p})$ with $p$ such that $1/p + 1/q = 1$. These pseudo-regret bounds are competitive with the general $\tilde{\mathcal{O}}(n\sqrt{T})$ bound for a time horizon range that depends on the degree $q>2$ of the set's uniform convexity and the dimension $n$ of the problem.
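
The time horizon range mentioned at the end of the abstract can be made concrete by comparing the two rates directly. The following is a back-of-the-envelope sketch, not a statement taken from the paper: it ignores the constants and logarithmic factors hidden in $\tilde{\mathcal{O}}$, so the threshold is indicative only. Using $1/p = 1 - 1/q$ and $q > 2$ (so that $1/2 - 1/q > 0$):

$$
n^{1/q}\,T^{1-1/q} \le n\,T^{1/2}
\iff T^{1/2 - 1/q} \le n^{1 - 1/q}
\iff T \le n^{\frac{2(q-1)}{q-2}}.
$$

For example, $q = 4$ gives $T \le n^{3}$. As $q \to 2$ the exponent grows without bound, which matches the strongly convex case where $\sqrt{nT} \le n\sqrt{T}$ for every $T$. As $q \to \infty$ the exponent tends to $2$, that is, $T \le n^{2}$.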

Authors

Thomas Kerdreux, Christophe Roux, Alexandre d'Aspremont, Sebastian Pokutta

Topics

Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Mathematics > Geometry
Mathematics & Optimization > Optimization > Stochastic Methods
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

stochastic optimization, online learning, convex optimization, linear bandit, uniform convexity, pseudo-regret bound, action set, uniformly convex set
