Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates

Jincheng Mei; Bo Dai; Alekh Agarwal; Sharan Vaswani; Anant Raj; Csaba Szepesvári; Dale Schuurmans

2024 NIPS NeurIPS 2024

Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates

Abstract

We provide a new understanding of the stochastic gradient bandit algorithm by showing that it converges to a globally optimal policy almost surely using \emph{any} constant learning rate. This result demonstrates that the stochastic gradient algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise control assumptions break down. The proofs are based on novel findings about action sampling rates and the relationship between cumulative progress and noise, and extend the current understanding of how simple stochastic gradient methods behave in bandit settings.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization and Reinforcement Learning

🧭 Keyword Pioneer — stochastic gradient bandit

Authors

Jincheng Mei , Bo Dai , Alekh Agarwal , Sharan Vaswani , Anant Raj , Csaba Szepesvári , Dale Schuurmans

Topics

Machine Learning > Optimization & Theory > Optimization Machine Learning > Optimization & Theory > Stochastic Processes Reinforcement Learning > Methods > Multi-Agent Systems Mathematics & Optimization > Optimization > Stochastic Methods Machine Learning > Optimization & Theory > Stochastic Methods Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

stochastic gradient descent online learning global convergence exploration-exploitation tradeoff multi-armed bandit learning rate bandit algorithm stochastic gradient bandit constant learning rate

Download PDF

Related papers

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers 2024

Training for Stable Explanation for Free 2024

NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks 2024

Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch 2024

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence 2024