2024
NIPS
NeurIPS 2024
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates
Abstract
We provide a new understanding of the stochastic gradient bandit algorithm by showing that it converges to a globally optimal policy almost surely using \emph{any} constant learning rate. This result demonstrates that the stochastic gradient algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise control assumptions break down. The proofs are based on novel findings about action sampling rates and the relationship between cumulative progress and noise, and extend the current understanding of how simple stochastic gradient methods behave in bandit settings.
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning
🌉
Interdisciplinary Bridge
— Machine Learning and Mathematics & Optimization and Reinforcement Learning
🧭
Keyword Pioneer
— stochastic gradient bandit
Authors
Topics
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Stochastic Processes
Reinforcement Learning > Methods > Multi-Agent Systems
Mathematics & Optimization > Optimization > Stochastic Methods
Machine Learning > Optimization & Theory > Stochastic Methods
Machine Learning > Learning Types > Multi-Armed Bandits