Stochastic convex optimization with bandit feedback

Alekh Agarwal; Dean P. Foster; Daniel J. Hsu; Sham M. Kakade; Alexander Rakhlin

2011 NIPS NeurIPS 2011

Stochastic convex optimization with bandit feedback

Abstract

This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $X$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in X$. We demonstrate a generalization of the ellipsoid algorithm that incurs $O(\poly(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — ellipsoid algorithm

🐣 Hot Topic Early Bird — regret bound

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Alekh Agarwal , Dean P. Foster , Daniel J. Hsu , Sham M. Kakade , Alexander Rakhlin

Topics

Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Optimization Mathematics & Optimization > Optimization > Stochastic Methods Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Learning Types > Online Learning Machine Learning > Optimization & Theory > Stochastic Methods Mathematics & Optimization > Optimization > Convex Optimization Machine Learning > Learning Types > Stochastic Methods

Keywords

stochastic optimization online learning convex optimization online optimization stochastic convex optimization bandit feedback ellipsoid algorithm regret bound

Download PDF

Related papers

Co-Training for Domain Adaptation 2011

The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning 2011

Learning to Agglomerate Superpixel Hierarchies 2011

A Reinforcement Learning Theory for Homeostatic Regulation 2011

A Global Structural EM Algorithm for a Model of Cancer Progression 2011