Optimistic Bandit Convex Optimization

Scott Yang; Mehryar Mohri

2016 NIPS NeurIPS 2016

Optimistic Bandit Convex Optimization

Abstract

We introduce the general and powerful scheme of predicting information re-use in optimization algorithms. This allows us to devise a computationally efficient algorithm for bandit convex optimization with new state-of-the-art guarantees for both Lipschitz loss functions and loss functions with Lipschitz gradients. This is the first algorithm admitting both a polynomial time complexity and a regret that is polynomial in the dimension of the action space that improves upon the original regret bound for Lipschitz loss functions, achieving a regret of $\widetilde O(T^{11/16}d^{3/8})$. Our algorithm further improves upon the best existing polynomial-in-dimension bound (both computationally and in terms of regret) for loss functions with Lipschitz gradients, achieving a regret of $\widetilde O(T^{8/13} d^{5/3})$.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — lipschitz loss function

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Scott Yang , Mehryar Mohri

Topics

Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Optimization Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits Mathematics & Optimization > Optimization > Convex Optimization

Keywords

stochastic optimization online learning online learning algorithm regret bound bandit convex optimization lipschitz function lipschitz loss function polynomial time complexity

Download PDF

Related papers

Bayesian Intermittent Demand Forecasting for Large Inventories 2016

Dynamic Network Surgery for Efficient DNNs 2016

Beyond Exchangeability: The Chinese Voting Process 2016

Safe and Efficient Off-Policy Reinforcement Learning 2016

Tagger: Deep Unsupervised Perceptual Grouping 2016