Online Learning from Optimal Actions

Omar Besbes; Yuri Fonseca; Ilan Lobel

COLT 2021

Abstract

We study the problem of online contextual optimization where, at each period, instead of observing the loss, we observe, after the fact, the optimal action that an oracle with full knowledge of the objective function would have taken. At each period, the decision-maker has access to a new set of feasible actions to select from and to a new contextual function that affects that period’s loss function. We aim to minimize regret, which is defined as the difference between our losses and the ones incurred by an all-knowing oracle. We obtain the first regret bound for this problem that is logarithmic in the time horizon. Our results are derived through the development and analysis of a novel algorithmic structure that leverages the underlying geometry of the problem.
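
The interaction protocol described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the algorithm from the paper: it assumes a linear loss with a hidden cost vector (`c_star`), folds each period's context into randomly drawn action feature vectors, and uses a perceptron-style update as a simple stand-in learner. All names (`simulate`, `c_star`, `c_hat`) are hypothetical. The learner sees only the oracle's optimal action after each period; the loss values are used solely by the simulator to measure regret.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def simulate(horizon=200, dim=3, n_actions=5, seed=0):
    """Return the list of per-period regrets of a stand-in learner.

    Assumptions (not taken from the paper): the loss is linear,
    loss(x) = <c_star, x>, and each period's feasible set is a fresh
    finite set of random feature vectors.
    """
    rng = random.Random(seed)
    c_star = [rng.uniform(-1, 1) for _ in range(dim)]  # hidden objective
    c_hat = [0.0] * dim                                # learner's estimate
    regrets = []
    for _ in range(horizon):
        # A new feasible set arrives at each period.
        actions = [[rng.uniform(-1, 1) for _ in range(dim)]
                   for _ in range(n_actions)]
        # The learner acts on its current estimate.
        x = min(actions, key=lambda a: dot(c_hat, a))
        # After the fact, the oracle's optimal action is revealed.
        x_star = min(actions, key=lambda a: dot(c_star, a))
        # Regret is measured by the simulator only; the learner never sees it.
        regrets.append(dot(c_star, x) - dot(c_star, x_star))
        if x != x_star:
            # Perceptron-style correction: make x_star look cheaper than x.
            c_hat = [c + (a - b) for c, a, b in zip(c_hat, x, x_star)]
    return regrets


if __name__ == "__main__":
    print("cumulative regret:", sum(simulate()))
```

Each per-period regret is nonnegative by construction, since the oracle's action minimizes the true loss over the same feasible set. The update uses only the pair of chosen and optimal actions, which mirrors the feedback model in which losses are never observed.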

Topics

Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Online Algorithms

Keywords

online learning; logarithmic regret; regret bound; contextual optimization
