Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning

Nicolò Cesa-bianchi; Pierre Gaillard; Claudio Gentile; Sébastien Gerchinovitz

2017 COLT COLT 2017

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning

Abstract

We investigate contextual online learning with nonparametric (Lipschitz) comparison classes under different assumptions on losses and feedback information. For full information feedback and Lipschitz losses, we design the first explicit algorithm achieving the minimax regret rate (up to log factors). In a partial feedback model motivated by second-price auctions, we obtain algorithms for Lipschitz and semi-Lipschitz losses with regret bounds improving on the known bounds for standard bandit feedback. Our analysis combines novel results for contextual second-price auctions with a novel algorithmic approach based on chaining. When the context space is Euclidean, our chaining approach is efficient and delivers an even better regret bound.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — lipschitz loss

🐣 Hot Topic Early Bird — contextual bandit

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Nicolò Cesa-bianchi , Pierre Gaillard , Claudio Gentile , Sébastien Gerchinovitz

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems Machine Learning > Optimization & Theory > Learning Theory Mathematics & Optimization > Optimization > Online Algorithms

Keywords

partial feedback minimax regret contextual bandit lipschitz loss online nonparametric learning algorithmic chaining

Download PDF

Related papers

Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization 2017

Open Problem: First-Order Regret Bounds for Contextual Bandits 2017

Open Problem: Meeting Times for Learning Random Automata 2017

Corralling a Band of Bandit Algorithms 2017

Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons 2017