Online Learning under Delayed Feedback

Pooria Joulani; András György; Csaba Szepesvári

2013 ICML ICML 2013

Online Learning under Delayed Feedback

Abstract

Online learning with delayed feedback has received increasing attention recently due to its several applications in distributed, web-based learning problems. In this paper we provide a systematic study of the topic, and analyze the effect of delay on the regret of online learning algorithms. Somewhat surprisingly, it turns out that delay increases the regret in a multiplicative way in adversarial problems, and in an additive way in stochastic problems. We give meta-algorithms that transform, in a black-box fashion, algorithms developed for the non-delayed case into ones that can handle the presence of delays in the feedback loop. Modifications of the well-known UCB algorithm are also developed for the bandit problem with delayed feedback, with the advantage over the meta-algorithms that they can be implemented with lower complexity.

🚀 Conference Pioneer — ICML 2013

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — delayed feedback

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Pooria Joulani , András György , Csaba Szepesvári

Topics

Machine Learning > Optimization & Theory > Learning Theory Mathematics & Optimization > Optimization > Stochastic Methods Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Optimization & Theory > Online Algorithms

Keywords

regret analysis online learning adversarial setting multi-armed bandit regret bound delayed feedback bandit algorithm stochastic setting

Download PDF

Related papers

Convex Adversarial Collective Classification 2013

Gaussian Process Vine Copulas for Multivariate Dependence 2013

Stochastic Simultaneous Optimistic Optimization 2013

Generic Exploration and K-armed Voting Bandits 2013

Robust Structural Metric Learning 2013