Linear Multi-Resource Allocation with Semi-Bandit Feedback

Tor Lattimore; Koby Crammer; Csaba Szepesvári

NeurIPS (NIPS) 2015

Abstract

We study an idealised sequential resource allocation problem. In each time step the learner chooses an allocation of several resource types between a number of tasks. Assigning more resources to a task increases the probability that it is completed. The problem is challenging because the alignment of the tasks to the resource types is unknown and the feedback is noisy. Our main contribution is the new setting and an algorithm with a nearly optimal regret analysis. Along the way we draw connections to the problem of minimising regret for stochastic linear bandits with heteroscedastic noise. We also present some new results for stochastic linear bandits on the hypercube that significantly outperform existing work, especially in the sparse case.
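The interaction described in the abstract can be made concrete with a small simulation. This is only an illustrative sketch, not the authors' algorithm or exact model: it assumes a clipped-linear completion probability (suggested by the word "linear" in the title), a unit budget per resource type shared across tasks, and one success/failure bit per task as the semi-bandit feedback. The names `completion_probs`, `is_feasible`, `play_round`, `alignment` and `allocation` are hypothetical.

```python
import random


def completion_probs(allocation, alignment):
    """Per-task completion probability under an ASSUMED clipped-linear model.

    allocation[k][d]: amount of resource type d given to task k.
    alignment[k][d]:  hypothetical weight for how much task k benefits
                      from resource type d (unknown to the learner).
    """
    probs = []
    for alloc_k, align_k in zip(allocation, alignment):
        score = sum(a * w for a, w in zip(alloc_k, align_k))
        probs.append(min(1.0, max(0.0, score)))  # clip into [0, 1]
    return probs


def is_feasible(allocation, budget=1.0, tol=1e-9):
    """Check non-negativity and an ASSUMED per-resource budget shared by all tasks."""
    if any(x < 0 for row in allocation for x in row):
        return False
    num_resources = len(allocation[0])
    return all(
        sum(row[d] for row in allocation) <= budget + tol
        for d in range(num_resources)
    )


def play_round(allocation, alignment, rng):
    """Semi-bandit feedback: one noisy success/failure bit per task."""
    return [rng.random() < p for p in completion_probs(allocation, alignment)]


rng = random.Random(0)
alignment = [[2.0, 0.0], [0.0, 0.5]]    # 2 tasks, 2 resource types; hidden from the learner
allocation = [[0.25, 0.0], [0.0, 1.0]]  # task 0 gets a quarter of resource 0, task 1 all of resource 1
probs = completion_probs(allocation, alignment)
feedback = play_round(allocation, alignment, rng)
```

With non-negative weights, giving a task more of a resource it is aligned with can only raise its completion probability, which matches the monotonicity described above. A learner would see only `feedback`, never `alignment` or `probs`.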

Topics

Mathematics & Optimization > Optimization > Stochastic Methods

Mathematics & Optimization > Optimization > Online Algorithms

Machine Learning > Learning Types > Multi-Armed Bandits

Machine Learning > Learning Paradigms > Online Learning

Machine Learning > Learning Types > Exploration-Exploitation

Keywords

regret analysis, resource allocation, multi-armed bandit, heteroscedastic noise, semi-bandit feedback, stochastic linear bandit
