Online Learning with Sample Path Constraints

Shie Mannor; John N. Tsitsiklis; Jia Yuan Yu

JMLR, 2009

Abstract

We study online learning where a decision maker interacts with Nature with the objective of maximizing her long-term average reward subject to some sample path average constraints. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint, the convex hull turns out to be the highest attainable function. Using a calibrated forecasting rule, we provide an explicit strategy that attains this convex hull. We also measure the performance of heuristic methods based on non-calibrated forecasters in experiments involving a CPU power management problem.
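To make the reward-in-hindsight definition concrete, here is a minimal Python sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: the benchmark is taken to be the best stationary mixed action `p` against Nature's empirical action frequencies `q`, there is a single average-cost constraint with threshold `c0`, and the names `reward_in_hindsight`, `reward`, `cost`, `q`, and `c0` are hypothetical. With one constraint the underlying linear program attains its optimum at a mixture of at most two pure actions, so the sketch simply enumerates those candidates.

```python
from itertools import combinations


def reward_in_hindsight(q, reward, cost, c0, tol=1e-12):
    """Best average reward of a stationary mixed action against q,
    subject to average cost <= c0 (single constraint, an assumption).

    q      -- empirical distribution over Nature's actions
    reward -- reward[a][b] for decision-maker action a, Nature action b
    cost   -- cost[a][b], same indexing
    Returns (value, p), or (None, None) if no mixed action is feasible.
    """
    n = len(reward)
    # Expected reward and cost of each pure action against q.
    r = [sum(qb * reward[a][b] for b, qb in enumerate(q)) for a in range(n)]
    c = [sum(qb * cost[a][b] for b, qb in enumerate(q)) for a in range(n)]

    best_value, best_p = None, None

    def consider(value, p):
        nonlocal best_value, best_p
        if best_value is None or value > best_value:
            best_value, best_p = value, p

    # Candidate vertices 1: pure actions that satisfy the constraint.
    for a in range(n):
        if c[a] <= c0 + tol:
            p = [0.0] * n
            p[a] = 1.0
            consider(r[a], p)

    # Candidate vertices 2: two-action mixtures with the constraint tight.
    for i, j in combinations(range(n), 2):
        if abs(c[i] - c[j]) <= tol:
            continue  # equal costs: mixing cannot make the constraint tight
        w = (c0 - c[j]) / (c[i] - c[j])  # weight on action i
        if 0.0 < w < 1.0:
            p = [0.0] * n
            p[i], p[j] = w, 1.0 - w
            consider(w * r[i] + (1.0 - w) * r[j], p)

    return best_value, best_p


if __name__ == "__main__":
    # Toy numbers (made up): action 0 is costly but rewarding, action 1 is free.
    reward = [[1.0, 0.0], [0.2, 0.2]]
    cost = [[1.0, 1.0], [0.0, 0.0]]
    print(reward_in_hindsight([0.5, 0.5], reward, cost, c0=0.5))
```

The function returns a value for one fixed `q`; the abstract's attainability results concern this quantity viewed as a function of `q` and its convex hull, which the sketch does not compute.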

Topics

Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Stochastic Methods
Mathematics & Optimization > Optimization > Online Algorithms

Keywords

stochastic optimization, online learning, reward maximization, convex hull, calibrated forecasting, sample path constraint
