Adaptive Hedge

Tim V. Erven; Wouter M. Koolen; Steven D. Rooij; Peter Grünwald

2011 NIPS NeurIPS 2011

Adaptive Hedge

Abstract

Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — hedge algorithm

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

🐣 Hot Topic Early Bird — regret minimization

Authors

Tim V. Erven , Wouter M. Koolen , Steven D. Rooij , Peter Grünwald

Topics

Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Optimization Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Learning Types > Online Learning Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

online learning hedge algorithm regret minimization adaptive learning rate decision-theoretic learning learning rate adaptation decision-theoretic online learning multi-armed bandit learning rate regret bound adaptive algorithm

Download PDF

Related papers

Co-Training for Domain Adaptation 2011

The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning 2011

Learning to Agglomerate Superpixel Hierarchies 2011

A Reinforcement Learning Theory for Homeostatic Regulation 2011

A Global Structural EM Algorithm for a Model of Cancer Progression 2011