2021
AAAI
AAAI 2021
Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits
Abstract
Abstract Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandits problems as well as generalized linear bandits problems. We develop novel index policies that we prove achieve order-optimality, and show that they achieve empirical performance competitive with the state-of-the-art benchmark methods in extensive experiments. The new policies achieve this with low computation time per pull for linear bandits, and thereby resulting in both favorable regret as well as computational efficiency.
🌉
Interdisciplinary Bridge
— Artificial Intelligence and Machine Learning
🧭
Keyword Pioneer
— explore-exploit trade-off
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
Authors
Topics
Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Stochastic Processes
Machine Learning > Optimization & Theory > Online Algorithms
Machine Learning > Learning Types > Multi-Armed Bandits
Machine Learning > Learning Types > Exploration-Exploitation
Artificial Intelligence > Core AI > Decision Making