2013
ICML
ICML 2013
Generic Exploration and K-armed Voting Bandits
Abstract
We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits for situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.
🚀
Conference Pioneer
— ICML 2013
🌉
Interdisciplinary Bridge
— Machine Learning and Mathematics & Optimization
🧭
Keyword Pioneer
— dueling bandit
🐣
Hot Topic Early Bird
— multi-armed bandit
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning
📈
Trend Setter
— Game Theory