Generic Exploration and K-armed Voting Bandits

Tanguy Urvoy; Fabrice Clérot; Raphaël Féraud; Sami Naamane

ICML 2013

Abstract

We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm that can cope with various utility functions, ranging from multi-armed bandit settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits for situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.

Topics

Machine Learning > Learning Types > Active Learning
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Online Learning
Machine Learning > Learning Types > Multi-Armed Bandits
Artificial Intelligence > Core AI > Game Theory

Keywords

partial feedback, multi-armed bandit, pure exploration, dueling bandit, stochastic online learning, voting bandit
