Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising

Min Xu; Tao Qin; Tie-yan Liu

NeurIPS (NIPS) 2013

Abstract

In search advertising, the search engine needs to select the most profitable advertisements to display, which can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem. In this paper, we show that the naive application of MAB algorithms to search advertising for advertisement selection will produce sample selection bias that harms the search engine by decreasing expected revenue and “estimation of the largest mean” (ELM) bias that harms the advertisers by increasing game-theoretic player-regret. We then propose simple bias-correction methods with benefits to both the search engine and the advertisers.
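
The ELM bias named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method or experimental setup. It only demonstrates the general statistical fact that the largest of several sample means tends to overestimate the largest true mean. The Bernoulli click model, the equal number of pulls per arm, and the function name `elm_bias` are all assumptions made for illustration.

```python
import random


def elm_bias(true_means, pulls_per_arm, trials, seed=0):
    """Simulated estimate of E[max_i sample_mean_i] - max_i true_mean_i.

    Assumptions (not from the paper): each arm is an ad with Bernoulli
    clicks, and every arm is shown exactly `pulls_per_arm` times.
    """
    rng = random.Random(seed)
    best_true = max(true_means)
    total = 0.0
    for _ in range(trials):
        # Empirical click-through rate of each arm in this trial.
        sample_means = [
            sum(1 for _ in range(pulls_per_arm) if rng.random() < p) / pulls_per_arm
            for p in true_means
        ]
        # Naive estimate of the best arm's value: the largest sample mean.
        total += max(sample_means)
    return total / trials - best_true


# Five ads with identical true click-through rates: the naive estimate
# of the best rate is biased upward.
bias_five_arms = elm_bias([0.05] * 5, pulls_per_arm=100, trials=2000)

# A single ad: the sample mean is unbiased, so the gap is close to zero.
bias_one_arm = elm_bias([0.05], pulls_per_arm=100, trials=2000)

print(f"five arms: {bias_five_arms:+.4f}, one arm: {bias_one_arm:+.4f}")
```

With several arms the reported gap is positive, while with one arm it is near zero. This matches the intuition that taking a maximum over noisy estimates selects for favorable noise.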

Authors

Min Xu, Tao Qin, Tie-yan Liu

Topics

Machine Learning > Learning Types > Active Learning
Reinforcement Learning > Methods > Multi-Agent Systems
Data Science & Analytics > Applications > Recommender Systems
Data Science & Analytics > Applications > Information Retrieval
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

online learning, search advertising, multi-armed bandit, bandit algorithm, estimation bias, sample selection bias, expected revenue

Related papers

Latent Structured Active Learning 2013

On Flat versus Hierarchical Classification in Large-Scale Taxonomies 2013

Generalized Method-of-Moments for Rank Aggregation 2013

Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections 2013

Accelerated Mini-Batch Stochastic Dual Coordinate Ascent 2013