Optimal and Robust Price Experimentation: Learning by Lottery

Christopher Dance; Onno Zoeter

AISTATS 2011

Abstract

This paper studies optimal price learning for one or more items. We introduce the Schrödinger price experiment (SPE), which superimposes classical price experiments using lotteries and thereby extracts more information from each customer interaction. If buyers are perfectly rational, we show that there exist SPEs that, in the limit of infinite superposition, learn optimally and exploit optimally. We refer to the resulting mechanism as the hopeful mechanism (HM) because, although it is incentive compatible, buyers can deviate with extreme consequences for the seller at very little cost to themselves. For real-world settings, we propose a robust version of the approach, which takes the form of a Markov decision process whose actions are functions. We provide approximate policies motivated by the best of sampled set (BOSS) algorithm coupled with approximate Bayesian inference. Numerical studies show that the proposed method significantly increases seller revenue compared to classical price experimentation, even in the single-item case.
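The abstract does not spell out how an SPE is constructed, so the following is only a toy illustration of the general idea of superimposing posted prices with a lottery, not the authors' mechanism. It assumes a hypothetical menu in which entry k means "accept the k cheapest of K candidate prices", one of the K prices is then drawn uniformly at random, and the buyer pays and receives the item only if the drawn price was accepted. The names `superposed_menu`, `best_response`, and `implied_interval` are invented for this sketch.

```python
from fractions import Fraction


def superposed_menu(prices):
    """Build a hypothetical lottery menu over K candidate posted prices.

    Entry k corresponds to accepting the k cheapest prices. Each price is
    drawn with probability 1/K, so entry k wins the item with probability
    k/K and has expected payment (sum of the k cheapest prices) / K.
    """
    ordered = sorted(prices)
    K = len(ordered)
    menu = []
    for k in range(K + 1):
        win_prob = Fraction(k, K)
        expected_payment = sum(ordered[:k], Fraction(0)) / K
        menu.append((win_prob, expected_payment))
    return menu


def best_response(valuation, menu):
    """Return the menu index a risk-neutral, rational buyer would choose,
    together with the expected utility of every menu entry."""
    utilities = [win_prob * valuation - payment for win_prob, payment in menu]
    best = max(range(len(menu)), key=lambda k: utilities[k])
    return best, utilities


def implied_interval(choice, prices):
    """Bracket on the buyer's valuation implied by the chosen entry
    (None marks an open end)."""
    ordered = sorted(prices)
    lower = ordered[choice - 1] if choice > 0 else None
    upper = ordered[choice] if choice < len(ordered) else None
    return lower, upper


prices = [1, 2, 3, 4]
menu = superposed_menu(prices)
choice, utilities = best_response(Fraction(5, 2), menu)
print(choice, implied_interval(choice, prices))
print(utilities[choice] - utilities[choice + 1])
```

Under these assumptions, a single interaction brackets the buyer's valuation between two adjacent candidate prices, whereas one posted price yields only an accept or reject signal. The last printed value is the utility the buyer gives up by choosing the neighbouring entry instead; it shrinks as more prices are superimposed, which is in the spirit of the abstract's remark that deviating costs buyers very little.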

Topics

Machine Learning > Optimization & Theory > Bayesian Inference
Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Game Theory
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

Bayesian inference, Markov decision process, multi-armed bandit, revenue optimization, incentive compatibility, price experimentation, optimal learning
