2018
NIPS
NeurIPS 2018
Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models
Abstract
In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of O(sqrt(T log log T), which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.
🌉
Interdisciplinary Bridge
— Machine Learning and Mathematics & Optimization
🧭
Keyword Pioneer
— assortment selection
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
Authors
Topics
Mathematics & Optimization > Optimization > Combinatorial Optimization
Mathematics & Optimization > Optimization > Stochastic Methods
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Multi-Armed Bandits