2019
IJCAI
SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets
Abstract
Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items (which may have interacting effects on user choice), methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
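To make the decomposition claim concrete, the following is a minimal sketch, not the authors' implementation. It assumes the user picks at most one item from the slate according to a multinomial-logit choice model, so that the slate's LTV is the choice-probability-weighted sum of item-wise LTVs. The names `choice_probs`, `slate_value`, `best_slate`, the `null_score` parameter, and the convention that a "no click" outcome contributes zero value are all illustrative assumptions.

```python
import math
from itertools import combinations


def choice_probs(scores, null_score=0.0):
    """Multinomial-logit choice probabilities for the items on a slate.

    Assumption: the user may also choose nothing, modeled as a 'null'
    option with score `null_score`, so the probabilities sum to < 1.
    """
    weights = [math.exp(s) for s in scores]
    denom = sum(weights) + math.exp(null_score)
    return [w / denom for w in weights]


def slate_value(slate, scores, item_ltv, null_score=0.0):
    """Slate LTV as a weighted sum of item-wise LTVs.

    Q(s, A) = sum over i in A of P(i | s, A) * Q_item(s, i).
    Assumption: the 'no click' outcome contributes zero value here.
    """
    probs = choice_probs([scores[i] for i in slate], null_score)
    return sum(p * item_ltv[i] for p, i in zip(probs, slate))


def best_slate(scores, item_ltv, k, null_score=0.0):
    """Exhaustively search all size-k slates (toy-sized catalogs only)."""
    items = range(len(scores))
    return max(
        combinations(items, k),
        key=lambda s: slate_value(s, scores, item_ltv, null_score),
    )


if __name__ == "__main__":
    scores = [0.0, 0.0, 0.0]    # hypothetical per-item appeal to the user
    item_ltv = [1.0, 3.0, 2.0]  # hypothetical item-wise long-term values
    slate = best_slate(scores, item_ltv, k=2)
    print(slate, slate_value(slate, scores, item_ltv))
```

The exhaustive search in `best_slate` is only for illustration on a tiny catalog; its cost grows combinatorially with catalog size, which is exactly the problem the abstract says the decomposition is meant to avoid. The point of the sketch is that only item-wise values need to be learned, while the slate value is assembled from them through the choice model.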