SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets

Eugene Ie; Vihan Jain; Jing Wang; Sanmit Narvekar; Ritesh Agarwal; Rui Wu; Heng-Tze Cheng; Tushar Chandra; Craig Boutilier

IJCAI 2019

Abstract

Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items, which may have interacting effects on user choice, methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
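As an illustration of what such a decomposition can look like, the sketch below computes a slate's LTV as a choice-probability-weighted sum of item-wise LTVs. It is a minimal sketch under stated assumptions, not the paper's implementation: the conditional-logit choice model, the zero-valued "no click" option, and the names `choice_probs`, `slate_ltv`, and `best_slate` are all hypothetical choices made for this example.

```python
import math
from itertools import combinations


def choice_probs(scores, null_score=0.0):
    """Assumed conditional-logit choice model over the slate's items.

    The user may also pick a 'no click' option with score `null_score`,
    so the returned probabilities can sum to less than 1.
    """
    weights = [math.exp(s) for s in scores]
    denom = sum(weights) + math.exp(null_score)
    return [w / denom for w in weights]


def slate_ltv(scores, item_ltvs, null_score=0.0):
    """Slate LTV as a choice-weighted sum of item-wise LTVs.

    Assumption: the 'no click' outcome contributes zero value.
    """
    probs = choice_probs(scores, null_score)
    return sum(p * q for p, q in zip(probs, item_ltvs))


def best_slate(scores, item_ltvs, k):
    """Brute-force search for the best slate of size k.

    Exhaustive enumeration is used only for illustration on tiny
    candidate sets; it does not scale to realistic catalogs.
    """
    candidates = range(len(scores))
    return max(
        combinations(candidates, k),
        key=lambda slate: slate_ltv(
            [scores[i] for i in slate],
            [item_ltvs[i] for i in slate],
        ),
    )


if __name__ == "__main__":
    scores = [0.0, 0.0, 0.0]       # hypothetical user-item affinity scores
    item_ltvs = [1.0, 3.0, 2.0]    # hypothetical item-wise LTVs
    print(best_slate(scores, item_ltvs, k=2))
```

The point of the sketch is that only per-item quantities (one score and one LTV per item) need to be estimated; the value of any slate is then assembled from them rather than learned separately for every combination of items.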

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems; Machine Learning > Application Areas > Risk Management; Reinforcement Learning > Methods > Deep RL

Keywords

temporal difference learning; user modeling; action space; slate recommendation; value decomposition; long-term engagement
