Gaussian Process Bandits for Top-k Recommendations

Mohit Yadav; Daniel Sheldon; Cameron Musco

NeurIPS 2024

Abstract

Algorithms that use bandit feedback to optimize top-k recommendations are vital for online marketplaces, search engines, and content platforms. However, the combinatorial nature of this problem poses a significant challenge, as the number of possible ordered top-k recommendations from $n$ items grows exponentially with $k$. As a result, previous work often relies on restrictive assumptions about the reward or bandit feedback models, such as assuming that the feedback discloses a reward for each recommended item rather than a single scalar reward for the entire top-k list. We introduce a novel contextual bandit algorithm for top-k recommendations that uses a Gaussian process with a Kendall kernel to model the reward function. Our algorithm requires only scalar feedback from the top-k recommendations and does not impose restrictive assumptions on the reward structure. Theoretical analysis confirms that the proposed algorithm achieves sub-linear regret with respect to the number of rounds and arms. Additionally, empirical results using a bandit simulator demonstrate that the proposed algorithm outperforms the baselines across various scenarios.
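To make two ideas from the abstract concrete, the sketch below counts the ordered top-k lists that can be drawn from $n$ items, $n!/(n-k)!$, and computes the classical Kendall kernel between two full rankings (concordant pairs minus discordant pairs, divided by the number of pairs). This is an illustration only: the abstract does not specify how the paper adapts the kernel to top-k lists, and the function names `num_ordered_top_k` and `kendall_kernel` are hypothetical, not the authors' code.

```python
from itertools import combinations
from math import perm


def num_ordered_top_k(n: int, k: int) -> int:
    """Number of ordered top-k lists drawn from n items: n! / (n - k)!."""
    return perm(n, k)


def kendall_kernel(rank_a, rank_b) -> float:
    """Classical Kendall kernel between two full rankings of the same items.

    rank_a[i] and rank_b[i] are the positions of item i in each ranking.
    Returns (concordant pairs - discordant pairs) / total pairs, in [-1, 1].
    Assumption: full rankings without ties; the top-k variant used in the
    paper is not reproduced here.
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must cover the same items")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two items")
    score = 0
    for i, j in combinations(range(n), 2):
        # +1 if both rankings order items i and j the same way, -1 otherwise.
        same_order = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) > 0
        score += 1 if same_order else -1
    return score / (n * (n - 1) // 2)
```

For example, `num_ordered_top_k(50, 5)` already exceeds 250 million candidate lists, which is why treating each ordered list as an independent arm is impractical and a kernel that shares information across similar rankings is attractive.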

Topics

Artificial Intelligence > Learning Paradigms > Meta-Learning
Machine Learning > Core Methods > Regression
Data Science & Analytics > Applications > Recommender Systems
Mathematics & Optimization > Optimization > Stochastic Methods
Machine Learning > Bayesian & Probabilistic > Bayesian Inference
Machine Learning > Learning Types > Multi-Armed Bandits
Machine Learning > Application Areas > Recommender Systems
Machine Learning > Bayesian & Probabilistic > Gaussian Processes

Keywords

gaussian process, sub-linear regret, multi-armed bandit, regret bound, contextual bandit, sublinear regret, bandit algorithm, recommendation system, recommender system, gaussian process bandit, top-k recommendation, kendall kernel
