Scalar Posterior Sampling with Applications

Georgios Theocharous; Zheng Wen; Yasin Abbasi Yadkori; Nikos Vlassis

NeurIPS 2018

Abstract

We propose a practical non-episodic posterior sampling for reinforcement learning (PSRL) algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode switching schedule. Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state-action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm satisfy a sensible parameterization for a large class of problems in sequential recommendations.
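To make the idea of a deterministic, model-independent switching schedule concrete, here is a minimal Python sketch. The abstract does not state the schedule itself, so the doubling rule (resample at t = 1, 2, 4, 8, ...) is an assumption made for illustration, and the names `switch_times`, `run_schedule`, and `sample_parameter` are hypothetical. The point it shows is that the resampling times depend only on the step counter, never on the observed data or the current model.

```python
def switch_times(horizon):
    """Return the 1-indexed steps at which a new parameter is sampled.

    Assumption (not stated in the abstract): episode lengths double,
    so switches happen at t = 1, 2, 4, 8, ...
    """
    times = []
    next_switch = 1
    for t in range(1, horizon + 1):
        if t == next_switch:
            times.append(t)
            next_switch *= 2
    return times


def run_schedule(horizon, sample_parameter):
    """Hold the sampled parameter fixed between switch times.

    `sample_parameter(t)` is a hypothetical stand-in for drawing a
    parameter from the posterior available at step t.
    """
    switches = set(switch_times(horizon))
    theta = None
    trace = []
    for t in range(1, horizon + 1):
        if t in switches:
            theta = sample_parameter(t)  # resample only on schedule
        trace.append(theta)
    return trace


if __name__ == "__main__":
    print(switch_times(20))
    # Using t itself as the "sample" makes the held value easy to read.
    print(run_schedule(8, lambda t: t))
```

Under this assumed schedule the number of resampling events grows only logarithmically with the horizon, which is one way a fixed schedule can keep computation low; the paper's actual schedule and analysis should be taken from the full text.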

Topics

Machine Learning > Optimization & Theory > Bayesian Inference
Reinforcement Learning > Methods > Deep RL
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Bayesian & Probabilistic > Bayesian Inference

Keywords

reinforcement learning; online learning; posterior sampling; bayesian regret; sequential recommendation; bayesian regret bound
