2017
IJCAI
On Thompson Sampling and Asymptotic Optimality
Abstract
We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) its value asymptotically converges in mean to the optimal value and (2), given a recoverability assumption, its regret is sublinear. We conclude with a discussion about optimality in reinforcement learning.
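As a rough illustration of the Thompson sampling idea the abstract refers to, the sketch below samples one environment from the posterior over a class, acts optimally for that sample, and then updates the posterior by Bayes' rule. It is a heavily simplified toy and not the paper's construction: it assumes a finite class of two-armed Bernoulli bandits instead of a countable class of general (possibly non-Markovian) environments, and it resamples at every step. The names `thompson_sampling`, `normalize`, `hypotheses`, and `true_index` are hypothetical and chosen only for this example.

```python
import math
import random


def normalize(log_weights):
    """Turn unnormalized log-weights into a probability vector."""
    m = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def thompson_sampling(hypotheses, true_index, steps, seed=0):
    """Toy Thompson sampling over a finite class of Bernoulli bandits.

    hypotheses: list of candidate environments, each a list of arm means
                strictly between 0 and 1 (assumption of this sketch).
    true_index: index of the environment that actually generates rewards.
    Returns the final posterior over hypotheses and the total reward.
    """
    rng = random.Random(seed)
    n = len(hypotheses)
    log_w = [math.log(1.0 / n)] * n  # uniform prior over the class
    total_reward = 0

    for _ in range(steps):
        posterior = normalize(log_w)
        # 1. Sample one environment from the current posterior.
        sampled = rng.choices(range(n), weights=posterior)[0]
        # 2. Act optimally for the sampled environment (best arm).
        means = hypotheses[sampled]
        arm = max(range(len(means)), key=means.__getitem__)
        # 3. Observe a reward drawn from the true environment.
        reward = 1 if rng.random() < hypotheses[true_index][arm] else 0
        total_reward += reward
        # 4. Bayesian update: reweight each hypothesis by its likelihood.
        for i, h in enumerate(hypotheses):
            p = h[arm] if reward == 1 else 1.0 - h[arm]
            log_w[i] += math.log(p)

    return normalize(log_w), total_reward


if __name__ == "__main__":
    hypotheses = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
    posterior, total = thompson_sampling(hypotheses, true_index=0, steps=500)
    print("posterior:", posterior)
    print("average reward:", total / 500)
```

In this toy setting every action helps distinguish the true environment from the alternatives, so the posterior concentrates quickly. The recoverability assumption mentioned in the abstract matters in general environments, where early mistakes could otherwise be irreversible; a bandit has no such traps.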