Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

Fei Feng; Ruosong Wang; Wotao Yin; Simon S Du; Lin Yang

2020 NIPS NeurIPS 2020

Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

Abstract

Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems [tang2017exploration,bellemare2016unifying], we investigate when this paradigm is provably efficient. We study episodic Markov decision processes with rich observations generated from a small number of latent states. We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm. Theoretically, we prove that as long as the unsupervised learning algorithm enjoys a polynomial sample complexity guarantee, we can find a near-optimal policy with sample complexity polynomial in the number of latent states, which is significantly smaller than the number of observations. Empirically, we instantiate our framework on a class of hard exploration problems to demonstrate the practicality of our theory.

🧭 Keyword Pioneer — tabular rl

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Fei Feng , Ruosong Wang , Wotao Yin , Simon S Du , Lin Yang

Topics

Machine Learning > Learning Types > Self-Supervised Learning Machine Learning > Learning Types > Unsupervised Learning Machine Learning > Optimization & Theory > Learning Theory

Keywords

unsupervised learning reinforcement learning sample complexity latent state tabular rl

Download PDF

Related papers

Higher-Order Spectral Clustering of Directed Graphs 2020

Self-Supervised MultiModal Versatile Networks 2020

Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates 2020

Causal Intervention for Weakly-Supervised Semantic Segmentation 2020

Taming Discrete Integration via the Boon of Dimensionality 2020