On Efficiency in Hierarchical Reinforcement Learning

Zheng Wen; Doina Precup; Morteza Ibrahimi; Andre Barreto; Benjamin Van Roy; Satinder P. Singh

NeurIPS 2020


Abstract

Hierarchical Reinforcement Learning (HRL) approaches promise to provide more efficient solutions to sequential decision making problems, in terms of both statistical and computational efficiency. While this has been demonstrated empirically over time in a variety of tasks, theoretical results quantifying the benefits of such methods are still few and far between. In this paper, we discuss the kind of structure in a Markov decision process (MDP) that gives rise to efficient HRL methods. Specifically, we formalize the intuition that HRL can effectively exploit repeating "subMDPs" with similar reward and transition structure. We show that, under reasonable assumptions, a model-based Thompson sampling-style HRL algorithm that exploits this structure is statistically efficient, as established through a finite-time regret bound. We also establish conditions under which planning with structure-induced options is near-optimal and computationally efficient.
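To make the intuition about repeating subMDPs concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes that subMDPs have already been grouped into equivalence classes and that each class shares a single Dirichlet posterior over local transitions, so experience gathered in one subMDP informs every other subMDP of the same class. The class name `SharedSubMDPPosterior`, the partition `submdp_class`, and the state and action counts are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


class SharedSubMDPPosterior:
    """Dirichlet posterior over local transitions, one per equivalence class.

    Illustrative assumption: every subMDP in a class has the same local
    state and action indices, so their transition counts can be pooled.
    """

    def __init__(self, n_states, n_actions, prior=1.0):
        self.n_states = n_states
        self.n_actions = n_actions
        self.prior = prior
        # alpha[class_id][(s, a)] holds Dirichlet parameters over next states.
        self.alpha = defaultdict(self._fresh_table)

    def _fresh_table(self):
        return {(s, a): [self.prior] * self.n_states
                for s in range(self.n_states)
                for a in range(self.n_actions)}

    def update(self, class_id, s, a, s_next):
        """Record one observed transition for the given equivalence class."""
        self.alpha[class_id][(s, a)][s_next] += 1.0

    def sample_model(self, class_id, rng):
        """Draw one transition model for the class (the Thompson sampling step)."""
        model = {}
        for key, alpha in self.alpha[class_id].items():
            # A Dirichlet draw is a vector of Gamma draws, normalized to sum to 1.
            draws = [rng.gammavariate(x, 1.0) for x in alpha]
            total = sum(draws)
            model[key] = [d / total for d in draws]
        return model


# Hypothetical partition: subMDPs 0 and 1 are both "rooms", 2 is a "corridor".
submdp_class = {0: "room", 1: "room", 2: "corridor"}

posterior = SharedSubMDPPosterior(n_states=3, n_actions=2)
# The same local transition observed in two different rooms updates one posterior.
posterior.update(submdp_class[0], s=0, a=1, s_next=2)
posterior.update(submdp_class[1], s=0, a=1, s_next=2)

rng = random.Random(0)
sampled = posterior.sample_model("room", rng)
```

In this sketch the number of posteriors to learn scales with the number of equivalence classes rather than the number of subMDPs, which is the kind of saving the abstract attributes to repeating structure. A full agent would also need to plan in the sampled model, which is omitted here.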



Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Learning Paradigms > Meta-Learning
Machine Learning > Optimization & Theory > Learning Theory

Keywords

sequential decision making, hierarchical reinforcement learning, regret bound, model-based Thompson sampling, option planning

