Robustly Learning Composable Options in Deep Reinforcement Learning

Akhil Bagaria; Jason Senthil; Matthew Slivinski; George Konidaris

2021 IJCAI IJCAI 2021

Robustly Learning Composable Options in Deep Reinforcement Learning

Abstract

Hierarchical reinforcement learning (HRL) is only effective for long-horizon problems when high-level skills can be reliably sequentially executed. Unfortunately, learning reliably composable skills is difficult, because all the components of every skill are constantly changing during learning. We propose three methods for improving the composability of learned skills: representing skill initiation regions using a combination of pessimistic and optimistic classifiers; learning re-targetable policies that are robust to non-stationary subgoal regions; and learning robust option policies using model-based RL. We test these improvements on four sparse-reward maze navigation tasks involving a simulated quadrupedal robot. Each method successively improves the robustness of a baseline skill discovery method, substantially outperforming state-of-the-art flat and hierarchical methods.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

🌉 Interdisciplinary Bridge — Artificial Intelligence and Reinforcement Learning

Authors

Akhil Bagaria , Jason Senthil , Matthew Slivinski , George Konidaris

Topics

Artificial Intelligence > Core AI > Agent Systems Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Reinforcement Learning > Applications > Robotics

Keywords

deep reinforcement learning robot navigation hierarchical reinforcement learning model-based reinforcement learning skill discovery sparse reward skill composition option learning quadrupedal robot

Download PDF

Related papers

Type Anywhere You Want: An Introduction to Invisible Mobile Keyboard 2021

Guaranteeing Maximin Shares: Some Agents Left Behind 2021

Surprisingly Popular Voting Recovers Rankings, Surprisingly! 2021

Strategyproof Randomized Social Choice for Restricted Sets of Utility Functions 2021

Diversity in Kemeny Rank Aggregation: A Parameterized Approach 2021