Safe Policy Improvement with Baseline Bootstrapping in Factored Environments

Thiago D. Simão; Matthijs T. J. Spaan

2019 AAAI AAAI 2019

Safe Policy Improvement with Baseline Bootstrapping in Factored Environments

Abstract

Abstract We present a novel safe reinforcement learning algorithm that exploits the factored dynamics of the environment to become less conservative. We focus on problem settings in which a policy is already running and the interaction with the environment is limited. In order to safely deploy an updated policy, it is necessary to provide a confidence level regarding its expected performance. However, algorithms for safe policy improvement might require a large number of past experiences to become confident enough to change the agent’s behavior. Factored reinforcement learning, on the other hand, is known to make good use of the data provided. It can achieve a better sample complexity by exploiting independence between features of the environment, but it lacks a confidence level. We study how to improve the sample efficiency of the safe policy improvement with baseline bootstrapping algorithm by exploiting the factored structure of the environment. Our main result is a theoretical bound that is linear in the number of parameters of the factored representation instead of the number of states. The empirical analysis shows that our method can improve the policy using a number of samples potentially one order of magnitude smaller than the flat algorithm.

🚀 Conference Pioneer — AAAI 2019

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — baseline bootstrapping

🐣 Hot Topic Early Bird — safe reinforcement learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Thiago D. Simão , Matthijs T. J. Spaan

Topics

Artificial Intelligence > Core AI > Agent Systems Artificial Intelligence > Core AI > AI Safety Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Reinforcement Learning > Applications > Robotics Machine Learning > Learning Types > Reinforcement Learning

Keywords

sample efficiency sample complexity policy improvement safe reinforcement learning safe policy improvement confidence level factored dynamics baseline bootstrapping factored reinforcement learning

Download PDF

Related papers

Cooperative Multimodal Approach to Depression Detection in Twitter 2019

Learning to Align Question and Answer Utterances in Customer Service Conversation with Recurrent Pointer Networks 2019

Community Detection in Social Networks Considering Topic Correlations 2019

Session-Based Recommendation with Graph Neural Networks 2019

Blameworthiness in Multi-Agent Settings 2019