Safe and Sample-Efficient Reinforcement Learning Algorithms for Factored Environments

Thiago D. Simão

IJCAI 2019

Abstract

Reinforcement Learning (RL) deals with problems that can be modeled as a Markov Decision Process (MDP) whose transition function is unknown. When an arbitrary policy π is already in execution and its experiences with the environment have been recorded in a batch D, an RL algorithm can use D to compute a new policy π'. However, the policy computed by traditional RL algorithms might perform worse than π. Our goal is to develop safe RL algorithms, where the agent has high confidence, given D, that π' performs better than π. To develop RL algorithms that are both sample-efficient and safe, we combine ideas from exploration strategies in RL with a safe policy improvement method.
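The batch setting described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a small tabular MDP, a batch of hypothetical (state, action, next state) samples, and a hypothetical count threshold `min_count` standing in for a proper confidence bound. The helper names (`estimate_model`, `constrained_policy`) and the `greedy` policy are also assumptions made for illustration. The idea shown is only that a new policy departs from the behavior policy where the batch gives enough evidence, and otherwise keeps the behavior policy's action.

```python
from collections import Counter, defaultdict


def estimate_model(batch):
    """Maximum-likelihood transition estimates from (s, a, s_next) samples."""
    pair_counts = Counter()
    next_counts = defaultdict(Counter)
    for s, a, s_next in batch:
        pair_counts[(s, a)] += 1
        next_counts[(s, a)][s_next] += 1
    p_hat = {
        sa: {s_next: c / pair_counts[sa] for s_next, c in nxt.items()}
        for sa, nxt in next_counts.items()
    }
    return pair_counts, p_hat


def constrained_policy(baseline, greedy, pair_counts, min_count):
    """Take the greedy action only where it was sampled at least
    min_count times (hypothetical threshold); otherwise keep the baseline."""
    return {
        s: greedy[s] if pair_counts[(s, greedy[s])] >= min_count else baseline[s]
        for s in baseline
    }


# Hypothetical batch D collected by the behavior policy.
batch = [(0, 0, 1), (0, 0, 1), (0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 0, 0)]
pair_counts, p_hat = estimate_model(batch)

# `greedy` stands in for a policy computed on the estimated model.
new_policy = constrained_policy(
    baseline={0: 0, 1: 1},
    greedy={0: 1, 1: 0},
    pair_counts=pair_counts,
    min_count=2,
)
```

In this toy batch the pair (0, 1) was observed only once, so the new policy keeps the baseline action in state 0, while in state 1 it switches to the greedy action, which was observed twice. A real safe policy improvement method would replace the fixed threshold with a statistical guarantee on the performance of π' relative to π.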

Topics

Artificial Intelligence > Core AI > AI Safety

Machine Learning > Optimization & Theory > Optimization

Machine Learning > Learning Types > Reinforcement Learning

Deep Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning, sample efficiency, Markov decision process, batch reinforcement learning, policy improvement, safe reinforcement learning, exploration strategy, sample-efficient learning, safe policy improvement
