Learning Routines for Effective Off-Policy Reinforcement Learning

Edoardo Cetin; Oya Celiktutan

ICML 2021

Abstract

The performance of reinforcement learning depends on designing an appropriate action space, where the effect of each action is measurable yet granular enough to permit flexible behavior. So far, this process has involved non-trivial user choices regarding the available actions and their execution frequency. We propose a novel framework for reinforcement learning that effectively lifts these constraints. Within our framework, agents learn effective behavior over a routine space: a new, higher-level action space in which each routine represents a set of 'equivalent' sequences of granular actions of arbitrary length. Our routine space is learned end-to-end to facilitate the accomplishment of the underlying off-policy reinforcement learning objectives. We apply our framework to two state-of-the-art off-policy algorithms and show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode, improving computational efficiency.

Topics

Machine Learning > Application Areas > Efficient Computing

Reinforcement Learning > Methods > Deep RL

Reinforcement Learning > Methods > Offline RL

Reinforcement Learning > Applications > Robotics

Machine Learning > Learning Types > Reinforcement Learning

Keywords

policy optimization; policy learning; computational efficiency; off-policy reinforcement learning; end-to-end learning; action space; routine learning
