Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Luisa M Zintgraf; Leo Feng; Cong Lu; Maximilian Igl; Kristian Hartikainen; Katja Hofmann; Shimon Whiteson

ICML 2021

Abstract

To rapidly learn a new task, it is often essential for agents to explore efficiently, especially when performance matters from the first timestep. One way to learn such behaviour is via meta-learning. However, many existing methods rely on dense rewards for meta-training and can fail catastrophically if the rewards are sparse. Without a suitable reward signal, the need for exploration during meta-training is exacerbated. To address this, we propose HyperX, which uses novel reward bonuses for meta-training to explore in approximate hyper-state space (where hyper-states represent the environment state and the agent’s task belief). We show empirically that HyperX meta-learns better task-exploration and adapts more successfully to new tasks than existing methods.
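To make the idea of a hyper-state bonus concrete, here is a minimal sketch. It is not the method from the paper: the abstract does not specify the form of the bonuses, so this example assumes a simple count-based novelty bonus over discretised hyper-states. The class name `HyperStateCountBonus`, the helper `augmented_reward`, and the parameters `scale` and `belief_decimals` are all hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


class HyperStateCountBonus:
    """Count-based novelty bonus over discretised hyper-states.

    Illustrative assumption only; not the bonus used by HyperX.
    """

    def __init__(self, scale=1.0, belief_decimals=1):
        self.scale = scale                      # hypothetical bonus magnitude
        self.belief_decimals = belief_decimals  # hypothetical belief resolution
        self.counts = defaultdict(int)

    def _key(self, state, belief):
        # A hyper-state pairs the environment state with the task belief.
        rounded = tuple(round(float(b), self.belief_decimals) for b in belief)
        return (tuple(state), rounded)

    def bonus(self, state, belief):
        # Record the visit, then return a bonus that shrinks with the count.
        key = self._key(state, belief)
        self.counts[key] += 1
        return self.scale / math.sqrt(self.counts[key])


def augmented_reward(env_reward, bonus_fn, state, belief):
    # Meta-training reward = (possibly sparse) task reward + exploration bonus.
    return env_reward + bonus_fn.bonus(state, belief)


if __name__ == "__main__":
    b = HyperStateCountBonus()
    print(b.bonus((0, 0), [0.5, 0.5]))  # first visit to this hyper-state
    print(b.bonus((0, 0), [0.5, 0.5]))  # repeat visit, smaller bonus
    print(b.bonus((0, 0), [0.9, 0.1]))  # same state, new belief: novel again
```

In this sketch, revisiting the same environment state with a different task belief still counts as novel, which is the property that separates exploring hyper-state space from exploring state space alone.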

Topics

Artificial Intelligence > Learning Paradigms > Meta-Learning
Reinforcement Learning > Methods > Deep RL

Keywords

sparse reward, exploration bonus, meta reinforcement learning, task adaptation, hyper-state space
