Muesli: Combining Improvements in Policy Optimization

Matteo Hessel; Ivo Danihelka; Fabio Viola; Arthur Guez; Simon Schmitt; Laurent Sifre; Theophane Weber; David Silver; Hado van Hasselt

ICML 2021

Abstract

We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero’s state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
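
The idea of combining a regularized policy objective with a model-learning auxiliary loss can be illustrated with a minimal sketch. This is not the paper's actual update, whose exact form the abstract does not give. It is a generic example that assumes a discrete action space, a KL penalty toward a prior policy as the regularizer, and a squared-error model loss. All names here (`combined_loss`, `reg_weight`, `model_weight`, `prior_logits`) are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for batches of categorical distributions."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def combined_loss(logits, prior_logits, actions, advantages,
                  model_pred, model_target,
                  reg_weight=1.0, model_weight=1.0):
    """Illustrative loss: policy gradient + KL regularizer + model loss.

    Assumption: this is a generic composition of three terms, not the
    specific Muesli objective.
    """
    pi = softmax(logits)
    prior = softmax(prior_logits)

    # Policy-gradient term: raise log-probability of advantageous actions.
    log_pi_a = np.log(pi[np.arange(len(actions)), actions])
    pg_loss = -np.mean(advantages * log_pi_a)

    # Regularizer: keep the policy close to a prior (assumed KL form).
    reg_loss = np.mean(kl_divergence(prior, pi))

    # Auxiliary model-learning term (assumed squared error on predictions).
    model_loss = np.mean((model_pred - model_target) ** 2)

    return pg_loss + reg_weight * reg_loss + model_weight * model_loss
```

In this sketch the model term only shapes the shared representation through its gradient. Acting would still use the policy network directly, which is the property the abstract highlights.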

Topics

Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning
Reinforcement Learning > Applications > Game AI

Keywords

reinforcement learning, policy optimization, value iteration, continuous control, model-based reinforcement learning, model learning, regularized policy, model-free baseline
