ICML 2021
Muesli: Combining Improvements in Policy Optimization
Abstract
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has a computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations and by additional results on continuous control and 9×9 Go.