Learning values across many orders of magnitude

Hado P van Hasselt; Arthur Guez; Matteo Hessel; Volodymyr Mnih; David Silver

NIPS 2016 (NeurIPS 2016)

Abstract

Most learning algorithms are not invariant to the scale of the signal that is being approximated. We propose to adaptively normalize the targets used in the learning updates. This is important in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time when we update the policy of behavior. Our main motivation is prior work on learning to play Atari games, where the rewards were clipped to a predetermined range. This clipping facilitates learning across many different games with a single learning algorithm, but a clipped reward function can result in qualitatively different behavior. Using adaptive normalization we can remove this domain-specific heuristic without diminishing overall performance.
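The idea of adaptively normalizing targets can be illustrated with a small sketch. The abstract does not give the update rules, so everything below is an assumption made for illustration: the exponential moving averages for the target mean and second moment, the scalar linear output layer, and all names (`AdaptiveTargetNormalizer`, `step_size`, `min_sigma`, `predict`) are hypothetical. The sketch tracks the scale of the targets and, whenever the statistics change, rescales the output layer so that unnormalized predictions stay the same.

```python
import math


class AdaptiveTargetNormalizer:
    """Running target statistics plus an output-preserving layer rescale.

    Illustrative sketch only; the update rules are assumptions, not taken
    from the abstract.
    """

    def __init__(self, step_size=0.01, min_sigma=1e-4):
        self.beta = step_size        # step size of the moving averages
        self.min_sigma = min_sigma   # lower bound to avoid division by zero
        self.mu = 0.0                # running mean of the targets
        self.nu = 1.0                # running second moment of the targets
        self.sigma = 1.0             # running scale, sqrt(nu - mu^2)

    def update(self, target, w, b):
        """Fold one target into the statistics and return adjusted (w, b)."""
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * old_mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        variance = max(self.nu - self.mu ** 2, 0.0)
        self.sigma = max(math.sqrt(variance), self.min_sigma)
        # Rescale the output layer so that sigma * (w . h + b) + mu is the
        # same before and after the statistics change.
        w = [wi * old_sigma / self.sigma for wi in w]
        b = (old_sigma * b + old_mu - self.mu) / self.sigma
        return w, b

    def normalize(self, target):
        """Map a raw target to the normalized scale used for learning."""
        return (target - self.mu) / self.sigma

    def unnormalize(self, output):
        """Map a normalized network output back to the raw value scale."""
        return self.sigma * output + self.mu


def predict(w, b, h):
    """Normalized output of a scalar linear layer on features h."""
    return sum(wi * hi for wi, hi in zip(w, h)) + b
```

In use, one would call `update` with each new target before the learning step, then regress `predict(w, b, h)` toward `normalize(target)`. Because the layer is rescaled together with the statistics, a sudden change in target magnitude shifts the normalization without disturbing the values the network currently predicts.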

Topics

Machine Learning > Core Methods > Regression
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Value Iteration

Keywords

deep reinforcement learning, atari game, adaptive normalization, value-based reinforcement learning, reward scaling
