Faster Deep Reinforcement Learning with Slower Online Network

Kavosh Asadi; Rasool Fakoor; Omer Gottesman; Taesup Kim; Michael L. Littman; Alexander J Smola

NeurIPS 2022

Abstract

Deep reinforcement learning algorithms often use two networks for value function optimization: an online network and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against issues that arise when performing bootstrapping. In this paper, we endow two popular deep reinforcement learning algorithms, namely DQN and Rainbow, with updates that incentivize the online network to remain in the proximity of the target network. This improves the robustness of deep reinforcement learning in the presence of noisy updates. The resultant agents, called DQN Pro and Rainbow Pro, exhibit significant performance improvements over their original counterparts on the Atari benchmark, demonstrating the effectiveness of this simple idea in deep reinforcement learning. The code for our paper is available here: Github.com/amazon-research/fast-rl-with-slow-updates.
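
As a rough illustration of "updates that incentivize the online network to remain in the proximity of the target network", the sketch below adds a squared-distance penalty between online and target parameters to a mean squared TD error. The abstract does not specify the form of the incentive, so the quadratic penalty, the coefficient `c`, the learning rate, and the function names `proximal_loss` and `proximal_step` are all assumptions made for this example rather than the authors' implementation.

```python
# Hypothetical sketch: parameters are flat lists of floats, not real networks.

def proximal_loss(td_errors, online_params, target_params, c=0.2):
    """Mean squared TD error plus an assumed quadratic proximity penalty."""
    td_term = sum(e * e for e in td_errors) / len(td_errors)
    sq_dist = sum((w - t) ** 2 for w, t in zip(online_params, target_params))
    # Smaller c means a stronger pull toward the target parameters.
    return td_term + sq_dist / (2.0 * c)


def proximal_step(online_params, target_params, td_grads, lr=0.1, c=0.2):
    """One gradient step on the penalized loss.

    td_grads is the gradient of the TD term with respect to each online
    parameter; the penalty contributes (w - t) / c per parameter.
    """
    return [
        w - lr * (g + (w - t) / c)
        for w, t, g in zip(online_params, target_params, td_grads)
    ]


if __name__ == "__main__":
    online = [1.0, 2.0]
    target = [0.0, 2.0]
    # With zero TD gradient, the step moves the online parameters toward the target.
    print(proximal_step(online, target, td_grads=[0.0, 0.0], lr=0.1, c=0.5))
```

With a zero TD gradient the step only shrinks the gap to the target parameters, which is the stabilizing effect the penalty is meant to convey. In a real agent the same term would be added to the network loss and handled by the optimizer.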

Topics

Reinforcement Learning > Methods > Deep RL
Machine Learning > Learning Types > Reinforcement Learning
Deep Learning > Optimization & Theory > Optimization

Keywords

deep reinforcement learning, value function, target network, value function optimization

Related papers

Transferring Pre-trained Multimodal Representations with Cross-modal Similarity Matching 2022

A Theoretical View on Sparsely Activated Networks 2022

Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks 2022

Matryoshka Representation Learning 2022

Off-Policy Evaluation with Deficient Support Using Side Information 2022