Estimating Q(s,s’) with Deep Deterministic Dynamics Gradients

Ashley Edwards; Himanshu Sahni; Rosanne Liu; Jane Hung; Ankit Jain; Rui Wang; Adrien Ecoffet; Thomas Miconi; Charles Isbell; Jason Yosinski

ICML 2020

Abstract

In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies. Code and videos are available at http://sites.google.com/view/qss-paper.
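To make the definition concrete, the following is a minimal tabular sketch, not the authors' implementation. It assumes a hypothetical five-state deterministic chain in which the last state is a terminal goal, a reward of 1 for entering the goal, and a discount factor of 0.9. It uses the recursion $Q(s, s') = r(s, s') + \gamma \max_{s''} Q(s', s'')$, which follows from reading $Q(s, s')$ as the value of moving to $s'$ and acting optimally afterwards. The names `neighbors`, `reward`, `qss_value_iteration`, and `greedy_next_state` are illustrative. Enumerating neighbors explicitly stands in for the learned forward dynamics model described in the abstract.

```python
import numpy as np

N_STATES = 5   # hypothetical chain: states 0..4, state 4 is the terminal goal
GAMMA = 0.9    # assumed discount factor


def neighbors(s):
    """States reachable from s in one step (assumed deterministic chain)."""
    if s == N_STATES - 1:
        return []  # the goal state is terminal
    return [n for n in (s - 1, s + 1) if 0 <= n < N_STATES]


def reward(s, s_next):
    """Assumed reward: 1 for a transition into the goal, 0 otherwise."""
    return 1.0 if s_next == N_STATES - 1 else 0.0


def qss_value_iteration(n_sweeps=50):
    """Repeatedly apply Q(s, s') = r(s, s') + gamma * max_{s''} Q(s', s'')."""
    q = np.zeros((N_STATES, N_STATES))
    for _ in range(n_sweeps):
        for s in range(N_STATES):
            for s_next in neighbors(s):
                # Best value obtainable from s_next; 0 if s_next is terminal.
                future = max(
                    (q[s_next, s2] for s2 in neighbors(s_next)), default=0.0
                )
                q[s, s_next] = reward(s, s_next) + GAMMA * future
    return q


def greedy_next_state(q, s):
    """Pick the neighboring state with the highest Q(s, s')."""
    return max(neighbors(s), key=lambda n: q[s, n])


q = qss_value_iteration()
```

Note that no action appears anywhere in the table: values are indexed by pairs of states only, which is the sense in which this formulation decouples actions from values. In this sketch, `greedy_next_state(q, 2)` selects state 3, the neighbor that lies toward the goal.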

Topics

Machine Learning > Optimization & Theory > Optimization
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Robotics

Keywords

deep reinforcement learning, value function, off-policy learning, optimal policy, dynamics model
