Deep Value Model Predictive Control

David Hoeller; Farbod Farshidian; Marco Hutter

CoRL 2019

Abstract

In this paper, we introduce an actor-critic algorithm called Deep Value Model Predictive Control (DMPC), which combines model-based trajectory optimization with value function estimation. The DMPC actor is a Model Predictive Control (MPC) optimizer with an objective function defined in terms of a value function estimated by the critic. We show that our MPC actor is an importance sampler, which minimizes an upper bound of the cross-entropy to the state distribution of the optimal sampling policy. In our experiments with a Ballbot system, we show that our algorithm can work with sparse and binary reward signals to efficiently solve obstacle avoidance and target reaching tasks. Compared to previous work, we show that including the value function in the running cost of the trajectory optimizer speeds up the convergence. We also discuss the necessary strategies to robustify the algorithm in practice.
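To make the idea of an MPC actor whose objective is built from a value estimate more concrete, here is a minimal sketch under strong simplifying assumptions. It is not the authors' implementation: the paper's actor is a trajectory optimizer, whereas this sketch swaps in plain random shooting; the critic is replaced by a hand-written quadratic cost-to-go instead of a learned network; and the 1-D system, the constants, and all function names (`dynamics`, `value_estimate`, `rollout_cost`, `mpc_actor`, `run_closed_loop`) are hypothetical. It only illustrates placing the value estimate in both the running cost and the terminal cost of the optimizer.

```python
import random

DT = 0.1       # integration step (assumed for illustration)
TARGET = 1.0   # goal position (assumed for illustration)

def dynamics(x, u):
    """Hypothetical 1-D single integrator: position driven by a velocity command."""
    return x + DT * u

def value_estimate(x):
    """Stand-in for the critic: a hand-written cost-to-go, not a learned network."""
    return (x - TARGET) ** 2

def rollout_cost(x, controls):
    """Accumulate control effort plus the value estimate along one rollout."""
    cost = 0.0
    for u in controls:
        x = dynamics(x, u)
        # The value estimate enters the running cost at every step.
        cost += DT * (0.01 * u * u + value_estimate(x))
    # The value estimate also serves as the terminal cost.
    return cost + value_estimate(x)

def mpc_actor(x, rng, horizon=10, samples=200):
    """Random-shooting MPC: return the first control of the cheapest sampled sequence."""
    best_u, best_cost = 0.0, float("inf")
    for _ in range(samples):
        controls = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = rollout_cost(x, controls)
        if cost < best_cost:
            best_u, best_cost = controls[0], cost
    return best_u

def run_closed_loop(x0=0.0, steps=50, seed=0):
    """Apply the MPC actor in a receding-horizon loop and return the final state."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x = dynamics(x, mpc_actor(x, rng))
    return x
```

Calling `run_closed_loop()` replans at every step and applies only the first control of the best sampled sequence, which is the receding-horizon pattern. In an actor-critic setting, `value_estimate` would be the piece that is updated from collected data; that update is omitted here.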

Topics

Artificial Intelligence > Core AI > Planning
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Robotics
Robotics > Capabilities > Motion Planning
Mathematics & Optimization > Optimization > Optimal Control
Deep Learning > Learning Types > Reinforcement Learning

Keywords

model predictive control, importance sampling, value function, trajectory optimization, actor-critic algorithm, sparse reward, obstacle avoidance
