Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models

Tyler Westenbroek; Jacob Levy; David Fridovich-Keil

CoRL 2023

Abstract

We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation. However, these approaches often remain too data-inefficient or unreliable to train on real robotic hardware. In this paper we introduce a novel policy gradient-based policy optimization framework which systematically leverages a (possibly highly simplified) first-principles model and enables learning precise control policies with limited amounts of real-world data. Our approach (1) uses the derivatives of the model to produce sample-efficient estimates of the policy gradient and (2) uses the model to design a low-level tracking controller, which is embedded in the policy class. Theoretical analysis provides insight into how the presence of this feedback controller overcomes key limitations of stand-alone policy gradient methods, while hardware experiments with a small car and a quadruped demonstrate that our approach can learn precise control strategies reliably and with only minutes of real-world data.
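The two ingredients named in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the double-integrator model, the drag term standing in for the real system, the PD gains, and every function name below are assumptions chosen only for illustration. The learned parameter `theta` is a position reference tracked by a fixed low-level feedback loop, and the gradient of the cost is estimated by chaining the approximate model's Jacobians along states recorded from the "real" rollout.

```python
import numpy as np

DT = 0.1

# Approximate model (assumed): a double integrator, x = [position, velocity].
A_MODEL = np.array([[1.0, DT], [0.0, 1.0]])
B_MODEL = np.array([0.0, DT])

KP, KD = 4.0, 2.0  # fixed gains of the embedded low-level tracking controller


def real_step(x, u):
    """Hypothetical 'real' system: the model plus unmodeled velocity drag."""
    p, v = x
    return np.array([p + DT * v, v + DT * (u - 0.3 * v)])


def policy(x, theta):
    """Policy with an embedded PD tracker; theta is the learned reference."""
    return KP * (theta - x[0]) - KD * x[1]


def rollout(theta, x0, horizon):
    """Collect a trajectory from the 'real' system under the current policy."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(real_step(xs[-1], policy(xs[-1], theta)))
    return np.array(xs)


def cost(xs, goal):
    """Sum of squared position errors over the trajectory."""
    return 0.5 * np.sum((xs[1:, 0] - goal) ** 2)


def model_based_gradient(xs, goal):
    """Estimate dJ/dtheta using model Jacobians and real-world states."""
    du_dx = np.array([-KP, -KD])                # policy Jacobian w.r.t. state
    a_cl = A_MODEL + np.outer(B_MODEL, du_dx)   # closed-loop model Jacobian
    b_cl = B_MODEL * KP                         # d(next state)/d(theta)
    dx_dtheta = np.zeros(2)
    grad = 0.0
    for t in range(1, len(xs)):
        dx_dtheta = a_cl @ dx_dtheta + b_cl     # propagate state sensitivity
        grad += (xs[t, 0] - goal) * dx_dtheta[0]
    return grad


def train(goal, iters=50, lr=0.01, horizon=30):
    """Plain gradient descent on theta using the model-based estimate."""
    theta, x0 = 0.0, np.zeros(2)
    for _ in range(iters):
        xs = rollout(theta, x0, horizon)
        theta -= lr * model_based_gradient(xs, goal)
    return theta
```

In this sketch each update uses a single rollout, because the sensitivities come from the model rather than from sampled perturbations. The stabilizing PD loop also keeps the chained Jacobians from growing over the horizon, which is one way to read the abstract's claim about the role of the feedback controller.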

Topics

Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Robotics

Keywords

policy gradient, robot learning, physics-based model, sample-efficient learning, real-world policy
