Adversarially Regularized Policy Learning Guided by Trajectory Optimization

Zhigen Zhao; Simiao Zuo; Tuo Zhao; Ye Zhao

L4DC 2022


Abstract

Recent advances in combining trajectory optimization with function approximation (especially neural networks) show promise in learning complex control policies for diverse tasks in robotic systems. Despite their flexibility, the large neural networks used to parameterize control policies pose significant challenges. The learned neural control policies are often overly complex and non-smooth, which can easily cause unexpected or diverging robot motions. As a result, they often generalize poorly in practice. To address this issue, we propose adversarially regularized policy learning guided by trajectory optimization (VERONICA) for learning smooth control policies. Specifically, our approach controls the smoothness (local Lipschitz continuity) of the neural control policies by stabilizing the output control with respect to the worst-case perturbation of the input state. Our experiments on robot manipulation show that the proposed approach not only improves the sample efficiency of neural policy learning but also enhances the robustness of the policy against various types of disturbances, including sensor noise, environmental uncertainty, and model mismatch.
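The smoothness idea in the abstract can be illustrated with a small sketch of a regularized objective of the form (1/N) Σ_i ||π(s_i) − u_i*||² + λ · max_{||δ||∞ ≤ ε} ||π(s_i + δ) − π(s_i)||², where u_i* are reference controls from a trajectory optimizer and the inner maximization finds the worst-case state perturbation. This is a minimal sketch under loudly stated assumptions, not the VERONICA implementation: the one-layer tanh policy, the ℓ∞ perturbation ball, the sign-gradient projected gradient ascent (PGD) used for the inner maximization, the squared-error imitation loss, and all names (policy, worst_case_perturbation, regularized_loss) and hyperparameters (eps, alpha, steps, lam) are hypothetical choices made for illustration.

```python
import numpy as np


def policy(W, b, s):
    """Hypothetical one-layer neural policy: u = tanh(W s + b)."""
    return np.tanh(W @ s + b)


def worst_case_perturbation(W, b, s, eps=0.05, alpha=0.02, steps=10, rng=None):
    """Approximate argmax_{||delta||_inf <= eps} ||pi(s + delta) - pi(s)||^2
    with sign-gradient projected gradient ascent (an assumed solver)."""
    rng = np.random.default_rng(0) if rng is None else rng
    u0 = policy(W, b, s)
    # Start from a small random point: the gradient is exactly zero at delta = 0.
    delta = rng.uniform(-eps, eps, size=s.shape) * 0.1
    for _ in range(steps):
        u = policy(W, b, s + delta)
        # Chain rule: d/d(delta) ||u - u0||^2 = W^T [2 (u - u0) * (1 - u^2)]
        grad = W.T @ (2.0 * (u - u0) * (1.0 - u ** 2))
        delta = delta + alpha * np.sign(grad)  # ascent step on the output change
        delta = np.clip(delta, -eps, eps)      # project back onto the l_inf ball
    return delta


def regularized_loss(W, b, states, u_ref, lam=1.0, eps=0.05):
    """Imitation loss against reference controls plus the adversarial
    smoothness penalty; returns (imitation, smoothness, total) averages."""
    imitation, smooth = 0.0, 0.0
    for s, u_star in zip(states, u_ref):
        u = policy(W, b, s)
        imitation += np.sum((u - u_star) ** 2)
        delta = worst_case_perturbation(W, b, s, eps=eps)
        smooth += np.sum((policy(W, b, s + delta) - u) ** 2)
    n = len(states)
    return imitation / n, smooth / n, imitation / n + lam * smooth / n


# Usage with random stand-in data (not data from the paper).
rng = np.random.default_rng(1)
W = rng.normal(size=(2, 4))
b = np.zeros(2)
states = rng.normal(size=(8, 4))
u_ref = np.tanh(states[:, :2])  # placeholder for trajectory-optimization controls
imit, smooth, total = regularized_loss(W, b, states, u_ref, lam=0.5)
print(f"imitation={imit:.4f}  smoothness={smooth:.4f}  total={total:.4f}")
```

In a training loop, one would minimize the total loss over the policy parameters while recomputing the worst-case perturbation at each step; a policy whose output barely changes under the adversarial δ receives a small penalty, which is the sense in which the regularizer encourages local Lipschitz continuity.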

Topics

Machine Learning > Learning Types > Adversarial Learning
Reinforcement Learning > Methods > Policy Learning
Reinforcement Learning > Applications > Robotics
Robotics
Artificial Intelligence > Core AI > Adversarial Learning

Keywords

sample efficiency, policy learning, trajectory optimization, robot manipulation, adversarial regularization, neural network policy, smoothness regularization, neural control policy, smooth control policy