Smooth Exploration for Robotic Reinforcement Learning

Antonin Raffin; Jens Kober; Freek Stulp

2021 CORL CoRL 2021

Smooth Exploration for Robotic Reinforcement Learning

Abstract

Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL – often very successful in simulation – leads to jerky motion patterns on real robots. Consequences of the resulting shaky behavior are poor exploration, or even damage to the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE, using more general features and re-sampling the noise periodically, which leads to a new exploration method generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped and an RC car. The noise sampling interval of gSDE enables a compromise between performance and smoothness, which allows training directly on the real robots without loss of performance.

🧭 Keyword Pioneer — state-dependent exploration

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Antonin Raffin , Jens Kober , Freek Stulp

Topics

Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Applications > Robotics

Keywords

deep reinforcement learning continuous control robotic control state-dependent exploration exploration noise

Download PDF

Related papers

FlingBot: The Unreasonable Effectiveness of Dynamic Manipulation for Cloth Unfolding 2021

TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo 2021

Taskography: Evaluating robot task planning over large 3D scene graphs 2021

Parallelised Diffeomorphic Sampling-based Motion Planning 2021

Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning 2021