Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

Rituraj Kaushik; Konstantinos Chatzilygeroudis; Jean-Baptiste Mouret

CoRL 2018

Abstract

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, current algorithms lack an effective exploration strategy for sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the cumulative reward, and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) with much less interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.
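To illustrate what "Pareto-based" means for the three objectives named in the abstract, here is a minimal sketch of a non-dominated filter over candidate policies. It is not the paper's optimizer: the function names, the score tuples, and the convention that all three objectives are maximized (with the third expressed as a hypothetical "model confidence" score, higher meaning lower model uncertainty) are assumptions made for illustration only.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of score vectors not dominated by any other vector."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores per candidate policy, as predicted by a learned model:
# (trajectory novelty, cumulative reward, model confidence).
candidates = [
    (0.9, 0.0, 0.2),  # very novel, but in a region where the model is unsure
    (0.4, 0.0, 0.8),  # less novel, model is confident
    (0.3, 0.0, 0.7),  # worse than the previous candidate on two objectives
    (0.1, 1.0, 0.5),  # the only candidate that reaches a reward
]

front = pareto_front(candidates)
print(front)
```

With a sparse reward, most candidates tie at zero on the reward objective, so the novelty and model-confidence objectives are what separate them; the filter keeps every trade-off that no other candidate beats outright.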

Topics

Reinforcement Learning > Methods > Deep RL

Machine Learning > Learning Types > Multi-Task Learning

Machine Learning > Learning Types > Reinforcement Learning

Mathematics & Optimization > Optimization > Multi-Objective Optimization

Keywords

reinforcement learning; multi-objective optimization; sparse reward; pareto optimization; data-efficient learning; model-based policy search

Related papers

Batch Active Preference-Based Learning of Reward Functions 2018

Personalized Dynamics Models for Adaptive Assistive Navigation Systems 2018

Neural Modular Control for Embodied Question Answering 2018

Guided Feature Transformation (GFT): A Neural Language Grounding Module for Embodied Agents 2018

Deep Drone Racing: Learning Agile Flight in Dynamic Environments 2018