Bayesian RL for Goal-Only Rewards

Philippe Morere; Fabio Ramos

2018 CORL CoRL 2018

Bayesian RL for Goal-Only Rewards

Abstract

We address the challenging problem of reinforcement learning under goal-only rewards [1], where rewards are only non-zero when the goal is achieved. This reward definition alleviates the need for cumbersome reward engineering, making the reward formulation trivial. Classic exploration heuristics such as Boltzmann or epsilon-greedy exploration are highly inefficient in domains with goal-only rewards. We solve this problem by leveraging value function posterior variance information to direct exploration where uncertainty is higher. The proposed algorithm (EMU-Q) achieves data-efficient exploration, and balances exploration and exploitation explicitly at a policy level granting users more control over the learning process. We introduce general features approximating kernels, allowing to greatly reduce the algorithm complexity from O(N^3) in the number of transitions to O(M^2) in the number of features. We demonstrate EMU-Q is competitive with other exploration techniques on a variety of continuous control tasks and on a robotic manipulator.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

🐣 Hot Topic Early Bird — value function

Authors

Philippe Morere , Fabio Ramos

Topics

Artificial Intelligence > Core AI > Agent Systems Machine Learning > Bayesian & Probabilistic > Bayesian Learning Machine Learning > Learning Types > Reinforcement Learning

Keywords

bayesian reinforcement learning value function continuous control exploration strategy

Download PDF

Related papers

Batch Active Preference-Based Learning of Reward Functions 2018

Personalized Dynamics Models for Adaptive Assistive Navigation Systems 2018

Neural Modular Control for Embodied Question Answering 2018

Guided Feature Transformation (GFT): A Neural Language Grounding Module for Embodied Agents 2018

Deep Drone Racing: Learning Agile Flight in Dynamic Environments 2018