Reinforcement learning with value advice

Mayank Daswani; Peter Sunehag; Marcus Hutter

ACML 2014

Abstract

We consider reinforcement learning with value advice. In this setting, the agent has limited access to an oracle that returns the expected return (value) of any state-action pair under the optimal policy. The agent must use these values to learn an explicit policy that performs well in the environment. We present an algorithm, RLAdvice, based on the imitation learning algorithm DAgger, and illustrate its effectiveness on three games in the Arcade Learning Environment, using value estimates from UCT as advice.
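To make the setting concrete, here is a minimal sketch of a DAgger-style loop driven by a value oracle. It is not the RLAdvice algorithm from the paper. The toy chain environment, the closed-form `oracle_q` (standing in for UCT estimates), the one-hot regression in `fit`, and the mixing schedule `beta` are all assumptions chosen for illustration.

```python
import random

N_STATES = 6          # chain 0..5; state 5 is the goal (hypothetical toy environment)
ACTIONS = (-1, +1)    # move left or right
GAMMA = 0.9


def step(state, action):
    """Deterministic chain: reward 1 on reaching the rightmost state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def oracle_q(state, action):
    """Stand-in for the value oracle: exact optimal Q for this toy chain."""
    nxt, reward, done = step(state, action)
    if done:
        return reward
    # Q* = gamma * V*(nxt), and V*(nxt) = gamma^(goal - nxt - 1).
    return GAMMA ** (N_STATES - 1 - nxt)


def fit(dataset):
    """Least squares with one-hot features, i.e. the mean target per pair."""
    sums, counts = {}, {}
    for s, a, target in dataset:
        sums[(s, a)] = sums.get((s, a), 0.0) + target
        counts[(s, a)] = counts.get((s, a), 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def greedy(q, state, rng):
    """Greedy action under q; unseen pairs count as 0, ties broken at random."""
    vals = [q.get((state, a), 0.0) for a in ACTIONS]
    best = max(vals)
    return rng.choice([a for a, v in zip(ACTIONS, vals) if v == best])


def train(iterations=5, episodes=10, horizon=20, seed=0):
    rng = random.Random(seed)
    dataset, q_hat = [], {}
    for i in range(iterations):
        beta = 0.5 ** i  # probability of following the oracle's greedy action
        for ep in range(episodes):
            state = ep % (N_STATES - 1)  # cycle through non-goal start states
            for _ in range(horizon):
                # Ask the oracle for the value of every action in this state.
                for a in ACTIONS:
                    dataset.append((state, a, oracle_q(state, a)))
                if rng.random() < beta:
                    action = max(ACTIONS, key=lambda a: oracle_q(state, a))
                else:
                    action = greedy(q_hat, state, rng)
                state, _, done = step(state, action)
                if done:
                    break
        q_hat = fit(dataset)  # retrain on all data aggregated so far
    return q_hat
```

The point of the sketch is the data flow: states are collected increasingly under the learner's own policy, labelled with oracle values, and aggregated before each refit. Calling `train()` returns a table of learned values whose greedy policy can then be read off with `greedy`.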

Topics

Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Game AI

Keywords

reinforcement learning, imitation learning, policy learning
