Policy Shaping: Integrating Human Feedback with Reinforcement Learning

Shane Griffith; Kaushik Subramanian; Jonathan Scholz; Charles L Isbell; Andrea L Thomaz

NIPS (NeurIPS) 2013

Abstract

A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. State-of-the-art methods have approached this problem by mapping human information to reward and value signals to indicate preferences and then iterating over them to compute the necessary control policy. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct labels on the policy. We compare Advise to state-of-the-art approaches and highlight scenarios where it outperforms them and, importantly, is robust to infrequent and inconsistent human feedback.
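The abstract does not spell out how feedback becomes "direct labels on the policy", so the following is only a minimal sketch under stated assumptions: feedback is a binary "right"/"wrong" label per state-action pair, `delta` is the net count of "right" minus "wrong" labels, and `consistency` (a hypothetical parameter name) is the assumed probability that a label is correct. The function names and the multiplicative combination with the agent's own action distribution are illustrative choices, not a definitive implementation of Advise.

```python
def action_optimal_prob(delta, consistency):
    """Probability that an action is optimal, given its net label count.

    delta: number of "right" labels minus number of "wrong" labels.
    consistency: assumed probability that a single label is correct;
                 must lie strictly between 0 and 1.
    """
    if not 0.0 < consistency < 1.0:
        raise ValueError("consistency must be strictly between 0 and 1")
    good = consistency ** delta
    bad = (1.0 - consistency) ** delta
    return good / (good + bad)


def feedback_policy(deltas, consistency):
    """Normalize per-action probabilities into a distribution over actions."""
    probs = [action_optimal_prob(d, consistency) for d in deltas]
    total = sum(probs)
    return [p / total for p in probs]


def combine_policies(agent_probs, feedback_probs):
    """Combine two action distributions by multiplying and renormalizing."""
    product = [a * f for a, f in zip(agent_probs, feedback_probs)]
    total = sum(product)
    return [p / total for p in product]


# Example: three actions in one state. The first has two net "right" labels,
# the second has no feedback, the third has one net "wrong" label.
deltas = [2, 0, -1]
from_feedback = feedback_policy(deltas, consistency=0.8)
agent = [0.2, 0.5, 0.3]  # hypothetical distribution from the agent's own learning
combined = combine_policies(agent, from_feedback)
```

With no feedback (`delta = 0`) an action gets probability 0.5 before normalization, so missing labels leave the agent's own preferences unchanged in relative terms. That is one way to read the abstract's claim of robustness to infrequent feedback.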

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Human-AI Interaction
Artificial Intelligence > Core AI > Reinforcement Learning
Artificial Intelligence > Bayesian & Probabilistic > Bayesian Learning
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Bayesian & Probabilistic > Bayesian Inference

Keywords

reinforcement learning, preference learning, Bayesian approach, interactive reinforcement learning, human feedback, policy shaping, reward signal
