Reward-Constrained Behavior Cloning

Zhaorong Wang; Meng Wang; Jingqi Zhang; Yingfeng Chen; Chongjie Zhang

2021 IJCAI IJCAI 2021

Reward-Constrained Behavior Cloning

Abstract

Deep reinforcement learning (RL) has demonstrated success in challenging decision-making/control tasks. However, RL methods, which solve tasks through maximizing the expected reward, may generate undesirable behaviors due to inferior local convergence or incompetent reward design. These undesirable behaviors of agents may not reduce the total reward but destroy the user experience of the application. For example, in the autonomous driving task, the policy actuated by speed reward behaves much more sudden brakes while human drivers generally don’t do that. To overcome this problem, we present a novel method named Reward-Constrained Behavior Cloning (RCBC) which synthesizes imitation learning and constrained reinforcement learning. RCBC leverages human demonstrations to induce desirable or human-like behaviors and employs lower-bound reward constraints for policy optimization to maximize the expected reward. Empirical results on popular benchmark environments show that RCBC learns signiﬁcantly more human-desired policies with performance guarantees which meet the lower-bound reward constraints while performing better than or as well as baseline methods in terms of reward maximization.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — reward constraint

🐣 Hot Topic Early Bird — behavior cloning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Zhaorong Wang , Meng Wang , Jingqi Zhang , Yingfeng Chen , Chongjie Zhang

Topics

Reinforcement Learning > Methods > Policy Learning Machine Learning > Learning Types > Imitation Learning

Keywords

imitation learning policy optimization constrained reinforcement learning behavior cloning reward constraint

Download PDF

Related papers

Type Anywhere You Want: An Introduction to Invisible Mobile Keyboard 2021

Guaranteeing Maximin Shares: Some Agents Left Behind 2021

Surprisingly Popular Voting Recovers Rankings, Surprisingly! 2021

Strategyproof Randomized Social Choice for Restricted Sets of Utility Functions 2021

Diversity in Kemeny Rank Aggregation: A Parameterized Approach 2021