Rating-Based Reinforcement Learning

Devin White; Mingkang Wu; Ellen Novoseller; Vernon J. Lawhern; Nicholas Waytowich; Yongcan Cao

AAAI 2024

Abstract

This paper develops a novel rating-based reinforcement learning approach that uses human ratings to obtain human guidance in reinforcement learning. Unlike existing preference-based and ranking-based reinforcement learning paradigms, which rely on humans' relative preferences over sample pairs, the proposed approach relies on human evaluation of individual trajectories, with no relative comparison between sample pairs. It builds on a new prediction model for human ratings and a novel multi-class loss function. We conduct several experimental studies, using both synthetic ratings and real human ratings, to evaluate the effectiveness and benefits of the new approach.
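To make the idea of a multi-class loss over individual trajectory ratings concrete, here is a minimal sketch. It is not the paper's prediction model or loss, which the abstract does not specify. It assumes a generic setup: a trajectory's predicted return is normalized to [0, 1], each rating class has a hypothetical center value (`class_centers`), class probabilities come from a softmax over negative squared distances to those centers, and the loss is ordinary cross-entropy against the human rating. The names `rating_probabilities`, `rating_cross_entropy`, `class_centers`, and `sharpness` are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rating_probabilities(trajectory_return, class_centers, sharpness=10.0):
    """Map a normalized predicted return to probabilities over rating classes.

    Assumption: classes closer (in squared distance) to the predicted return
    receive higher probability. This is a stand-in, not the paper's model.
    """
    logits = [-sharpness * (trajectory_return - c) ** 2 for c in class_centers]
    return softmax(logits)


def rating_cross_entropy(trajectory_return, human_rating, class_centers):
    """Cross-entropy between predicted rating probabilities and one human rating.

    human_rating is an integer class index (0 = lowest rating).
    """
    probs = rating_probabilities(trajectory_return, class_centers)
    return -math.log(probs[human_rating] + 1e-12)


# Three hypothetical rating classes: "bad", "ok", "good".
centers = [0.0, 0.5, 1.0]

# A trajectory with a high predicted return, rated "good" vs. rated "bad".
loss_match = rating_cross_entropy(0.9, 2, centers)
loss_mismatch = rating_cross_entropy(0.9, 0, centers)
```

In this sketch each trajectory is scored on its own, with no second trajectory to compare against, which is the distinction the abstract draws from pairwise preference methods. The loss is small when the predicted return agrees with the human rating and large when it does not.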

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Human-AI Interaction
Reinforcement Learning > Methods > Policy Learning
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Imitation Learning

Keywords

reinforcement learning, reward modeling, preference learning, policy learning, multi-class classification, human feedback, preference-based learning, human rating, trajectory evaluation, multi-class loss, trajectory rating
