On the Expressivity of Markov Reward (Extended Abstract)

David Abel; Will Dabney; Anna Harutyunyan; Mark K. Ho; Michael L. Littman; Doina Precup; Satinder Singh

IJCAI 2022

Abstract

Reward is the driving force for reinforcement-learning agents. We set out to understand the expressivity of Markov reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task": (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to perform tasks of each type, and that correctly determine when no such reward function exists.
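To make the first task type concrete, the following is a minimal sketch of checking whether some Markov reward separates a set of acceptable policies from all others. Everything in it is an assumption made for illustration and not the authors' construction: the two-state environment with action-independent transitions, the criterion that every acceptable policy must have strictly higher start-state value than every other policy, the coarse grid search (a stand-in for the polynomial-time algorithms mentioned in the abstract), and the names `P`, `start_value`, and `find_reward`.

```python
from itertools import product

STATES = [0, 1]
ACTIONS = [0, 1]
GAMMA = 0.5
START = 0

# Hypothetical toy environment: every (state, action) pair leads to
# either state with probability 1/2, regardless of the action taken.
P = {(s, a): [0.5, 0.5] for s in STATES for a in ACTIONS}

# A deterministic policy is a tuple: policy[s] is the action taken in state s.
POLICIES = list(product(ACTIONS, repeat=len(STATES)))


def start_value(policy, reward, sweeps=200):
    """Start-state value of a deterministic policy (iterative policy evaluation)."""
    v = [0.0 for _ in STATES]
    for _ in range(sweeps):
        v = [
            reward[(s, policy[s])]
            + GAMMA * sum(p * v[t] for t, p in zip(STATES, P[(s, policy[s])]))
            for s in STATES
        ]
    return v[START]


def find_reward(acceptable, levels=(-1.0, 0.0, 1.0), margin=1e-9):
    """Search a coarse grid of Markov rewards r(s, a).

    Returns a reward under which every acceptable policy has strictly
    higher start-state value than every other policy, or None if the
    grid contains no such reward. This is an assumed criterion and a
    brute-force search, not a proof and not the paper's algorithm.
    """
    keys = [(s, a) for s in STATES for a in ACTIONS]
    others = [pi for pi in POLICIES if pi not in acceptable]
    for values in product(levels, repeat=len(keys)):
        reward = dict(zip(keys, values))
        worst_good = min(start_value(pi, reward) for pi in acceptable)
        best_other = max(start_value(pi, reward) for pi in others)
        if worst_good > best_other + margin:
            return reward
    return None


# A single acceptable policy: action 0 in both states.
single = find_reward({(0, 0)})
# An XOR-style set: the two policies that act differently in the two states.
xor = find_reward({(0, 1), (1, 0)})

print("reward for {(0, 0)}:", single)
print("reward for XOR set:", xor)
```

In this particular environment the start-state value of a policy is a sum of one term per state, so the values of the two XOR policies add up to the same total as the values of the other two policies. No reward can therefore rank both XOR policies strictly above the rest, which is consistent with the search returning `None` for that set while finding a reward for the single-policy set.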

Topics

Machine Learning > Optimization & Theory > Theory
Reinforcement Learning > Methods > Policy Learning

Keywords

regret bound, task specification, Markov reward, reward expressivity
