An Analytical Study of Utility Functions in Multi-Objective Reinforcement Learning

Manel Rodriguez-Soto; Juan A. Rodríguez-Aguilar; Maite Lopez-Sanchez

NeurIPS 2024

Abstract

Multi-objective reinforcement learning (MORL) is an excellent framework for multi-objective sequential decision-making. MORL employs a utility function to aggregate multiple objectives into a single objective that expresses a user's preferences. However, MORL still lacks two crucial theoretical analyses of the properties of utility functions: (1) a characterisation of the utility functions for which an associated optimal policy exists, and (2) a characterisation of the types of preferences that can be expressed as utility functions. We provide both analyses and, as a result, formally characterise the families of preferences and utility functions that MORL should focus on: those for which an optimal policy is guaranteed to exist. We expect our theoretical results to promote the development of novel MORL algorithms that exploit these findings.
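As an illustration of what aggregating objectives through a utility function can look like, the following minimal Python sketch applies a linear utility to vector-valued policy returns and picks the policy with the highest utility. It is not taken from the paper: the linear form, the function names `linear_utility` and `best_policy`, the policy names, and the numbers are all assumptions made for the example, and the paper's analysis concerns more general utility functions.

```python
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def linear_utility(weights: Vector) -> Callable[[Vector], float]:
    """Return a utility function u(v) = sum_i w_i * v_i (an assumed, illustrative form)."""
    def utility(values: Vector) -> float:
        return sum(w * v for w, v in zip(weights, values))
    return utility


def best_policy(values_by_policy: Dict[str, Vector],
                utility: Callable[[Vector], float]) -> str:
    """Pick the policy whose vector-valued return has the highest utility."""
    return max(values_by_policy, key=lambda name: utility(values_by_policy[name]))


# Hypothetical two-objective returns for three hypothetical policies.
candidate_values = {
    "cautious": (2.0, 9.0),
    "balanced": (6.0, 6.0),
    "greedy": (9.0, 1.0),
}

if __name__ == "__main__":
    for weights in [(0.5, 0.5), (0.9, 0.1), (0.1, 0.9)]:
        chosen = best_policy(candidate_values, linear_utility(weights))
        print(weights, "->", chosen)
```

Changing the weights changes which policy is preferred, which is the sense in which a utility function encodes a user's preferences. Over a finite candidate set like this one a maximiser always exists; the question the abstract raises is when an optimal policy is guaranteed to exist in the general MORL setting.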

Topics

Machine Learning > Optimization & Theory > Learning Theory
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning
Mathematics & Optimization > Optimization > Multi-Objective Optimization
Machine Learning > Learning Types > Multi-Objective Optimization
Artificial Intelligence > Core AI > Decision Making

Keywords

sequential decision-making, optimal policy, preference aggregation, utility function, multi-objective reinforcement learning, preference characterisation
