Optimizing Local Satisfaction of Long-Run Average Objectives in Markov Decision Processes

David Klaška; Antonín Kučera; Vojtěch Kůr; Vít Musil; Vojtěch Řehák

AAAI 2024

Abstract

Long-run average optimization problems for Markov decision processes (MDPs) require constructing policies with optimal steady-state behavior, i.e., optimal limit frequency of visits to the states. However, such policies may suffer from local instability in the sense that the frequency of states visited in a bounded time horizon along a run differs significantly from the limit frequency. In this work, we propose an efficient algorithmic solution to this problem.
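The gap between limit frequency and bounded-horizon frequency can be illustrated with a small simulation. The sketch below is not the algorithm proposed in the paper; it only measures local instability on two hand-picked two-state Markov chains that share the same limit frequency of 0.5 for state 0. The chains, the window length of 20, and the helper names `simulate`, `window_frequencies`, and `worst_local_deviation` are all assumptions made for illustration.

```python
import random


def simulate(P, steps, seed=0, start=0):
    """Sample a run of `steps` states from the Markov chain with transition matrix P."""
    rng = random.Random(seed)
    state, run = start, []
    for _ in range(steps):
        run.append(state)
        # Draw the successor according to the current state's row of P.
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return run


def window_frequencies(run, target, window):
    """Frequency of `target` in every block of `window` consecutive steps of the run."""
    count = sum(1 for s in run[:window] if s == target)
    freqs = [count / window]
    for i in range(window, len(run)):
        # Slide the window by one step: add the new state, drop the oldest.
        count += (run[i] == target) - (run[i - window] == target)
        freqs.append(count / window)
    return freqs


def worst_local_deviation(run, target, window, limit_freq):
    """Largest gap between a windowed frequency and the limit frequency."""
    return max(abs(f - limit_freq) for f in window_frequencies(run, target, window))


# Hypothetical chains: both visit state 0 with limit frequency 0.5.
sticky = [[0.99, 0.01], [0.01, 0.99]]    # long sojourns in each state
alternating = [[0.0, 1.0], [1.0, 0.0]]   # strict alternation 0, 1, 0, 1, ...

WINDOW = 20
dev_sticky = worst_local_deviation(simulate(sticky, 10_000), 0, WINDOW, 0.5)
dev_alternating = worst_local_deviation(simulate(alternating, 1_000), 0, WINDOW, 0.5)
print("sticky:", dev_sticky, "alternating:", dev_alternating)
```

Under these assumptions, the alternating chain has a deviation of exactly 0.0 for any even window length, while the sticky chain typically contains windows spent entirely in one state, so its worst deviation approaches 0.5 even though both chains have the same steady-state behavior.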

Topics

Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Stochastic Processes
Reinforcement Learning > Methods > Deep RL
Machine Learning > Learning Types > Reinforcement Learning
Mathematics & Optimization > Optimization > Optimization
Artificial Intelligence > Core AI > Decision Making

Keywords

policy optimization; Markov decision process; long-run average objective; steady-state behavior; local stability; limit frequency; policy construction; long-run average
