Approximate Bilevel Difference Convex Programming for Bayesian Risk Markov Decision Processes

Yifan Lin; Enlu Zhou

2025 AAAI AAAI 2025

Approximate Bilevel Difference Convex Programming for Bayesian Risk Markov Decision Processes

Abstract

Abstract We consider infinite-horizon Markov Decision Processes where parameters, such as transition probabilities, are unknown and estimated from data. The popular distributionally robust approach to addressing the parameter uncertainty can sometimes be overly conservative. In this paper, we utilize the recently proposed formulation, Bayesian risk Markov Decision Process (BR-MDP), to address parameter (or epistemic) uncertainty in MDPs. To solve the infinite-horizon BR-MDP with a class of convex risk measures, we propose a computationally efficient approach called approximate bilevel difference convex programming (ABDCP). The optimization is performed offline and produces the optimal policy that is represented as a finite state controller with desirable performance guarantees. We also demonstrate the empirical performance of the BR-MDP formulation and the proposed algorithm.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Mathematics & Optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yifan Lin , Enlu Zhou

Topics

Machine Learning > Optimization & Theory > Bayesian Inference Machine Learning > Optimization & Theory > Optimization Machine Learning > Application Areas > Risk Management Artificial Intelligence > Bayesian & Probabilistic > Bayesian Inference Machine Learning > Bayesian & Probabilistic > Bayesian Inference Mathematics & Optimization > Optimization > Robust Optimization Artificial Intelligence > Core AI > Reinforcement Learning

Keywords

convex optimization bayesian inference markov decision process distributionally robust bayesian risk risk measure bilevel programming

Download PDF

Related papers

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing 2025

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation 2025

3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection 2025

Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics 2025