Formally Verified Approximate Policy Iteration

Maximilian Schäffeler; Mohammad Abdulaziz

2025 AAAI AAAI 2025

Formally Verified Approximate Policy Iteration

Abstract

Abstract We present a methodology based on interactive theorem proving that facilitates the development of verified implementations of algorithms for solving factored Markov Decision Processes. As a case study, we formally verify an algorithm for approximate policy iteration in the proof assistant Isabelle/HOL. We show how the verified algorithm can be refined to an executable, verified implementation. Our evaluation on benchmark problems shows that it is practical. As part of the development, we build verified software to certify linear programming solutions. We discuss the verification process and the modifications we made to the algorithm during formalization.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Mathematics & Optimization and Reinforcement Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Maximilian Schäffeler , Mohammad Abdulaziz

Topics

Artificial Intelligence > Core AI > Interpretability Machine Learning > Optimization & Theory > Theory Reinforcement Learning > Methods > Policy Learning Machine Learning > Learning Types > Reinforcement Learning Mathematics & Optimization > Optimization > Theory

Keywords

formal verification markov decision process policy iteration approximate policy iteration factored markov decision process interactive theorem proving

Download PDF

Related papers

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing 2025

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation 2025

3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection 2025

Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics 2025