A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs

Mateo Perez; Fabio Somenzi; Ashutosh Trivedi

2024 AAAI AAAI 2024

A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs

Abstract

Abstract Linear temporal logic (LTL) and omega-regular objectives---a superset of LTL---have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Knowledge & Reasoning and Machine Learning and Mathematics & Optimization and Reinforcement Learning

🧭 Keyword Pioneer — epsilon-recurrence time

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Mateo Perez , Fabio Somenzi , Ashutosh Trivedi

Topics

Machine Learning > Optimization & Theory > Learning Theory Reinforcement Learning > Methods > Policy Learning Knowledge & Reasoning > Reasoning > Automated Planning Machine Learning > Learning Types > Reinforcement Learning Artificial Intelligence > Core AI > Reasoning Mathematics & Optimization > Optimization > Game Theory

Keywords

model-based learning pac learning formal verification markov decision process temporal logic linear temporal logic omega-regular objective epsilon-recurrence time

Download PDF

Related papers

Goal Alignment: Re-analyzing Value Alignment Problems Using Human-Aware AI 2024

Meta-Inverse Reinforcement Learning for Mean Field Games via Probabilistic Context Variables 2024

Suppressing Uncertainty in Gaze Estimation 2024

Mask-Homo: Pseudo Plane Mask-Guided Unsupervised Multi-Homography Estimation 2024

Heterogeneous Test-Time Training for Multi-Modal Person Re-identification 2024