On learning history-based policies for controlling Markov decision processes

Gandharv Patil; Aditya Mahajan; Doina Precup

2024 AISTATS AISTATS 2024

On learning history-based policies for controlling Markov decision processes

Abstract

Reinforcement learning (RL) folklore suggests that methods of function approximation based on history, such as recurrent neural networks or state abstractions that include past information, outperform those without memory, because function approximation in Markov decision processes (MDP) can lead to a scenario akin to dealing with a partially observable MDP (POMDP). However, formal analysis of history-based algorithms has been limited, with most existing frameworks concentrating on features without historical context. In this paper, we introduce a theoretical framework to examine the behaviour of RL algorithms that control an MDP using feature abstraction mappings based on historical data. Additionally, we leverage this framework to develop a practical RL algorithm and assess its performance across various continuous control tasks.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — history-based policy

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Gandharv Patil , Aditya Mahajan , Doina Precup

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Optimization & Theory > Learning Theory Reinforcement Learning > Methods > Deep RL Machine Learning > Learning Types > Reinforcement Learning Artificial Intelligence > Core AI > Reasoning

Keywords

reinforcement learning state abstraction function approximation markov decision process recurrent neural network partially observable mdp history-based policy

Download PDF

Related papers

Causal Bandits with General Causal Models and Interventions 2024

Boundary-Aware Uncertainty for Feature Attribution Explainers 2024

Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective 2024

A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning 2024

Pure Exploration in Bandits with Linear Constraints 2024