From Past to Future: Rethinking Eligibility Traces

Dhawal Gupta; Scott M. Jordan; Shreyas Chaudhari; Bo Liu; Philip S. Thomas; Bruno Castro da Silva

AAAI 2024

Abstract

In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation. First, we examine eligibility traces and explore instances where their updates may assign credit to preceding states in unexpected ways. From this investigation emerges a novel value function, which we refer to as the bidirectional value function. Unlike traditional state value functions, bidirectional value functions account for both future expected returns (rewards anticipated from the current state onward) and past expected returns (cumulative rewards from the episode's start to the present). We derive principled update equations to learn this value function and demonstrate experimentally that it improves policy evaluation. In particular, our results indicate that the proposed learning approach can, in certain challenging contexts, perform policy evaluation more rapidly than TD(λ), a method that learns forward value functions, v^π, directly. Overall, our findings present a new perspective on eligibility traces and the potential advantages of the novel value function they inspire, especially for policy evaluation.
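
To make the distinction between past and future returns concrete, the following is a minimal Monte Carlo sketch, not the update rule derived in the paper. It assumes undiscounted returns, every-visit averaging, and the convention that the reward received on leaving a state counts toward that state's future return; the function names `past_future_returns` and `monte_carlo_values` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def past_future_returns(rewards):
    """Split an episode's rewards into past and future sums at each time step.

    Assumed convention (not taken from the paper):
      past[t]   = r_0 + ... + r_{t-1}   (collected before reaching step t)
      future[t] = r_t + ... + r_{T-1}   (collected from step t onward)
    """
    total = sum(rewards)
    past, running = [], 0.0
    for r in rewards:
        past.append(running)
        running += r
    future = [total - p for p in past]
    return past, future


def monte_carlo_values(episodes):
    """Every-visit Monte Carlo averages of past, future, and combined returns.

    `episodes` is a list of (states, rewards) pairs, where rewards[t] is the
    reward received after leaving states[t].
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # state -> [past sum, future sum]
    counts = defaultdict(int)
    for states, rewards in episodes:
        past, future = past_future_returns(rewards)
        for s, p, f in zip(states, past, future):
            sums[s][0] += p
            sums[s][1] += f
            counts[s] += 1

    values = {}
    for s, (p_sum, f_sum) in sums.items():
        backward = p_sum / counts[s]
        forward = f_sum / counts[s]
        values[s] = {
            "backward": backward,
            "forward": forward,
            "bidirectional": backward + forward,
        }
    return values


if __name__ == "__main__":
    demo = [(["A", "B"], [1.0, 0.0]), (["A", "C", "B"], [0.0, 2.0, 4.0])]
    for state, estimate in sorted(monte_carlo_values(demo).items()):
        print(state, estimate)
```

Under these assumptions the bidirectional estimate for a state is simply the average total return of the episodes passing through it, weighted by visits. With discounting, the past and future terms would be weighted differently, and the paper's own definition should be consulted for that case.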

Topics

Artificial Intelligence > Core AI > Planning
Reinforcement Learning > Methods > Deep RL
Machine Learning > Optimization & Theory > Stochastic Methods

Keywords

temporal difference learning, policy evaluation, credit assignment, eligibility trace, bidirectional value function
