AAAI 2026

SHADOW: Dynamic-Aware Credit Assignment Against Long-Horizon Tasks

Abstract

Reinforcement learning (RL) has emerged as the predominant paradigm for training large language model (LLM) agents to solve complex, multi-step tasks through environmental interaction. A fundamental challenge in such long-horizon scenarios is credit assignment: delayed rewards provide too weak a signal for evaluating the contribution of individual actions. Existing methods typically neglect trajectory transition dynamics, which leads to coarse-grained or biased credit assignment. To address these limitations, we introduce SHADOW, a novel framework that systematically incorporates transition dynamics for improved credit assignment. Our framework makes two primary contributions: (i) a dynamics-aware state grouping mechanism that mitigates misleading action comparisons between dynamically inconsistent states, and (ii) a local dynamic advantage estimator that leverages Generalized Advantage Estimation (GAE) to precisely quantify individual action contributions through a fine-grained analysis of transition patterns. Comprehensive experiments with Qwen2.5-1.5B-Instruct and Qwen2.5-7B-Instruct agent models demonstrate that our method improves success rates by 9.4% and 7.6%, respectively, on the ALFWorld benchmark and yields a performance gain of over 5% on WebShop.
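For readers unfamiliar with GAE, the following is a minimal sketch of the standard estimator on a single trajectory. It is not SHADOW's local dynamic advantage estimator, whose details the abstract does not give. The function name `gae_advantages`, the default values of `gamma` and `lam`, and the convention that `values` carries one extra bootstrap entry are all assumptions made for illustration.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,  # discount factor (illustrative default)
    lam: float = 0.95,    # GAE smoothing parameter (illustrative default)
) -> List[float]:
    """Standard GAE over one trajectory (hypothetical helper, not SHADOW itself).

    `values` must hold len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state reached after the final action
    (0.0 if the episode terminated).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With a single delayed reward at the end of the trajectory and an uninformative value function, every earlier action receives only a discounted copy of that final reward. This is the coarse credit signal that motivates finer-grained estimators.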
