2010 NIPS NeurIPS 2010

Error Propagation for Approximate Policy and Value Iteration

Abstract

We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulted policy. We quantify the performance loss as the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Nikodym derivative of a certain distribution rather than its supremum -- as opposed to what has been suggested by the previous results. Also our results indicate that the contribution of the approximation/Bellman error to the performance loss is more prominent in the later iterations of API/AVI, and the effect of an error term in the earlier iterations decays exponentially fast.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning
🧭 Keyword Pioneer — bellman residual
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics, Speech & Audio
📈 Trend Setter — Value Iteration
🐣 Hot Topic Early Bird — value iteration