2019 AISTATS AISTATS 2019

Theoretical Analysis of Efficiency and Robustness of Softmax and Gap-Increasing Operators in Reinforcement Learning

Abstract

In this paper, we propose and analyze conservative value iteration, which unifies value iteration, soft value iteration, advantage learning, and dynamic policy programming. Our analysis shows that algorithms using a combination of gap-increasing and max operators are resilient to stochastic errors, but not to non-stochastic errors. In contrast, algorithms using a softmax operator without a gap-increasing operator are less susceptible to all types of errors, but may display poor asymptotic performance. Algorithms using a combination of gap-increasing and softmax operators are much more effective and may asymptotically outperform algorithms with the max operator. Not only do these theoretical results provide a deep understanding of various reinforcement learning algorithms, but they also highlight the effectiveness of gap-increasing operators, as well as the limitations of traditional greedy value updates by the max operator.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Reinforcement Learning
📈 Trend Setter — Value Iteration
🧭 Keyword Pioneer — gap-increasing operator
🐣 Hot Topic Early Bird — theoretical analysis
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio