Theoretical Analysis of Efficiency and Robustness of Softmax and Gap-Increasing Operators in Reinforcement Learning

Tadashi Kozuno; Eiji Uchibe; Kenji Doya

AISTATS 2019

Abstract

In this paper, we propose and analyze conservative value iteration, which unifies value iteration, soft value iteration, advantage learning, and dynamic policy programming. Our analysis shows that algorithms using a combination of gap-increasing and max operators are resilient to stochastic errors, but not to non-stochastic errors. In contrast, algorithms using a softmax operator without a gap-increasing operator are less susceptible to all types of errors, but may display poor asymptotic performance. Algorithms using a combination of gap-increasing and softmax operators are much more effective and may asymptotically outperform algorithms with the max operator. Not only do these theoretical results provide a deep understanding of various reinforcement learning algorithms, but they also highlight the effectiveness of gap-increasing operators, as well as the limitations of traditional greedy value updates by the max operator.
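To make the operators named in the abstract concrete, the following is a minimal sketch for a single state with a small set of action values. It is not the paper's conservative value iteration update. It assumes the Boltzmann-weighted average as the softmax operator (one common definition) and an advantage-learning-style penalty as the gap-increasing term. The function names and the parameters `beta` (inverse temperature) and `alpha` (gap coefficient) are hypothetical, chosen only for illustration.

```python
import math


def max_operator(q_values):
    """Greedy backup: the largest action value."""
    return max(q_values)


def boltzmann_softmax(q_values, beta):
    """Boltzmann-weighted average of action values (assumed softmax form).

    beta = 0 gives the plain mean; large beta approaches the max operator.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def gap_increasing_target(reward, gamma, next_q, current_q, action, alpha,
                          backup=max_operator):
    """Advantage-learning-style target (illustrative assumption).

    Starts from the usual Bellman target and subtracts alpha times the gap
    between the backed-up value of the current state and the value of the
    chosen action, which lowers the targets of non-greedy actions.
    """
    bellman = reward + gamma * backup(next_q)
    gap = backup(current_q) - current_q[action]
    return bellman - alpha * gap


# Usage: the greedy action (index 1) keeps the Bellman target, while the
# non-greedy action (index 0) is pushed down, widening the action gap.
current_q = [1.0, 3.0]
next_q = [1.0, 2.0]
target_greedy = gap_increasing_target(0.0, 0.9, next_q, current_q, 1, 0.5)
target_other = gap_increasing_target(0.0, 0.9, next_q, current_q, 0, 0.5)
```

Passing a softmax backup instead of the default, for example `backup=lambda q: boltzmann_softmax(q, 5.0)`, gives a rough analogue of the combination of gap-increasing and softmax operators that the abstract discusses.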

Topics

Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Value Iteration
Machine Learning > Learning Types > Reinforcement Learning
Reinforcement Learning > Methods > Value Iteration
Deep Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning, policy learning, theoretical analysis, value iteration, softmax operator, gap-increasing operator
