ε-MDPs: Learning in Varying Environments

István Szita; Bálint Takács; Andras Lorincz

JMLR 2002

Abstract

In this paper, ε-MDP models are introduced, and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning, which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an ε-MDP, enables near-optimal performance even when considerable and sudden changes occur in the environment. Illustrations are provided on the two-segment pendulum problem.
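
The abstract's central claim, that Q-learning still finds near-optimal policies when the environment keeps changing, can be illustrated with a small simulation. This is a toy sketch, not the authors' experiment. It assumes the ε-MDP condition in the form ||P_t(x,a,·) - P(x,a,·)||_1 ≤ ε for a fixed nominal kernel P. The two-state MDP, the random per-step perturbation (the paper's model is not limited to random changes), and the helpers `perturbed`, `value_iteration`, `q_learning` and `greedy` are all hypothetical. The code runs tabular Q-learning while the transition probabilities are re-drawn within ε at every step, then compares the result against value iteration on the nominal MDP.

```python
import random

# Hypothetical 2-state, 2-action nominal MDP (illustrative, not from the paper).
P = [  # P[s][a] = nominal next-state distribution
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.2], [0.1, 0.9]],
]
R = [[0.0, 0.0], [1.0, 1.0]]  # reward depends only on the current state
GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2


def perturbed(p, eps, rng):
    """Return a distribution within L1 distance eps of p (assumed eps-MDP form).

    Mixing p with any distribution q by weight eps/2 moves it by
    (eps/2) * ||q - p||_1 <= eps, because ||q - p||_1 <= 2.
    """
    w = [rng.expovariate(1.0) for _ in p]  # normalized exponentials: uniform on the simplex
    total = sum(w)
    q = [x / total for x in w]
    mix = eps / 2
    return [(1 - mix) * pi + mix * qi for pi, qi in zip(p, q)]


def value_iteration(iters=1000):
    """Optimal Q-values of the nominal (unperturbed) MDP."""
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(iters):
        V = [max(row) for row in Q]
        Q = [[R[s][a] + GAMMA * sum(P[s][a][y] * V[y] for y in range(N_STATES))
              for a in range(N_ACTIONS)] for s in range(N_STATES)]
    return Q


def q_learning(eps, steps=200_000, explore=0.2, seed=0):
    """Tabular Q-learning while the transition kernel varies within eps."""
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    visits = [[0] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    for _ in range(steps):
        if rng.random() < explore:  # epsilon-greedy style exploration
            a = rng.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda b: Q[s][b])
        probs = perturbed(P[s][a], eps, rng)  # the environment changes every step
        s_next = rng.choices(range(N_STATES), weights=probs)[0]
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a] ** 0.6  # polynomially decaying step size
        target = R[s][a] + GAMMA * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
    return Q


def greedy(Q):
    """Greedy action in each state."""
    return [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    q_star = value_iteration()
    q_hat = q_learning(eps=0.1)
    gap = max(abs(q_star[s][a] - q_hat[s][a])
              for s in range(N_STATES) for a in range(N_ACTIONS))
    print("nominal greedy policy:", greedy(q_star))
    print("learned greedy policy:", greedy(q_hat))
    print("max |Q* - Q_hat|:", round(gap, 3))
```

With eps=0.1 the perturbations shift the learned Q-values only slightly, so the learned greedy policy should coincide with the nominal optimal one. Larger eps values let the learned values drift further from the nominal Q*, which is the kind of ε-dependent near-optimality the abstract refers to.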

Keywords

Markov decision process, near-optimal policy, varying environment

Related papers

Kernel Independent Component Analysis 2002

Memory-Based Shallow Parsing 2002

Covering Number Bounds of Certain Regularized Linear Function Classes 2002

On the Convergence of Optimistic Policy Iteration 2002

The Subspace Information Criterion for Infinite Dimensional Hypothesis Spaces 2002