2002 JMLR JMLR 2002

ε-MDPs: Learning in Varying Environments

Abstract

In this paper ε-MDP-models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvari and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an ε-MDP, enables near optimal performance even if considerable and sudden changes may occur in the environment. Illustrations are provided on the two-segment pendulum problem. [abs] [pdf] [ps.gz] [ps] [html]

📈 Trend Setter — Policy Learning
🧭 Keyword Pioneer — near-optimal policy
🐣 Hot Topic Early Bird — markov decision process
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy