Selecting the State-Representation in Reinforcement Learning

Odalric-ambrym Maillard; Daniil Ryabko; Rémi Munos

2011 NIPS NeurIPS 2011

Selecting the State-Representation in Reinforcement Learning

Abstract

The problem of selecting the right state-representation in a reinforcement learning problem is considered. Several models (functions mapping past observations to a finite set) of the observations are given, and it is known that for at least one of these models the resulting state dynamics are indeed Markovian. Without knowing neither which of the models is the correct one, nor what are the probabilistic characteristics of the resulting MDP, it is required to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several). We propose an algorithm that achieves that, with a regret of order T^{2/3} where T is the horizon time.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — state representation learning

🐣 Hot Topic Early Bird — reinforcement learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Odalric-ambrym Maillard , Daniil Ryabko , Rémi Munos

Topics

Machine Learning > Core Methods > Classification Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Statistical Learning Machine Learning > Optimization & Theory > Theory Reinforcement Learning > Methods > Deep RL Machine Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning model selection online learning markov decision process state representation optimal policy state representation learning regret bound

Download PDF

Related papers

Co-Training for Domain Adaptation 2011

The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning 2011

Learning to Agglomerate Superpixel Hierarchies 2011

A Reinforcement Learning Theory for Homeostatic Regulation 2011

A Global Structural EM Algorithm for a Model of Cancer Progression 2011