LSTD with Random Projections

Mohammad Ghavamzadeh; Alessandro Lazaric; Odalric Maillard; Rémi Munos

NIPS (NeurIPS) 2010

Abstract

We consider the problem of reinforcement learning in high-dimensional spaces when the number of features is larger than the number of samples. In particular, we study the least-squares temporal difference (LSTD) learning algorithm when a low-dimensional space is generated by a random projection from a high-dimensional space. We provide a thorough theoretical analysis of LSTD with random projections and derive performance bounds for the resulting algorithm. We also show how the error of LSTD with random projections propagates through the iterations of a policy iteration algorithm and provide a performance bound for the resulting least-squares policy iteration (LSPI) algorithm.
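
The idea in the abstract can be illustrated with a minimal sketch: project D-dimensional features down to d dimensions with a random matrix, then run ordinary LSTD in the projected space. This is not the authors' implementation. The function names, the Gaussian projection with N(0, 1/d) entries, and the least-squares solve are assumptions chosen for illustration.

```python
import numpy as np


def random_projection(D, d, rng):
    """Draw a d x D matrix with i.i.d. N(0, 1/d) entries (assumed scaling)."""
    return rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, D))


def lstd(phi, phi_next, rewards, gamma):
    """Plain LSTD on n samples.

    phi, phi_next: (n, k) features of current and next states.
    rewards: (n,) observed rewards. gamma: discount factor in [0, 1).
    Solves A theta = b with A = phi^T (phi - gamma * phi_next), b = phi^T r.
    """
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    # lstsq returns the minimum-norm solution if A happens to be singular.
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta


def lstd_random_projection(phi, phi_next, rewards, gamma, d, rng):
    """Project (n, D) features to (n, d), then run LSTD in the small space."""
    P = random_projection(phi.shape[1], d, rng)
    theta = lstd(phi @ P.T, phi_next @ P.T, rewards, gamma)
    return P, theta
```

Under these assumptions, the value estimate at a state with high-dimensional feature vector f is f @ P.T @ theta. The d x d linear system stays well-posed even when D exceeds the number of samples, which is the regime the abstract targets.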

Topics

Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Statistical Learning
Machine Learning > Optimization & Theory > Theory
Reinforcement Learning > Methods > Deep RL
Machine Learning > Core Methods > Dimensionality Reduction
Machine Learning > Learning Types > Reinforcement Learning
Reinforcement Learning > Methods > Value Iteration
Deep Learning > Optimization & Theory > Stochastic Methods

Keywords

reinforcement learning, temporal difference learning, function approximation, policy iteration, least-squares temporal difference, random projection, stochastic method, high-dimensional space
