Value Pursuit Iteration

Amir M. Farahmand; Doina Precup

2012 NIPS NeurIPS 2012

Value Pursuit Iteration

Abstract

Value Pursuit Iteration (VPI) is an approximate value iteration algorithm that finds a close to optimal policy for reinforcement learning and planning problems with large state spaces. VPI has two main features: First, it is a nonparametric algorithm that finds a good sparse approximation of the optimal value function given a dictionary of features. The algorithm is almost insensitive to the number of irrelevant features. Second, after each iteration of VPI, the algorithm adds a set of functions based on the currently learned value function to the dictionary. This increases the representation power of the dictionary in a way that is directly relevant to the goal of having a good approximation of the optimal value function. We theoretically study VPI and provide a finite-sample error upper bound for it.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

📈 Trend Setter — Value Iteration

🧭 Keyword Pioneer — sparse feature approximation

🐣 Hot Topic Early Bird — reinforcement learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Amir M. Farahmand , Doina Precup

Topics

Artificial Intelligence > Core AI > Planning Machine Learning > Core Methods > Representation Learning Machine Learning > Optimization & Theory > Optimization Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Applications > Value Iteration Machine Learning > Learning Types > Reinforcement Learning Reinforcement Learning > Methods > Value Iteration Artificial Intelligence > Core AI > Reinforcement Learning

Keywords

reinforcement learning policy optimization function approximation sparse representation value function value iteration sparse approximation approximate dynamic programming sparse feature approximation nonparametric algorithm feature dictionary

Download PDF

Related papers

Kernel Hyperalignment 2012

Fused sparsity and robust estimation for linear models with unknown variance 2012

Slice sampling normalized kernel-weighted completely random measure mixture models 2012

Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization 2012

Matrix reconstruction with the local max norm 2012