Regularized Off-Policy TD-Learning

Bo Liu; Sridhar Mahadevan; Ji Liu

2012 NIPS NeurIPS 2012

Regularized Off-Policy TD-Learning

Abstract

We present a novel $l_1$ regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection using online convex regularization. A detailed theoretical and experimental analysis of RO-TD is presented. A variety of experiments are presented to illustrate the off-policy convergence, sparse feature selection capability and low computational cost of the RO-TD algorithm.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — off-policy td-learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

📈 Trend Setter — Offline RL

🐣 Hot Topic Early Bird — reinforcement learning

Authors

Bo Liu , Sridhar Mahadevan , Ji Liu

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Optimization & Theory > Optimization Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Offline RL Machine Learning > Learning Types > Reinforcement Learning Machine Learning > Core Methods > Optimization

Keywords

reinforcement learning temporal difference learning convex optimization feature selection sparse representation l1 regularization value function off-policy learning off-policy td-learning convex-concave saddle-point first-order solvers

Download PDF

Related papers

Kernel Hyperalignment 2012

Fused sparsity and robust estimation for linear models with unknown variance 2012

Slice sampling normalized kernel-weighted completely random measure mixture models 2012

Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization 2012

Matrix reconstruction with the local max norm 2012