2012
NIPS
NeurIPS 2012
Regularized Off-Policy TD-Learning
Abstract
We present a novel $\ell_1$-regularized, off-policy convergent TD-learning method (termed RO-TD) that learns sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection using online convex regularization. We give a detailed theoretical and experimental analysis of RO-TD; the experiments illustrate its off-policy convergence, sparse feature selection capability, and low computational cost.
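To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the RO-TD algorithm itself. It pairs one common importance-weighted form of the TDC update for linear value functions with a plain soft-thresholding (proximal) step for the $\ell_1$ penalty, which is a simpler stand-in for the paper's saddle-point solver. All names (`tdc_l1_step`, `soft_threshold`, `lam`, `rho`) and the step-size choices are assumptions made for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink each entry toward zero by t."""
    out = []
    for x in v:
        if x > t:
            out.append(x - t)
        elif x < -t:
            out.append(x + t)
        else:
            out.append(0.0)  # small entries are zeroed, which yields sparsity
    return out


def tdc_l1_step(theta, w, phi, phi_next, reward, gamma,
                alpha, beta, lam, rho=1.0):
    """One TDC-style update followed by an l1 shrinkage step (illustrative).

    theta: value-function weights; w: auxiliary TDC weights;
    phi, phi_next: feature vectors of the current and next state;
    rho: importance-sampling ratio (1.0 means on-policy);
    lam: l1 regularization strength (hypothetical parameter name).
    """
    # TD error under the current linear value estimate.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    w_phi = dot(w, phi)

    # Main weights: TD step plus the TDC gradient-correction term.
    theta_new = [
        th + alpha * rho * (delta * p - gamma * pn * w_phi)
        for th, p, pn in zip(theta, phi, phi_next)
    ]
    # Auxiliary weights track the expected TD error given the features.
    w_new = [wi + beta * rho * (delta - w_phi) * p for wi, p in zip(w, phi)]

    # Shrinkage step standing in for the l1 regularizer.
    theta_new = soft_threshold(theta_new, alpha * lam)
    return theta_new, w_new
```

Calling `tdc_l1_step` once per observed transition, with `lam = 0.0`, reduces to the unregularized TDC-style update; larger `lam` values push more entries of `theta` to exactly zero, which is the feature-selection effect the abstract refers to.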
Topics
Machine Learning > Core Methods > Representation Learning
Machine Learning > Optimization & Theory > Optimization
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Offline RL
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Core Methods > Optimization