Improving Policy Gradient Estimates with Influence Information

Jervis Pinto; Alan Fern; Tim Bauer; Martin Erwig

ACML 2011

Abstract

In reinforcement learning (RL), it is often possible to obtain sound, but incomplete, information about influences and independencies among problem variables and rewards, even when an exact domain model is unknown. For example, such information can be computed from a partial, qualitative domain model, or via domain-specific analysis techniques. While such information intuitively appears useful for RL, there are no algorithms that incorporate it in a sound way. In this work, we describe how to leverage such information to improve the estimation of policy gradients, which can be used to speed up gradient-based RL. We prove general conditions under which our estimator is unbiased and show that it will typically have reduced variance compared to standard unbiased gradient estimates. We evaluate the approach in the domain of Adaptation-Based Programming, where RL is used to optimize the performance of programs and independence information can be computed via standard program analysis techniques. Incorporating independence information produces a large speedup in learning on a variety of adaptive programs.
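
To make the idea of the abstract concrete, the following is a minimal sketch and not the paper's estimator, whose exact form is not given here. It assumes a hypothetical toy problem with independent Bernoulli decisions, where decision i affects only reward i, and a boolean matrix `influence` (an illustrative name) marking which rewards each decision may affect. The standard likelihood-ratio estimate weights each decision's score by the total reward, while the influence-aware variant weights it only by the rewards that decision may influence.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sample_gradients(theta, influence, n_episodes, rng):
    """Return per-episode gradient samples for two likelihood-ratio estimators.

    Toy setup (an assumption for illustration only): decision i is a
    Bernoulli action with probability sigmoid(theta[i]); reward i equals
    action_i * (i + 1) plus unit Gaussian noise, so decision i affects
    reward i and nothing else.

    influence[i, k] is True if decision i may influence reward k.
    """
    p = sigmoid(theta)
    n = theta.size
    effect = np.arange(1, n + 1)  # hypothetical per-decision reward scale

    actions = (rng.random((n_episodes, n)) < p).astype(float)
    rewards = actions * effect + rng.normal(size=(n_episodes, n))

    # Score function of a Bernoulli(sigmoid(theta)) policy: a - p.
    scores = actions - p

    # Standard estimator: every score is weighted by the total reward.
    standard = scores * rewards.sum(axis=1, keepdims=True)

    # Influence-aware estimator: score i is weighted only by the rewards
    # that decision i may influence, i.e. sum_k influence[i, k] * r_k.
    masked = scores * (rewards @ influence.T.astype(float))

    return standard, masked
```

Averaging either array over episodes gives a gradient estimate. In this toy setting the dropped terms have zero expectation because each score is independent of the rewards it cannot influence, so both averages target the same gradient while the masked samples vary less. If `influence` omitted a true dependency, the masked estimate would be biased, which is why the information needs to be sound.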

Keywords

reinforcement learning; independent component analysis; policy gradient; gradient estimation; monte carlo estimation; variance reduction; adaptation-based programming
