Random Sampling of States in Dynamic Programming

Chris Atkeson; Benjamin Stephens

NIPS 2007 (NeurIPS 2007)

Abstract

We combine two threads of research on approximate dynamic programming: random sampling of states, and using local trajectory optimizers to globally optimize a policy and its associated value function. This combination allows us to replace a dense multidimensional grid with a much sparser adaptive sampling of states. Our focus is on finding steady-state policies for deterministic, time-invariant, discrete-time control problems with continuous states and actions, which are often found in robotics. We show that this approach solves problems that we could not previously solve with regular grid-based approaches.
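To make the first of these two threads concrete, the following is a minimal sketch of dynamic programming over randomly sampled states rather than a regular grid. It is not the authors' method: the local trajectory optimizers are replaced here by a plain nearest-neighbor value lookup, and the 1-D toy dynamics, the quadratic cost, the discount factor `gamma`, and the name `sampled_value_iteration` are all assumptions introduced for illustration.

```python
import bisect
import random


def sampled_value_iteration(n_states=200, n_actions=21, gamma=0.95,
                            dt=0.1, sweeps=200, seed=0):
    """Value iteration over randomly sampled states (toy 1-D problem).

    Assumed dynamics: x' = clip(x + dt * u, -1, 1).
    Assumed one-step cost: dt * (x^2 + u^2).
    """
    rng = random.Random(seed)
    # Random (non-gridded) sample of states, sorted so bisect can be used.
    states = sorted(rng.uniform(-1.0, 1.0) for _ in range(n_states))
    # Discretized candidate actions in [-1, 1].
    actions = [-1.0 + 2.0 * k / (n_actions - 1) for k in range(n_actions)]

    def nearest(x):
        """Index of the sampled state closest to x."""
        j = bisect.bisect_left(states, x)
        if j == 0:
            return 0
        if j == n_states:
            return n_states - 1
        return j if states[j] - x < x - states[j - 1] else j - 1

    # Precompute (cost, successor index, action) for each state-action pair.
    table = []
    for x in states:
        row = []
        for u in actions:
            x_next = max(-1.0, min(1.0, x + dt * u))
            row.append((dt * (x * x + u * u), nearest(x_next), u))
        table.append(row)

    # Bellman backups; the successor value is read from the nearest sample.
    values = [0.0] * n_states
    for _ in range(sweeps):
        values = [min(c + gamma * values[j] for c, j, _u in row)
                  for row in table]

    # Greedy policy with respect to the final value estimates.
    policy = [min(row, key=lambda t: t[0] + gamma * values[t[1]])[2]
              for row in table]
    return states, values, policy, nearest


if __name__ == "__main__":
    states, values, policy, nearest = sampled_value_iteration()
    i = nearest(0.8)
    print("state", states[i], "value", values[i], "action", policy[i])
```

In this sketch the sampled states are drawn once and never adapted. The abstract describes an adaptive sampling combined with local trajectory optimization, which this toy example does not attempt to reproduce.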

Topics

Machine Learning > Optimization & Theory > Optimization
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning
Reinforcement Learning > Applications > Robotics
Robotics > Capabilities > Motion Planning
Reinforcement Learning > Methods > Value Iteration
Robotics > Applications > Robotics

Keywords

reinforcement learning, policy optimization, policy learning, value function, dynamic programming, random sampling, trajectory optimization, state space exploration, continuous control, robotics control, approximate dynamic programming, state sampling
