Policy Search using Paired Comparisons

Malcolm J. A. Strens; Andrew W. Moore

JMLR 2002

Abstract

Direct policy search is a practical way to solve reinforcement learning (RL) problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one, by using fixed start states and fixed random number sequences for comparing policies (Ng and Jordan, 2000). We evaluate Pegasus, and new paired comparison methods, using the mountain car problem, and a difficult pursuer-evader problem. We conclude that: (i) paired tests can improve performance of optimization procedures; (ii) several methods are available to reduce the 'overfitting' effect found with Pegasus; (iii) adapting the number of trials used for each comparison yields faster learning; (iv) pairing also helps stochastic search methods such as differential evolution.
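The pairing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function simulate_return, the one-dimensional parameter theta, and the toy objective are all hypothetical stand-ins for a real RL simulator. Each seed fixes both the start state and the noise sequence of a trial, so two policies evaluated under the same seed face identical conditions.

```python
import random
import statistics


def simulate_return(theta, seed):
    """Hypothetical noisy trial (assumption, not from the paper).

    The seed determines the start state and the disturbance, so the
    same seed always reproduces the same trial conditions.
    """
    rng = random.Random(seed)
    start = rng.uniform(-1.0, 1.0)  # random start state
    noise = rng.gauss(0.0, 1.0)     # random disturbance during the trial
    return -(theta - 0.5) ** 2 + start + noise


def paired_differences(theta_a, theta_b, seeds):
    """Compare two policies on the same fixed seeds (paired comparison)."""
    return [simulate_return(theta_a, s) - simulate_return(theta_b, s)
            for s in seeds]


def unpaired_differences(theta_a, theta_b, n_trials, rng):
    """Compare two policies using independent seeds for every trial."""
    return [simulate_return(theta_a, rng.getrandbits(32))
            - simulate_return(theta_b, rng.getrandbits(32))
            for _ in range(n_trials)]


seeds = list(range(30))
paired = paired_differences(0.5, 0.3, seeds)
unpaired = unpaired_differences(0.5, 0.3, 30, random.Random(12345))

print("paired   stdev:", statistics.stdev(paired))
print("unpaired stdev:", statistics.stdev(unpaired))
```

In this toy the noise is purely additive, so pairing cancels it almost exactly and every paired difference favors the better parameter. In a real simulator the noise interacts with the policy, and pairing typically reduces the variance of the comparison rather than removing it.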

Topics

Machine Learning > Optimization & Theory > Optimization
Reinforcement Learning > Methods > Policy Learning

Keywords

stochastic optimization, policy search, paired comparison

Related papers

Kernel Independent Component Analysis 2002

Memory-Based Shallow Parsing 2002

Covering Number Bounds of Certain Regularized Linear Function Classes 2002

On the Convergence of Optimistic Policy Iteration 2002

The Subspace Information Criterion for Infinite Dimensional Hypothesis Spaces 2002