Composing Efficient, Robust Tests for Policy Selection

Dustin Morrill; Thomas J. Walsh; Daniel Hernández; Peter R. Wurman; Peter Stone

UAI 2023

Abstract

Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to deploy in the real world, the candidates must be tested under an intractable number of environmental conditions. We introduce RPOSST, an algorithm that selects a small set of test cases from a larger pool using a relatively small number of sample evaluations. RPOSST treats test case selection as a two-player game and optimizes a solution with provable $k$-of-$N$ robustness, bounding its error relative to a test that uses every test case in the pool. Empirical results show that RPOSST finds small sets of test cases that identify high-quality policies in a toy one-shot game, in poker datasets, and in a high-fidelity racing simulator.
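To make the $k$-of-$N$ criterion concrete, the sketch below scores candidate test subsets by how far each policy's subset-based score drifts from its full-pool score. It is a toy illustration, not RPOSST: it assumes equally weighted test cases, assumes the N draws are candidate policies sampled uniformly with replacement, and swaps the paper's game-theoretic optimization for brute-force search over subsets. The names `pool_score`, `subset_score`, `k_of_n_error`, `select_tests`, and `EXAMPLE_SCORES` are hypothetical, and the scores are made-up numbers.

```python
import itertools
from statistics import mean

# Hypothetical score table: scores[policy][t] is a sampled evaluation of a
# policy on test case t from the pool (illustrative numbers, not paper data).
Scores = dict[str, list[float]]

EXAMPLE_SCORES: Scores = {
    "policy_a": [4, 0, 2, 2],
    "policy_b": [0, 4, 2, 2],
    "policy_c": [1, 3, 4, 0],
}


def pool_score(scores: Scores, policy: str) -> float:
    """Reference score: uniform average over every test case in the pool."""
    return mean(scores[policy])


def subset_score(scores: Scores, policy: str, subset: tuple[int, ...]) -> float:
    """Estimated score using only the selected test cases (equal weights assumed)."""
    return mean(scores[policy][t] for t in subset)


def k_of_n_error(scores: Scores, subset: tuple[int, ...], k: int, n: int) -> float:
    """Exact k-of-N value of the absolute estimation error.

    Assumption: the N draws are policies sampled uniformly with replacement.
    For each draw, keep the k largest errors and average them; then take the
    expectation over all len(scores) ** n equally likely draws.
    """
    errors = [
        abs(subset_score(scores, p, subset) - pool_score(scores, p)) for p in scores
    ]
    draws = itertools.product(errors, repeat=n)
    return mean(mean(sorted(draw, reverse=True)[:k]) for draw in draws)


def select_tests(scores: Scores, size: int, k: int, n: int) -> tuple[int, ...]:
    """Brute-force stand-in for RPOSST: the subset with the lowest k-of-N error."""
    num_tests = len(next(iter(scores.values())))
    candidates = itertools.combinations(range(num_tests), size)
    return min(candidates, key=lambda s: k_of_n_error(scores, s, k, n))


print(select_tests(EXAMPLE_SCORES, size=2, k=1, n=2))  # (0, 1)
```

Setting k equal to N recovers the plain average error over policies, while k = 1 keeps only the worst of N draws; intermediate values trade average-case accuracy against robustness. Exact enumeration grows as (number of policies)^N, so this approach only suits tiny examples.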

Keywords

reinforcement learning, policy selection, two-player game, test case selection, robust testing
