Robust reinforcement learning under minimax regret for green security

Lily Xu; Andrew Perrault; Fei Fang; Haipeng Chen; Milind Tambe

UAI 2021

Abstract

Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers. Importantly, the deterrence effect of patrols on adversaries’ future behavior makes patrol planning a sequential decision-making problem. Therefore, we focus on robust sequential patrol planning for green security following the minimax regret criterion, which has not been considered in the literature. We formulate the problem as a game between the defender and nature, who controls the parameter values of the adversarial behavior, and design an algorithm, MIRROR, to find a robust policy. MIRROR uses two reinforcement learning–based oracles and solves a restricted game considering limited defender strategies and parameter values. We evaluate MIRROR on real-world poaching data.
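The minimax regret criterion named in the abstract can be illustrated on a tiny restricted game. The sketch below is not MIRROR: it assumes a hypothetical table of expected returns for three defender policies against three parameter values chosen by nature, measures regret only against the best policy in that table, and picks a single pure policy rather than solving for an equilibrium. The names `returns`, `regret_matrix`, and `minimax_regret_policy` are illustrative only.

```python
# Toy illustration of the minimax regret criterion.
# ASSUMPTION: all numbers are hypothetical, not taken from the paper.
# Rows are candidate defender policies; columns are candidate
# parameter values that nature may choose.
returns = [
    [10.0, 2.0, 6.0],
    [8.0, 5.0, 6.0],
    [4.0, 6.0, 9.0],
]


def regret_matrix(returns):
    """Regret of each policy under each parameter value.

    Regret is the gap between the best return achievable under a
    parameter value (here: best among the listed policies only) and
    the return the policy actually obtains.
    """
    n_cols = len(returns[0])
    best_per_column = [max(row[j] for row in returns) for j in range(n_cols)]
    return [
        [best_per_column[j] - row[j] for j in range(n_cols)]
        for row in returns
    ]


def minimax_regret_policy(returns):
    """Return (index, max regret) of the policy with the smallest worst-case regret."""
    regrets = regret_matrix(returns)
    max_regrets = [max(row) for row in regrets]
    best = min(range(len(max_regrets)), key=max_regrets.__getitem__)
    return best, max_regrets[best]


if __name__ == "__main__":
    index, worst_case = minimax_regret_policy(returns)
    print(f"policy {index} has worst-case regret {worst_case}")
```

In this toy table the worst-case regrets of the three policies are 4, 3, and 6, so the second policy is selected. A fuller treatment would let both players mix over their strategies and would grow the table with new policies and parameter values, which is the role the abstract assigns to the two oracles.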

Topics

Artificial Intelligence > Core AI > Game AI

Machine Learning > Optimization & Theory > Learning Theory

Reinforcement Learning > Methods > Policy Learning

Keywords

robust reinforcement learning, Markov decision process, minimax regret, green security, sequential patrol planning, defender strategy

Related papers

Efficient greedy coordinate descent via variable partitioning 2021

Multi-output Gaussian Processes for uncertainty-aware recommender systems 2021

Constrained differentially private federated learning for low-bandwidth devices 2021

Matrix games with bandit feedback 2021

A weaker faithfulness assumption based on triple interactions 2021