Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization

Nathan Grinsztajn; Daniel Furelos-Blanco; Shikha Surana; Clément Bonnet; Tom Barrett

NeurIPS 2023

Abstract

Applying reinforcement learning (RL) to combinatorial optimization problems is attractive as it removes the need for expert knowledge or pre-solved instances. However, it is unrealistic to expect an agent to solve these (often NP-)hard problems in a single shot at inference due to their inherent complexity. Thus, leading approaches often implement additional search strategies, from stochastic sampling and beam search to explicit fine-tuning. In this paper, we argue for the benefits of learning a population of complementary policies, which can be simultaneously rolled out at inference. To this end, we introduce Poppy, a simple training procedure for populations. Instead of relying on a predefined or hand-crafted notion of diversity, Poppy induces an unsupervised specialization targeted solely at maximizing the performance of the population. We show that Poppy produces a set of complementary policies, and obtains state-of-the-art RL results on three popular NP-hard problems: traveling salesman, capacitated vehicle routing, and job-shop scheduling.
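The inference-time idea in the abstract, rolling out every policy in a population on the same instance and keeping the best solution, can be illustrated with a minimal sketch on a toy traveling salesman instance. Everything named here is an assumption made for illustration: `make_policy` builds a hypothetical noisy nearest-neighbour heuristic that stands in for a trained policy, and `population_rollout` is an illustrative helper, not part of Poppy. The training procedure itself is not shown.

```python
import math
import random


def tour_length(cities, tour):
    """Total length of the closed tour that visits cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def make_policy(start, noise, seed):
    """Hypothetical stand-in for a trained policy: noisy nearest-neighbour construction."""
    rng = random.Random(seed)

    def policy(cities):
        unvisited = set(range(len(cities)))
        current = start % len(cities)
        tour = [current]
        unvisited.remove(current)
        while unvisited:
            # Choose the closest unvisited city, with distances perturbed by
            # policy-specific noise so that different policies build different tours.
            current = min(
                sorted(unvisited),
                key=lambda j: math.dist(cities[tour[-1]], cities[j])
                * (1 + noise * rng.random()),
            )
            tour.append(current)
            unvisited.remove(current)
        return tour

    return policy


def population_rollout(policies, cities):
    """Roll out every policy on the same instance; return the winner's index and all lengths."""
    lengths = [tour_length(cities, policy(cities)) for policy in policies]
    winner = min(range(len(policies)), key=lengths.__getitem__)
    return winner, lengths


# Toy instance: 20 random cities in the unit square, population of 4 policies.
instance_rng = random.Random(0)
cities = [(instance_rng.random(), instance_rng.random()) for _ in range(20)]
policies = [make_policy(start=k, noise=0.3, seed=k) for k in range(4)]

winner, lengths = population_rollout(policies, cities)
best_length = lengths[winner]
print(f"policy {winner} wins with tour length {best_length:.3f}")
```

Because the population's score on an instance is the best score among its members, it can never be worse than any single member's score on that instance. This is why a set of complementary policies can be useful even when no individual policy is strong everywhere.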

Authors

Nathan Grinsztajn, Daniel Furelos-Blanco, Shikha Surana, Clément Bonnet, Tom Barrett

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems
Reinforcement Learning > Applications > Game AI
Mathematics & Optimization > Optimization > Combinatorial Optimization
Machine Learning > Learning Types > Multi-Agent Systems
Artificial Intelligence > Core AI > Reinforcement Learning

Keywords

combinatorial optimization, reinforcement learning, travelling salesman problem, vehicle routing, vehicle routing problem, population-based training, population training, traveling salesman, policy population
