Can Go AIs Be Adversarially Robust?

Tom Tseng; Euan McLean; Kellin Pelrine; Tony Tong Wang; Adam Gleave

AAAI 2025

Abstract

Prior work found that superhuman Go AIs like KataGo can be defeated by simple adversarial strategies. In this paper, we study whether defenses can improve KataGo's worst-case performance. We test three natural defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that although some of these defenses protect against previously discovered attacks, none withstand adaptive attacks. In particular, we are able to train new adversaries that reliably defeat our defended agents by causing them to blunder in ways humans would not. Our results suggest that building robust AI systems is challenging even for superhuman systems in narrow domains like Go.

Topics

Artificial Intelligence > Core AI > AI Safety
Artificial Intelligence > Core AI > Game AI
Machine Learning > Learning Types > Adversarial Learning
Artificial Intelligence > Core AI > Adversarial Learning

Keywords

adversarial robustness; game artificial intelligence; adversarial training; worst-case performance; adversarial attack; robustness evaluation; neural network; agent defense
