GUI Agents: A Survey

Dang Nguyen; Jian Chen; Yu Wang; Gang Wu; Namyong Park; Zhengmian Hu; Hanjia Lyu; Junda Wu; Ryan Aponte; Yu Xia; Xintong Li; Jing Shi; Hongjie Chen; Viet Dac Lai; Zhouhang Xie; Sungchul Kim; Ruiyi Zhang; Tong Yu; Mehrab Tanjim; Nesreen K. Ahmed; Puneet Mathur; Seunghyun Yoon; Lina Yao; Branislav Kveton; Jihyung Kil; Thien Huu Nguyen; Trung Bui; Tianyi Zhou; Ryan A. Rossi; Franck Dernoncourt

ACL 2025

Abstract

Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse platforms. Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods. We propose a unified framework that delineates their perception, reasoning, planning, and acting capabilities. Furthermore, we identify important open challenges and discuss key future directions. Finally, this work serves as a basis for practitioners and researchers to gain an intuitive understanding of current progress, techniques, benchmarks, and critical open problems that remain to be addressed.

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Human-AI Interaction
Artificial Intelligence > Core AI > Large Language Models

Keywords

human-computer interaction, autonomous agent, agent system, graphical user interface, GUI agent, digital automation, agent perception, large language model
