Less is More: Empowering GUI Agent with Context-Aware Simplification

Gongwei Chen; Xurui Zhou; Rui Shao; Yibo Lyu; Kaiwen Zhou; Shuai Wang; Wentao Li; Yinchuan Li; Zhongang Qi; Liqiang Nie

ICCV 2025

Abstract

The research focus of GUI agents is shifting from text-dependent to purely vision-based approaches, which, though promising, prioritize comprehensive pre-training data collection while neglecting the challenges of contextual modeling. We probe the characteristics of element context and history context in GUI agents and find that: **1) the high density and loose relations of element context** reveal the presence of many unrelated elements and their negative influence; **2) the high redundancy of history context** reveals inefficient history modeling in current GUI agents. In this work, we propose a context-aware simplification framework for building an efficient and effective GUI agent, termed **SimpAgent**. To mitigate potential interference from numerous unrelated elements, we introduce a **masking-based element pruning** method that sidesteps intractable relation modeling through an efficient masking mechanism. To reduce redundancy in historical information, we devise a **consistency-guided history compression** module, which enhances implicit LLM-based compression with explicit guidance, achieving an optimal balance between performance and efficiency. With these components, SimpAgent reduces FLOPs by 27% while achieving superior GUI navigation performance. Comprehensive navigation experiments across diverse web and mobile environments demonstrate the effectiveness and potential of our agent.
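The abstract describes both components only at a high level, so the following Python sketch is a toy illustration under our own assumptions, not SimpAgent's implementation. `mask_unrelated_elements` assumes that element bounding boxes and the index of the target element are available, and it randomly blanks a fraction (`ratio`) of the non-target boxes in a grayscale screenshot stored as nested lists. `consistency_loss` assumes the "explicit guidance" for history compression can be expressed as a KL divergence that pulls the action distribution predicted from compressed history toward the one predicted from full history. All function names, the `ratio` parameter, and the KL formulation are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical element box: (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def mask_unrelated_elements(
    image: List[List[int]],
    boxes: Sequence[Box],
    target: int,
    ratio: float,
    rng: random.Random,
    fill: int = 0,
) -> List[List[int]]:
    """Toy element pruning (assumption): blank a random `ratio` of non-target boxes."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    out = [row[:] for row in image]  # copy so the input screenshot is untouched
    candidates = [i for i in range(len(boxes)) if i != target]
    k = int(round(ratio * len(candidates)))
    for i in rng.sample(candidates, k):
        x0, y0, x1, y1 = boxes[i]
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = fill
    # Restore the target element in case an unrelated box overlapped it.
    x0, y0, x1, y1 = boxes[target]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = image[y][x]
    return out


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(full_logits: Sequence[float], compressed_logits: Sequence[float]) -> float:
    """KL(p_full || p_compressed): hypothetical 'explicit guidance' for history compression."""
    p = softmax(full_logits)
    q = softmax(compressed_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


if __name__ == "__main__":
    screen = [[255] * 4 for _ in range(4)]
    boxes = [(0, 0, 2, 2), (2, 0, 4, 2), (0, 2, 2, 4)]
    masked = mask_unrelated_elements(screen, boxes, target=1, ratio=1.0, rng=random.Random(0))
    for row in masked:
        print(row)
    print(consistency_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8]))
```

The appeal of masking, as the abstract frames it, is that it avoids explicitly modeling which elements relate to the task. In this toy version the target box is restored after masking so that an overlapping unrelated box cannot hide it, and the consistency term is zero only when the compressed-history prediction matches the full-history prediction exactly.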

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Foundation Models
Computer Vision > Analysis > Scene Understanding
Machine Learning > Learning Types > Multi-Modal Learning
Deep Learning > Optimization & Theory > Model Compression

Keywords

model compression, human-computer interaction, context modeling, GUI agent, FLOPs reduction, vision-based navigation, large language model, history compression, context-aware simplification, element pruning
