AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments

Zhiheng Xi; Yiwen Ding; Wenxiang Chen; Boyang Hong; Honglin Guo; Junzhe Wang; Xin Guo; Dingwen Yang; Chenyang Liao; Wei He; Songyang Gao; Lu Chen; Rui Zheng; Yicheng Zou; Tao Gui; Qi Zhang; Xipeng Qiu; Xuanjing Huang; Zuxuan Wu; Yu-Gang Jiang

ACL 2025

Abstract

Large language models (LLMs) have emerged as a promising foundation for building generally capable agents (LLM-based agents) that can handle multi-turn decision-making tasks across various environments. However, the community lacks a unified interactive framework that covers diverse environments for comprehensive evaluation of agents and that enables exploration and learning for their self-improvement. To address this, we propose AgentGym, a framework featuring 7 real-world scenarios, 14 environments, and 89 tasks for unified, real-time, and concurrent agent interaction. We construct an expanded instruction set, high-quality trajectories, and a comprehensive benchmarking suite for developing LLM-based agents. Moreover, AgentGym supports interactive exploration and learning for agents through multi-turn interactions and real-time feedback. Based on AgentGym, we take an initial step toward developing LLM-based agents that can handle diverse tasks via methods such as self-improvement or reinforcement learning. Experimental results show that the trained agents can achieve results comparable to commercial models. We hope our work can help the community develop more advanced LLM-based agents. We release the code, dataset, benchmark, and checkpoints at https://agentgym.github.io/.

Topics

Artificial Intelligence > Core AI > Agent Systems; Reinforcement Learning > Methods > Deep RL; Artificial Intelligence > Core AI > Large Language Models; Deep Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning; multi-turn interaction; agent evaluation; agent system; decision-making agent; large language model; multi-turn decision making
