AAAI 2026

From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production

Abstract

Agents are rapidly advancing in automating digital work, but enterprises face a harder challenge: moving beyond prototypes to deployed systems that deliver measurable business value. This path is complicated by fragmented frameworks, slow development, and the absence of standardized evaluation practices. Generalist agents have emerged as a promising direction, excelling on academic benchmarks and offering flexibility across tasks, applications, and modalities. Yet evidence of their use in enterprise settings remains limited. This paper reports IBM’s experience developing and piloting the Computer Using Generalist Agent (CUGA). CUGA adopts a hierarchical planner-executor architecture with strong analytical foundations and achieves state-of-the-art performance on AppWorld and WebArena. Beyond benchmarks, it was evaluated in a Business Process Outsourcing talent acquisition pilot that addressed enterprise requirements for scalability, auditability, safety, and governance. In preliminary evaluations, CUGA approached the accuracy of specialized agents while suggesting reductions in development time and cost. We provide early evidence that generalist agents can operate at enterprise scale, distill key technical and organizational lessons, and outline requirements for transitioning research-grade architectures like CUGA into enterprise-ready systems.
