TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications

Sunwoo Lee; Daseong Jang; Dhammiko Arya; Gyoung-eun Han; Injee Song; SaeRom Kim; Sangjin Kim; Seojin Lee; Seokyoung Hong; Sereimony Sek; Seung-Mo Cho; Sohee Park; Sungbin Yoon; Wonbeom Jang; Eric Davis

EMNLP 2025

Abstract

As Large Language Models (LLMs) evolve into powerful agentic systems, the telecommunications industry’s expansion into AI services necessitates industry-grounded benchmarks to evaluate their underexplored domain-specific capabilities. To address the gap left by generic benchmarks that fail to assess realistic, non-English performance, we present TelAgentBench, a Korean benchmark for the telecommunications domain evaluating five core agentic capabilities: Reasoning, Planning, Action (tool-use), Retrieval-Augmented Generation, and Instruction Following. Evaluations reveal significant performance disparities between models that employ explicit reasoning and those that do not, providing actionable insights for deploying agentic LLMs in real-world telecommunications tasks.

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Planning
Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Reasoning
Machine Learning > Learning Types > Retrieval-Augmented Generation
Natural Language Processing > Resources & Methods > Retrieval-Augmented Generation
Artificial Intelligence > Core AI > Multi-Modal Learning

Keywords

benchmark evaluation; tool use; retrieval-augmented generation; reasoning capability; agent system; LLM agent; large language model; domain-specific benchmark
