ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?

Haoxin Wang; Xianhan Peng; Huang Cheng; Yizhe Huang; Ming Gong; Chenghan Yang; Yang Liu; Jiang Lin

EMNLP 2025

Abstract

In this paper, we introduce ECom-Bench, the first benchmark framework for evaluating LLM agents with multimodal capabilities in the e-commerce customer support domain. ECom-Bench features dynamic user simulation based on persona information collected from real e-commerce customer interactions, together with a realistic task dataset derived from authentic e-commerce dialogues. The tasks cover a wide range of business scenarios and are designed to reflect real-world complexities, which makes ECom-Bench highly challenging. For instance, even advanced models such as GPT-4o achieve a pass^3 score of only 10–20% on our benchmark, highlighting the substantial difficulties posed by complex e-commerce scenarios. The code and data are publicly available at https://github.com/XiaoduoAILab/ECom-Bench to facilitate further research and development in this domain.
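The pass^3 metric cited in the abstract is not defined on this page. The sketch below assumes it follows the pass^k convention used in other agent benchmarks: a task counts toward pass^k only if k independently sampled trials all succeed, estimated per task as C(c, k) / C(n, k) for c successes out of n trials and then averaged over tasks. The function name `pass_hat_k` and the example counts are hypothetical and are not taken from the ECom-Bench code or results.

```python
from math import comb


def pass_hat_k(successes_per_task, num_trials, k):
    """Estimate pass^k under the assumed definition.

    successes_per_task: number of successful trials for each task (hypothetical input).
    num_trials: trials run per task (n).
    k: number of trials that must all succeed.
    """
    if not 1 <= k <= num_trials:
        raise ValueError("k must satisfy 1 <= k <= num_trials")
    if not successes_per_task:
        raise ValueError("at least one task is required")

    total = 0.0
    for c in successes_per_task:
        if not 0 <= c <= num_trials:
            raise ValueError("success count must lie in [0, num_trials]")
        # Chance that k trials drawn without replacement from the n
        # recorded trials are all successes; comb(c, k) is 0 when c < k.
        total += comb(c, k) / comb(num_trials, k)
    return total / len(successes_per_task)


# Illustrative counts only: four tasks, three trials each.
example_counts = [3, 2, 0, 3]
print(pass_hat_k(example_counts, num_trials=3, k=3))
```

With three trials per task, pass^3 under this assumption credits a task only when every trial succeeds, so it rewards consistency rather than occasional success.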

Topics

Artificial Intelligence > Core AI > Agent Systems; Natural Language Processing > Applications > Question Answering; Artificial Intelligence > Core AI > Large Language Models; Natural Language Processing > Applications > Dialogue Systems; Deep Learning > Models > Large Language Models; Deep Learning > Learning Types > Multi-Modal Learning

Keywords

benchmark evaluation; multimodal learning; dialogue system; customer service; LLM agent; customer support; large language model
