DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications

Joachim Daiber; Victor Maricato; Ayan Sinha; Andrew Rabinovich

EMNLP 2025

Abstract

We introduce DispatchQA, a benchmark to evaluate how well small language models (SLMs) translate open-ended search queries into executable API calls via explicit function calling. Our benchmark focuses on the latency-sensitive e-commerce setting and measures SLMs’ impact on both search relevance and search latency. We provide strong, replicable baselines based on Llama 3.1 8B Instruct fine-tuned on synthetically generated data, and find that fine-tuned SLMs produce search quality comparable to or better than that of large language models such as GPT-4o while achieving up to 3× faster inference. All data, code, and training checkpoints are publicly released to spur further research on resource-efficient query understanding.

Topics

Machine Learning > Application Areas > Efficient Computing
Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Applications > Question Answering
Natural Language Processing > Resources & Methods > Large Language Models
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Models > Large Language Models
Machine Learning > Learning Types > Fine-Tuning
Machine Learning > Application Areas > Recommender Systems
Machine Learning > Application Areas > Information Retrieval
Deep Learning > Learning Types > Fine-Tuning

Keywords

information retrieval, model fine-tuning, query understanding, function calling, small language model, search relevance, api call, api calling, e-commerce search
