Grammar Pruning: Enabling Low-Latency Zero-Shot Task-Oriented Language Models for Edge AI

Octavian Alexandru Trifan; Jason Lee Weber; Marc Titus Trifan; Alexandru Nicolau; Alexander Veidenbaum

EMNLP 2025

Abstract

Edge deployment of task-oriented semantic parsers demands high accuracy under tight latency and memory budgets. We present Grammar Pruning, a lightweight zero-shot framework that begins with a user-defined schema of API calls and couples a rule-based entity extractor with an iterative grammar-constrained decoder: extracted items dynamically prune the context-free grammar, limiting generation to only those intents, slots, and values that remain plausible at each step. This aggressive search-space reduction both reduces hallucinations and cuts decoding time. On the adapted FoodOrdering, APIMixSNIPS, and APIMixATIS benchmarks, Grammar Pruning with small language models achieves an average execution accuracy of over 90%, rivaling state-of-the-art cloud-based solutions, while sustaining at least 2x lower end-to-end latency than existing methods. By requiring nothing beyond the domain’s full set of API schema values yet delivering precise, real-time natural-language understanding, Grammar Pruning positions itself as a practical building block for future edge-AI applications that cannot rely on large models or cloud offloading.
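The pruning step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the schema, the function names (`extract_values`, `prune_schema`, `count_values`), and the whole-word matching rule are all assumptions made for illustration, and a flat intent/slot/value dictionary stands in for the context-free grammar. The sketch shows only how extracted values could shrink the set of options a constrained decoder has to consider.

```python
import re

# Hypothetical schema (not from the paper): intent -> slot -> allowed values.
SCHEMA = {
    "order_pizza": {"size": ["small", "large"], "topping": ["mushroom", "pepperoni"]},
    "order_drink": {"size": ["small", "large"], "drink": ["cola", "water"]},
}

def extract_values(utterance, schema):
    """Rule-based extraction: collect schema values that occur as whole words."""
    text = utterance.lower()
    found = set()
    for slots in schema.values():
        for values in slots.values():
            for value in values:
                if re.search(r"\b" + re.escape(value) + r"\b", text):
                    found.add(value)
    return found

def prune_schema(schema, found):
    """Drop values not seen in the utterance, then any slot or intent left empty."""
    pruned = {}
    for intent, slots in schema.items():
        kept = {}
        for slot, values in slots.items():
            remaining = [v for v in values if v in found]
            if remaining:
                kept[slot] = remaining
        if kept:
            pruned[intent] = kept
    return pruned

def count_values(schema):
    """Number of (intent, slot, value) options a decoder could still emit."""
    return sum(len(v) for slots in schema.values() for v in slots.values())

found = extract_values("I would like a large pepperoni pizza", SCHEMA)
pruned = prune_schema(SCHEMA, found)
print(pruned)
print(count_values(SCHEMA), "->", count_values(pruned))
```

In this toy setting `order_drink` survives with only its `size` slot, because "large" is still a plausible value for it; choosing between the remaining intents would be left to the language model, which now decodes over a much smaller option set.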

Topics

Artificial Intelligence > Core AI > Model Compression; Machine Learning > Learning Types > Zero-Shot Learning; Machine Learning > Application Areas > Efficient Computing; Artificial Intelligence > Core AI > Large Language Models

Keywords

zero-shot learning; semantic parsing; task-oriented dialogue; edge computing; edge deployment; low latency; grammar pruning; task oriented; grammar-based decoding
