Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference

Jihwan Bang; Juntae Lee; Kyuhong Shim; Seunghan Yang; Simyung Chang

ACL 2024

Abstract

The customization of large language models (LLMs) for user-specified tasks is becoming increasingly important. However, maintaining all the customized LLMs on cloud servers incurs substantial memory and computational overheads, and uploading user data can also raise privacy concerns. On-device LLMs offer a promising way to mitigate these issues. Yet the performance of on-device LLMs is inherently constrained by the limitations of small-scale models. To overcome these restrictions, we propose Crayon, a novel approach for on-device LLM customization. Crayon begins by constructing a pool of diverse base adapters, which are then instantly blended into a customized adapter without extra training. In addition, we develop a device-server hybrid inference strategy, which allocates more demanding queries or non-customized tasks to a larger, more capable LLM on a server. This ensures optimal performance without sacrificing the benefits of on-device customization. We carefully craft a novel benchmark from multiple question-answer datasets and show the efficacy of our method for LLM customization.
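The two ideas in the abstract, blending a pool of base adapters without training and sending harder queries to a server, can be illustrated with a minimal sketch. The abstract does not say how the blending weights are obtained or how the routing decision is made, so the weighted average and the confidence threshold below are assumptions, and the names `blend_adapters`, `route`, and `threshold` are hypothetical rather than taken from the paper.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of adapter weights.
Adapter = Dict[str, List[float]]


def blend_adapters(pool: List[Adapter], weights: List[float]) -> Adapter:
    """Weighted average of base adapters; no gradient steps are taken.

    Assumption: blending is a convex combination of the base adapters.
    The paper's actual blending rule may differ.
    """
    if not pool or len(pool) != len(weights):
        raise ValueError("pool and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so coefficients sum to 1
    blended: Adapter = {}
    for name in pool[0]:
        size = len(pool[0][name])
        blended[name] = [
            sum(c * adapter[name][i] for c, adapter in zip(norm, pool))
            for i in range(size)
        ]
    return blended


def route(confidence: float, threshold: float = 0.5) -> str:
    """Pick where to answer a query.

    Assumption: a scalar confidence score for the on-device model is
    available; low confidence sends the query to the server LLM.
    """
    return "device" if confidence >= threshold else "server"


if __name__ == "__main__":
    base_pool = [{"lora_A": [1.0, 2.0]}, {"lora_A": [3.0, 4.0]}]
    print(blend_adapters(base_pool, [1.0, 1.0]))
    print(route(0.9), route(0.2))
```

In this sketch the customized adapter is produced by arithmetic on stored weights alone, which is what makes the blending step "instant"; only the choice of coefficients depends on the user's task.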

Topics

Artificial Intelligence > Core AI > Foundation Models; Machine Learning > Application Areas > Efficient Computing; Natural Language Processing > Resources & Methods > Large Language Models; Computer Science > Systems > Distributed Systems; Machine Learning > Application Areas > Model Compression; Artificial Intelligence > Core AI > Large Language Models

Keywords

privacy-preserving machine learning; knowledge distillation; on-device inference; adapter blending; edge-server hybrid inference; model customization; on-device deployment; large language model; hybrid inference
