LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models

Hayder Elesedy; Pedro M Esperanca; Silviu Vlad Oprea; Mete Ozay

EMNLP 2024

Abstract

Guardrails have emerged as an alternative to safety alignment for content moderation of large language models (LLMs). Existing model-based guardrails were not designed for resource-constrained portable devices, such as mobile phones, which increasingly run LLM-based applications locally. We introduce LoRA-Guard, a parameter-efficient guardrail adaptation method that relies on knowledge sharing between LLMs and guardrail models. LoRA-Guard extracts language features from the LLM and adapts them to the content moderation task using low-rank adapters, while a dual-path design prevents any performance degradation on the generative task. We show that LoRA-Guard outperforms existing approaches with 100-1000x lower parameter overhead while maintaining accuracy, enabling on-device content moderation.
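The dual-path idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class `DualPathLinear`, the `guard` flag, and the layer sizes are hypothetical, chosen only to show how a frozen weight can serve the generative path unchanged while a low-rank update is applied on the moderation path alone.

```python
import numpy as np

rng = np.random.default_rng(0)


class DualPathLinear:
    """One linear layer with a frozen shared weight and an optional low-rank update.

    Hypothetical illustration: W is shared by both paths, while the adapter
    matrices A and B are used only when guard=True.
    """

    def __init__(self, d_in, d_out, rank):
        self.W = rng.standard_normal((d_out, d_in))         # frozen base weight
        self.A = rng.standard_normal((rank, d_in)) * 0.01   # adapter down-projection
        self.B = np.zeros((d_out, rank))                    # adapter up-projection, zero-initialised

    def forward(self, x, guard=False):
        y = self.W @ x                       # generative path: base weight only
        if guard:
            y = y + self.B @ (self.A @ x)    # guard path: add the low-rank update
        return y

    def extra_params(self):
        # Parameters added on top of the base layer by the adapter.
        return self.A.size + self.B.size


layer = DualPathLinear(d_in=64, d_out=64, rank=4)
x = rng.standard_normal(64)

gen_before = layer.forward(x)

# Stand-in for adapter training: only B changes, W is never touched.
layer.B = rng.standard_normal(layer.B.shape)

gen_after = layer.forward(x)
guard_out = layer.forward(x, guard=True)

print(np.allclose(gen_before, gen_after))   # generative output is unchanged
print(layer.extra_params(), layer.W.size)   # adapter parameters vs base parameters
```

Because the adapter is bypassed on the generative path, changing it cannot alter generative outputs in this sketch, and the added parameter count scales with the rank rather than with the full weight matrix (here 512 adapter parameters against 4096 base parameters).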

Topics

Artificial Intelligence > Core AI > Responsible AI
Machine Learning > Application Areas > Efficient Computing
Machine Learning > Application Areas > Knowledge Distillation
Machine Learning > Application Areas > Model Merging
Computer Science > Applications > Cybersecurity
Artificial Intelligence > Core AI > Privacy
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Learning Types > Knowledge Distillation
Deep Learning > Optimization & Theory > Model Compression

Keywords

knowledge distillation; content moderation; parameter-efficient learning; parameter-efficient fine-tuning; low-rank adaptation; on-device inference; knowledge sharing; low-rank adapter; large language model
