LLMGuard: Guarding against Unsafe LLM Behavior

Shubh Goyal; Medha Hira; Shubham Mishra; Sukriti Goyal; Arnav Goel; Niharika Dadu; Kirushikesh DB; Sameep Mehta; Nishtha Madaan

2024 AAAI AAAI 2024

LLMGuard: Guarding against Unsafe LLM Behavior

Abstract

Abstract Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and can have legal concerns. To alleviate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content against specific behaviours or conversation topics. To do this robustly, LLMGuard employs an ensemble of detectors.

🧭 Keyword Pioneer — ensemble detector

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Shubh Goyal , Medha Hira , Shubham Mishra , Sukriti Goyal , Arnav Goel , Niharika Dadu , Kirushikesh DB , Sameep Mehta , Nishtha Madaan

Topics

Artificial Intelligence > Core AI > AI Safety Artificial Intelligence > Core AI > Responsible AI

Keywords

content moderation ensemble detector unsafe content detection

Download PDF

Related papers

Goal Alignment: Re-analyzing Value Alignment Problems Using Human-Aware AI 2024

Meta-Inverse Reinforcement Learning for Mean Field Games via Probabilistic Context Variables 2024

Suppressing Uncertainty in Gaze Estimation 2024

Mask-Homo: Pseudo Plane Mask-Guided Unsupervised Multi-Homography Estimation 2024

Heterogeneous Test-Time Training for Multi-Modal Person Re-identification 2024