Prompt Leakage effect and mitigation strategies for multi-turn LLM Applications

Divyansh Agarwal; Alexander Fabbri; Ben Risher; Philippe Laban; Shafiq Joty; Chien-Sheng Wu

2024 EMNLP EMNLP 2024

Prompt Leakage effect and mitigation strategies for multi-turn LLM Applications

Abstract

AbstractPrompt leakage poses a compelling security and privacy threat in LLM applications. Leakage of system prompts may compromise intellectual property, and act as adversarial reconnaissance for an attacker. A systematic evaluation of prompt leakage threats and mitigation strategies is lacking, especially for multi-turn LLM interactions. In this paper, we systematically investigate LLM vulnerabilities against prompt leakage for 10 closed- and open-source LLMs, across four domains. We design a unique threat model which leverages the LLM sycophancy effect and elevates the average attack success rate (ASR) from 17.7% to 86.2% in a multi-turn setting. Our standardized setup further allows dissecting leakage of specific prompt contents such as task instructions and knowledge documents. We measure the mitigation effect of 7 black-box defense strategies, along with finetuning an open-source model to defend against leakage attempts. We present different combination of defenses against our threat model, including a cost analysis. Our study highlights key takeaways for building secure LLM applications and provides directions for research in multi-turn LLM interactions.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Security & Privacy

🧭 Keyword Pioneer — defensive strategy

🐣 Hot Topic Early Bird — multi-turn dialogue

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Divyansh Agarwal , Alexander Fabbri , Ben Risher , Philippe Laban , Shafiq Joty , Chien-Sheng Wu

Topics

Artificial Intelligence > Core AI > AI Safety Artificial Intelligence > Core AI > Foundation Models Artificial Intelligence > Core AI > Responsible AI Machine Learning > Application Areas > Privacy Security & Privacy > Privacy Artificial Intelligence > Core AI > Privacy Artificial Intelligence > Core AI > Large Language Models Deep Learning > Models > Large Language Models

Keywords

adversarial learning multi-turn interaction intellectual property protection multi-turn dialogue defensive strategy privacy threat llm security large language model prompt leakage system prompt

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024