Understanding PII Leakage in Large Language Models: A Systematic Survey

Shuai Cheng; Zhao Li; Shu Meng; Mengxia Ren; Haitao Xu; Shuai Hao; Chuan Yue; Fan Zhang

IJCAI 2025

Abstract

Large Language Models (LLMs) have demonstrated exceptional success across a variety of tasks, particularly in natural language processing, leading to their growing integration into numerous facets of daily life. However, this widespread deployment has raised substantial privacy concerns, especially regarding personally identifiable information (PII), which can be directly associated with specific individuals. The leakage of such information presents significant real-world privacy threats. In this paper, we conduct a systematic investigation into existing research on PII leakage in LLMs, encompassing commonly utilized PII datasets, evaluation metrics, and current studies on both PII leakage attacks and defensive strategies. Finally, we identify unresolved challenges in the current research landscape and suggest future research directions.

Topics

Artificial Intelligence > Core AI > Responsible AI
Machine Learning > Application Areas > Privacy
Natural Language Processing > Resources & Methods > Large Language Models
Artificial Intelligence > Core AI > Privacy
Artificial Intelligence > Core AI > Large Language Models

Keywords

privacy attack; privacy preservation; information leakage; personally identifiable information; PII leakage; defensive strategy; large language model; data security
