Large Language Models Can Be Contextual Privacy Protection Learners

Yijia Xiao; Yiqiao Jin; Yushi Bai; Yue Wu; Xianjun Yang; Xiao Luo; Wenchao Yu; Xujiang Zhao; Yanchi Liu; Quanquan Gu; Haifeng Chen; Wei Wang; Wei Cheng

EMNLP 2024

Abstract

The proliferation of Large Language Models (LLMs) has driven considerable interest in fine-tuning them with domain-specific data to create specialized language models. Nevertheless, such domain-specific fine-tuning data often contains contextually sensitive personally identifiable information (PII). Directly fine-tuning LLMs on this data without privacy protection risks leaking sensitive PII at inference time. To address this challenge, we introduce Contextual Privacy Protection Language Models (CPPLM), a novel paradigm for fine-tuning LLMs that effectively injects domain-specific knowledge while safeguarding inference-time data privacy. Our work offers a theoretical analysis for model design and delves into various techniques such as corpus curation, penalty-based unlikelihood in training loss, and instruction-based tuning. Extensive experiments across diverse datasets and scenarios demonstrate the effectiveness of our approaches. In particular, instruction tuning with both positive and negative examples stands out as a promising method, effectively protecting private data while enhancing the model’s knowledge. Our work underscores the potential for Large Language Models as robust contextual privacy protection learners.
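To make the idea of a penalty-based unlikelihood term concrete, the following is a minimal sketch under stated assumptions, not the formulation used in the paper. It assumes the generic unlikelihood form: ordinary tokens contribute the usual negative log-likelihood, while tokens flagged as PII contribute -log(1 - p), which is small only when the model assigns them low probability. The function name `penalized_loss`, the `pii_mask` input, and the `penalty_weight` parameter are hypothetical and chosen for illustration.

```python
import math


def penalized_loss(token_probs, pii_mask, penalty_weight=1.0, eps=1e-12):
    """Average per-token loss over one sequence.

    token_probs: probability the model assigns to each target token.
    pii_mask: True where the target token is flagged as PII (assumed given).
    penalty_weight: hypothetical scale factor for the unlikelihood term.
    """
    if len(token_probs) != len(pii_mask):
        raise ValueError("token_probs and pii_mask must have the same length")
    if not token_probs:
        raise ValueError("token_probs must not be empty")

    total = 0.0
    for p, is_pii in zip(token_probs, pii_mask):
        if is_pii:
            # Unlikelihood term: lowest when the PII token's probability is near 0.
            total += penalty_weight * -math.log(max(1.0 - p, eps))
        else:
            # Likelihood term: lowest when the ordinary token's probability is near 1.
            total += -math.log(max(p, eps))
    return total / len(token_probs)


# A model that confidently emits the PII token is penalized more than one that avoids it.
leaky = penalized_loss([0.9, 0.9], [False, True])
safe = penalized_loss([0.9, 0.1], [False, True])
```

In this sketch `safe` is lower than `leaky`, because the second token is flagged as PII and the loss rewards assigning it low probability while still rewarding high probability on the ordinary first token.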

Authors

Yijia Xiao, Yiqiao Jin, Yushi Bai, Yue Wu, Xianjun Yang, Xiao Luo, Wenchao Yu, Xujiang Zhao, Yanchi Liu, Quanquan Gu, Haifeng Chen, Wei Wang, Wei Cheng

Topics

Machine Learning > Application Areas > Privacy; Natural Language Processing > Resources & Methods > Large Language Models; Artificial Intelligence > Core AI > Privacy; Machine Learning > Learning Types > Privacy; Deep Learning > Learning Types > Fine-Tuning

Keywords

instruction tuning; privacy protection; domain-specific knowledge; personally identifiable information; large language model; contextual privacy
