Data Anonymization for Privacy-Preserving Large Language Model Fine-Tuning on Call Transcripts

Shayna Gardiner; Tania Habib; Kevin Humphreys; Masha Azizi; Frédéric Mailhot; Anne Paling; Preston Thomas; Nathan Zhang

EACL 2024

Abstract

Large language models (LLMs) in public-facing industrial applications must accurately process data for the domain in which they are deployed, but they must not leak sensitive or confidential information when used. We present a process for anonymizing training data, a framework for quantitatively and qualitatively assessing the effectiveness of this process, and an assessment of the effectiveness of models fine-tuned on anonymized data in comparison with commercially available LLM APIs.
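
The abstract does not specify how the anonymization is performed, so the following is only a minimal sketch of one common baseline, not the authors' method: regular-expression redaction that replaces sensitive spans in a transcript turn with typed placeholders, plus a simple quantitative check (the fraction of known sensitive strings that still appear verbatim). The names `PII_PATTERNS`, `anonymize`, and `leakage_rate`, and the specific patterns, are hypothetical and chosen for illustration.

```python
import re

# Hypothetical pattern set for illustration only; NOT the paper's pipeline.
# Order matters: card numbers (13-16 digits) are replaced before phone
# numbers so a card's digits are never partially tagged as a phone.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-. ]?\d{3}[-. ]?\d{4}\b")),
]


def anonymize(text: str) -> tuple[str, dict[str, int]]:
    """Replace each matched span with a typed placeholder such as [PHONE].

    Returns the redacted text and the number of replacements per label.
    """
    counts: dict[str, int] = {}
    for label, pattern in PII_PATTERNS:
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


def leakage_rate(anonymized: str, sensitive_values: list[str]) -> float:
    """Fraction of known sensitive strings still present verbatim (lower is better)."""
    if not sensitive_values:
        return 0.0
    leaked = sum(value in anonymized for value in sensitive_values)
    return leaked / len(sensitive_values)


if __name__ == "__main__":
    turn = ("Hi, this is Dana, call me at 555-123-4567 or dana@example.com. "
            "Card 4111 1111 1111 1111.")
    clean, counts = anonymize(turn)
    print(clean)   # Hi, this is Dana, call me at [PHONE] or [EMAIL]. Card [CARD].
    print(counts)  # {'EMAIL': 1, 'CARD': 1, 'PHONE': 1}
    gold = ["Dana", "555-123-4567", "dana@example.com", "4111 1111 1111 1111"]
    print(leakage_rate(clean, gold))  # 0.25: the caller's name is not caught
```

The leaked name in the usage example shows why pattern matching alone is usually insufficient for free-form call transcripts: personal names, addresses, and similar entities generally require a named-entity recognizer or human review, which is one reason an assessment step like the one the abstract describes is needed.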

Keywords

privacy preservation, data anonymization, call transcript, large language model
