Defeating Cerberus: Privacy-Leakage Mitigation in Vision Language Models

Boyang Zhang; Istemi Ekin Akkus; Ruichuan Chen; Alice Dethise; Klaus Satzke; Ivica Rimac; Yang Zhang

EACL 2026

Abstract

Vision Language Models (VLMs) have demonstrated remarkable capabilities in processing multimodal data, but these capabilities also raise significant privacy concerns, particularly the leakage of Personally Identifiable Information (PII). While PII leakage has been studied to some extent in single-modal language models, the vulnerabilities of the multimodal setting have yet to be fully investigated. Our work assesses these emerging risks and introduces a concept-guided mitigation approach. By identifying and modifying the model’s internal states associated with PII-related content, our method guides VLMs to refuse PII-sensitive tasks effectively and efficiently, without requiring re-training or fine-tuning. We also address the current lack of multimodal PII datasets by constructing several that simulate real-world scenarios. Experimental results show that the method achieves an average refusal rate of 93.3% across various PII-related tasks, with minimal impact on the model’s performance on unrelated tasks. We further examine the mitigation’s performance under various conditions to show the adaptability of the proposed method.
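
The idea of "identifying and modifying internal states" can be illustrated with a generic difference-of-means steering sketch on toy vectors. This is not the authors' implementation: the abstract does not specify how the concept is located or how the states are changed, so every function name, the threshold gate, the separate `refusal_direction`, and all numbers below are assumptions made purely for illustration.

```python
# Illustrative sketch only: generic difference-of-means steering on toy vectors.
# All names, thresholds, and values are hypothetical and not taken from the paper.

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def concept_direction(pii_states, neutral_states):
    """Unit vector pointing from the neutral mean to the PII mean."""
    mu_pii = mean_vector(pii_states)
    mu_neutral = mean_vector(neutral_states)
    diff = [a - b for a, b in zip(mu_pii, mu_neutral)]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0:
        raise ValueError("the two groups have identical means")
    return [d / norm for d in diff]

def concept_score(state, direction):
    """Projection of a hidden state onto the concept direction."""
    return sum(s * d for s, d in zip(state, direction))

def steer(state, direction, refusal_direction, threshold, strength):
    """Shift the state along refusal_direction only when the PII score is high."""
    if concept_score(state, direction) <= threshold:
        return list(state)  # unrelated input: leave the state untouched
    return [s + strength * r for s, r in zip(state, refusal_direction)]

# Toy 2-D "hidden states": the first coordinate stands in for PII content.
pii_states = [[2.0, 0.0], [4.0, 0.0]]
neutral_states = [[0.0, 1.0], [0.0, -1.0]]
direction = concept_direction(pii_states, neutral_states)

steered = steer([3.0, 1.0], direction, [0.0, 1.0], threshold=1.0, strength=2.0)
untouched = steer([0.5, 1.0], direction, [0.0, 1.0], threshold=1.0, strength=2.0)
```

In this toy setup the first input scores above the threshold and is shifted, while the second is returned unchanged, which mirrors the abstract's goal of refusing PII-sensitive tasks with little effect on unrelated ones. In a real model the vectors would be layer activations and the intervention would be applied during inference, with no weight updates.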

Topics

Artificial Intelligence > Core AI > Interpretability
Artificial Intelligence > Core AI > Privacy

Keywords

vision language model, privacy leakage, personally identifiable information, internal state, refusal mechanism
