Privacy

626 directly classified papers

Papers

WET: Overcoming Paraphrasing Vulnerabilities in Embeddings-as-a-Service with Linear Transformation Watermarks ACL 2025

Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining NAACL 2025

SDD: Self-Degraded Defense against Malicious Fine-tuning ACL 2025

Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack ACL 2025

RecordTwin: Towards Creating Safe Synthetic Clinical Corpora ACL 2025

Stealthy Backdoor Attack in Federated Learning via Adaptive Layer-wise Gradient Alignment ICCV 2025

Resource-Efficient Anonymization of Textual Data via Knowledge Distillation from Large Language Models COLING 2025

Backdoor Mitigation by Distance-Driven Detoxification ICCV 2025

StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data ICCV 2025

Vulnerability of Large Language Models to Output Prefix Jailbreaks: Impact of Positions on Safety NAACL 2025

Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models NAACL 2025

Jailbreaking with Universal Multi-Prompts NAACL 2025

Avoiding Copyright Infringement via Large Language Model Unlearning NAACL 2025

An Optimizable Suffix Is Worth A Thousand Templates: Efficient Black-box Jailbreaking without Affirmative Phrases via LLM as Optimizer NAACL 2025

From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models NAACL 2025

Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In NAACL 2025

Dynamic Guided and Domain Applicable Safeguards for Enhanced Security in Large Language Models NAACL 2025

Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems NAACL 2025

Augmented Adversarial Trigger Learning NAACL 2025

Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents NAACL 2025

HateImgPrompts: Mitigating Generation of Images Spreading Hate Speech NAACL 2025

TUNI: A Textual Unimodal Detector for Identity Inference in CLIP Models NAACL 2025

Named Entity Inference Attacks on Clinical LLMs: Exploring Privacy Risks and the Impact of Mitigation Strategies NAACL 2025

Beyond De-Identification: A Structured Approach for Defining and Detecting Indirect Identifiers in Medical Texts NAACL 2025

ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models ACL 2025