Prototype Guided Backdoor Defense via Activation Space Manipulation

Venkat Adithya Amula; Sunayana Samavedam; Saurabh Saini; Avani Gupta; P J Narayanan

2025 ICCV ICCV 2025

Prototype Guided Backdoor Defense via Activation Space Manipulation

Abstract

Deep learning models are susceptible to backdoor attacks involving malicious perturbation of some training data with a trigger to force misclassification to a target class. Various triggers have been used including semantic triggers that are easily realizable. We present Prototype Guided Backdoor Defense (PGBD), a robust post-hoc defense that scales across different trigger types, including previously unsolved semantic triggers. PPGBD exploits displacements in the geometric spaces of activations to penalize movements towards the trigger. This is done using a novel sanitization loss of a post-hoc fine-tuning step. This approach scales to all types of attacks and triggers, and achieves better performance across settings. We also present the first defense against semantic attacks on a new celebrity face images dataset. Activation spaces can provide rich clues to enhance DL models in different ways.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🧭 Keyword Pioneer — semantic trigger

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Venkat Adithya Amula , Sunayana Samavedam , Saurabh Saini , Avani Gupta , P J Narayanan

Topics

Artificial Intelligence > Core AI > AI Safety Machine Learning > Learning Types > Adversarial Learning Deep Learning > Techniques > Model Architecture

Keywords

adversarial robustness backdoor defense deep learning adversarial perturbation activation space semantic trigger prototype guidance model sanitization

Download PDF

Related papers

MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval 2025

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality 2025

MonSTeR: a Unified Model for Motion, Scene, Text Retrieval 2025

ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching 2025

Robust Dataset Condensation using Supervised Contrastive Learning 2025