BadWindtunnel: Defending Backdoor in High-noise Simulated Training with Confidence Variance

Ruyi Zhang; Songlei Jian; Yusong Tan; Heng Gao; Haifang Zhou; Kai Lu

ACL 2025

Abstract

Current backdoor defenses in Natural Language Processing (NLP) typically rely on data reduction or model pruning, which risks losing crucial information. To address this challenge, we introduce a novel backdoor defender, BadWindtunnel, which builds a high-noise simulated training environment. Like a wind tunnel, this environment allows precise control over training conditions, so backdoor learning behavior can be modeled without affecting the final model. We use confidence variance as the metric that quantifies learning behavior in the simulated training; it is based on two characteristics of backdoor-poisoned data (hereafter, poisoned data): higher learnability and robustness. In addition, we propose a two-step strategy to further model poisoned data, comprising target label identification and poisoned data revealing. Extensive experiments demonstrate BadWindtunnel’s superiority, with a 21% higher average reduction in attack success rate than the second-best defender.
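As a rough illustration of what a confidence-variance score could look like, the sketch below computes, for each training sample, the variance of the model's confidence on that sample's assigned label across simulated training epochs. The abstract does not give the exact formula or decision rule, so everything here is an assumption: the function names `confidence_variance` and `rank_by_stability`, the use of population variance, the toy numbers, and the reading that highly learnable and robust samples would show stable (low-variance) confidence under noise.

```python
from statistics import pvariance


def confidence_variance(conf_history):
    """Per-sample variance of confidence across epochs.

    conf_history[e][i] is the (hypothetical) confidence the model assigns to
    the labeled class of sample i at simulated epoch e.
    """
    n_samples = len(conf_history[0])
    return [
        pvariance([epoch[i] for epoch in conf_history])
        for i in range(n_samples)
    ]


def rank_by_stability(conf_history):
    """Return sample indices sorted by ascending variance (most stable first).

    Treating the most stable samples as poisoning candidates is an assumption
    of this sketch, not a rule stated in the abstract.
    """
    variances = confidence_variance(conf_history)
    return sorted(range(len(variances)), key=lambda i: variances[i])


# Made-up confidences for 3 samples over 4 simulated epochs.
history = [
    [0.90, 0.30, 0.55],
    [0.92, 0.70, 0.40],
    [0.91, 0.45, 0.65],
    [0.93, 0.80, 0.50],
]
ranking = rank_by_stability(history)
```

In this toy input, sample 0 keeps nearly constant confidence while samples 1 and 2 fluctuate, so it ranks first. How such a ranking would feed into target label identification and poisoned data revealing is not specified in the abstract and is left out of the sketch.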
