Understanding and Mitigating Data Contamination in Deep Anomaly Detection: A Kernel-based Approach

Shuang Wu; Jingyu Zhao; Guangjian Tian

IJCAI 2022

Abstract

Deep anomaly detection has become popular for its capability of handling complex data. However, training a deep detector is fragile to data contamination due to overfitting. In this work, we study the performance of anomaly detectors under data contamination and construct a data-efficient countermeasure against it. We show that training a deep anomaly detector induces an implicit kernel machine. We then derive an information-theoretic bound on the performance degradation with respect to the data contamination ratio. To mitigate the degradation, we propose a contradicting training approach: apart from learning normality on the contaminated dataset, it discourages the detector from fitting a small auxiliary dataset of labeled anomalies. Our approach is much more affordable than constructing a completely clean training dataset. Experiments on public datasets show that our approach significantly improves anomaly detection in the presence of contamination and outperforms some recently proposed detectors.
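
The contradicting training idea in the abstract can be illustrated with a toy objective. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the linear autoencoder, the function names `recon_error` and `contradicting_loss`, the weight `lam`, and the bounded penalty `1 / (1 + error)` are all hypothetical choices made for illustration. The first term fits the (possibly contaminated) training set; the second term grows when the labeled auxiliary anomalies are reconstructed well, which discourages the detector from learning them.

```python
import numpy as np

def recon_error(W, X):
    """Per-sample squared reconstruction error of a linear autoencoder x -> x W W^T."""
    R = X @ W @ W.T - X
    return np.sum(R ** 2, axis=1)

def contradicting_loss(W, X_train, X_aux, lam=0.5):
    """Toy objective: fit the training set, penalize fitting labeled anomalies.

    Assumption (not from the paper): the anomaly term is 1 / (1 + error),
    which is largest when an auxiliary anomaly is reconstructed perfectly.
    """
    fit_term = recon_error(W, X_train).mean()
    anti_term = (1.0 / (1.0 + recon_error(W, X_aux))).mean()
    return float(fit_term + lam * anti_term)

# Hypothetical data: the model keeps only the first coordinate.
W = np.array([[1.0], [0.0]])
X_train = np.array([[1.0, 0.0], [2.0, 0.0]])   # reconstructed exactly
aux_off = np.array([[0.0, 1.0]])               # anomaly the model cannot reconstruct
aux_on = np.array([[1.0, 0.0]])                # anomaly the model reconstructs exactly

loss_off = contradicting_loss(W, X_train, aux_off)
loss_on = contradicting_loss(W, X_train, aux_on)
```

In this toy setting the loss is lower when the auxiliary anomaly lies outside what the model can reconstruct, which is the behavior the penalty is meant to reward.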

Topics

Machine Learning > Optimization & Theory > Learning Theory
Computer Vision > Analysis > Anomaly Detection

Keywords

anomaly detection, data contamination, information-theoretic bound, kernel machine
