From Hope to Safety: Unlearning Biases of Deep Models via Gradient Penalization in Latent Space

Maximilian Dreyer; Frederik Pahde; Christopher J. Anders; Wojciech Samek; Sebastian Lapuschkin

AAAI 2024

Abstract

Deep Neural Networks are prone to learning spurious correlations embedded in the training data, leading to potentially biased predictions. This poses risks when deploying these models for high-stakes decision-making, such as in medical applications. Current methods for post-hoc model correction either require input-level annotations, which are only possible for spatially localized biases, or augment the latent feature space, thereby hoping to enforce the right reasons. We present a novel method for model correction on the concept level that explicitly reduces model sensitivity towards biases via gradient penalization. When modeling biases via Concept Activation Vectors, we highlight the importance of choosing robust directions, as traditional regression-based approaches such as Support Vector Machines tend to result in diverging directions. We effectively mitigate biases in controlled and real-world settings on the ISIC, Bone Age, ImageNet and CelebA datasets using VGG, ResNet and EfficientNet architectures. Code and Appendix are available at https://github.com/frederikpahde/rrclarc.
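The abstract does not spell out the loss, so the following is only a minimal sketch of the general idea of penalizing a model's gradient along a bias direction in latent space. It assumes a linear head f(a) = w · a on latent activations a, for which the gradient with respect to a is simply w, and it uses a mean-difference direction as a stand-in concept vector. The function names, the toy activations, and the choice of direction are all hypothetical and are not taken from the paper or its repository.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def mean_difference_cav(with_concept, without_concept):
    """Hypothetical concept direction: unit vector from the mean of
    bias-free activations to the mean of bias-containing activations."""
    dim = len(with_concept[0])
    mu_pos = [sum(a[i] for a in with_concept) / len(with_concept) for i in range(dim)]
    mu_neg = [sum(a[i] for a in without_concept) / len(without_concept) for i in range(dim)]
    diff = [p - n for p, n in zip(mu_pos, mu_neg)]
    norm = math.sqrt(dot(diff, diff))
    return [d / norm for d in diff]

def sensitivity_penalty(w, h):
    # For the assumed linear head f(a) = w . a, the gradient of f with
    # respect to a is w, so the squared sensitivity along h is (w . h)^2.
    return dot(w, h) ** 2

def penalty_step(w, h, lr=0.1):
    # Gradient of (w . h)^2 with respect to w is 2 (w . h) h.
    s = dot(w, h)
    return [wi - lr * 2.0 * s * hi for wi, hi in zip(w, h)]

# Toy latent activations (assumed data): the bias shifts the first coordinate.
with_concept = [[2.0, 1.0], [3.0, 0.0]]
without_concept = [[0.0, 1.0], [1.0, 0.0]]
h = mean_difference_cav(with_concept, without_concept)

w = [1.5, -0.5]  # head weights that initially rely on the biased coordinate
before = sensitivity_penalty(w, h)
for _ in range(50):
    w = penalty_step(w, h)
after = sensitivity_penalty(w, h)
```

In this toy setting the penalty only shrinks the component of w along h and leaves the orthogonal component untouched. In an actual fine-tuning run, such a term would presumably be added to the task loss with a weighting factor, and the gradient with respect to the latent activations would come from automatic differentiation rather than a closed form.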

Authors

Maximilian Dreyer, Frederik Pahde, Christopher J. Anders, Wojciech Samek, Sebastian Lapuschkin

Topics

Artificial Intelligence > Core AI > Interpretability
Machine Learning > Application Areas > Fairness
Deep Learning > Techniques > Model Architecture
Artificial Intelligence > Core AI > Fairness
Healthcare & Medicine > Clinical > Medical AI
Deep Learning > Learning Types > Adversarial Learning
Machine Learning > Learning Types > Fairness

Keywords

adversarial learning, feature extraction, medical imaging, concept learning, bias mitigation, latent space, spurious correlation, gradient penalization, concept activation vector, latent feature space, model correction
