Improving Subgroup Robustness via Data Selection

Saachi Jain; Kimia Hamidieh; Kristian Georgiev; Andrew Ilyas; Marzyeh Ghassemi; Aleksander Madry

NeurIPS 2024

Abstract

Machine learning models can often fail on subgroups that are underrepresented during training. While dataset balancing can improve performance on underperforming groups, it requires access to training group annotations and can end up removing large portions of the dataset. In this paper, we introduce Data Debiasing with Datamodels (D3M), a debiasing approach which isolates and removes specific training examples that drive the model's failures on minority groups. Our approach enables us to efficiently train debiased classifiers while removing only a small number of examples, and does not require training group annotations or additional hyperparameter tuning.
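The core selection step can be pictured with a minimal sketch. It rests on several assumptions that the abstract does not specify:

- A per-example attribution matrix is already available, where `attributions[i][j]` estimates how including training example `i` affects the model's margin on validation example `j`. In D3M this comes from datamodels; here it is simply an input.
- Group labels exist for a held-out validation set.
- Each training example is scored by its mean attribution on the worst-performing validation group.

This scoring rule is an illustrative stand-in, not the paper's exact formula. All function names are hypothetical.

```python
def group_accuracy(val_correct, val_groups, group):
    """Fraction of validation examples in `group` that the model classifies correctly."""
    hits = [c for c, g in zip(val_correct, val_groups) if g == group]
    return sum(hits) / len(hits)


def worst_group(val_correct, val_groups):
    """Return the validation group with the lowest accuracy (ties broken by sorted label)."""
    groups = sorted(set(val_groups))
    return min(groups, key=lambda g: group_accuracy(val_correct, val_groups, g))


def alignment_scores(attributions, val_groups, target_group):
    """Score each training example by its mean attribution on validation examples of target_group.

    attributions[i][j] is a hypothetical precomputed estimate of how training example i
    shifts the margin on validation example j (e.g. from a datamodel-style estimator).
    """
    cols = [j for j, g in enumerate(val_groups) if g == target_group]
    return [sum(row[j] for j in cols) / len(cols) for row in attributions]


def select_examples_to_remove(attributions, val_correct, val_groups, k):
    """Return indices of the k training examples that most hurt the worst validation group."""
    target = worst_group(val_correct, val_groups)
    scores = alignment_scores(attributions, val_groups, target)
    # Most negative score first: removing these is expected to help the worst group most.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[:k]


if __name__ == "__main__":
    # Toy inputs: 3 training examples, 4 validation examples in groups "a" and "b".
    attributions = [
        [0.1, 0.2, -0.5, -0.4],
        [0.0, 0.0, 0.3, 0.2],
        [0.2, 0.1, -0.1, 0.0],
    ]
    val_groups = ["a", "a", "b", "b"]
    val_correct = [1, 1, 0, 1]  # group "a" accuracy 1.0, group "b" accuracy 0.5
    print(select_examples_to_remove(attributions, val_correct, val_groups, k=2))
```

In this toy run, group `"b"` is the worst group. Training examples 0 and 2 have negative mean attribution on it, so they are the ones flagged for removal. The classifier would then be retrained on the remaining examples. Only `k` examples are dropped, which matches the abstract's point that the approach removes few examples compared with rebalancing the whole dataset.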

Topics

Machine Learning > Learning Types > Weakly Supervised Learning; Machine Learning > Application Areas > Fairness; Machine Learning > Learning Types > Representation Learning; Machine Learning > Core Methods > Optimization; Machine Learning > Learning Types > Fairness; Deep Learning > Learning Types > Representation Learning; Machine Learning > Learning Types > Robustness

Keywords

machine learning fairness, data selection, subgroup robustness, data debiasing, minority group, training example, model failure, model debiasing, model fairness, group annotation
