Evaluating model performance under worst-case subpopulations

Mike Li; Hongseok Namkoong; Shangzhou Xia

2021 NIPS NeurIPS 2021

Evaluating model performance under worst-case subpopulations

Abstract

The performance of ML models degrades when the training population is different from that seen under operation. Towards assessing distributional robustness, we study the worst-case performance of a model over all subpopulations of a given size, defined with respect to core attributes $Z$. This notion of robustness can consider arbitrary (continuous) attributes $Z$, and automatically accounts for complex intersectionality in disadvantaged groups. We develop a scalable yet principled two-stage estimation procedure that can evaluate the robustness of state-of-the-art models. We prove that our procedure enjoys several finite-sample convergence guarantees, including dimension-free convergence. Instead of overly conservative notions based on Rademacher complexities, our evaluation error depends on the dimension of $Z$ only through the out-of-sample error in estimating the performance conditional on $Z$. On real datasets, we demonstrate that our method certifies the robustness of a model and prevents deployment of unreliable models.

🧭 Keyword Pioneer — subpopulation shift

🐣 Hot Topic Early Bird — convergence guarantee

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Mike Li , Hongseok Namkoong , Shangzhou Xia

Topics

Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Statistical Learning Machine Learning > Application Areas > Fairness Machine Learning > Learning Types > Robustness Machine Learning > Learning Types > Distribution Shift Machine Learning > Optimization & Theory > Robustness

Keywords

subpopulation shift worst-case performance distributional robustness distributionally robust optimization convergence guarantee group fairness

Download PDF

Related papers

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data 2021

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation 2021

Test-Time Personalization with a Transformer for Human Pose Estimation 2021

NTopo: Mesh-free Topology Optimization using Implicit Neural Representations 2021

Scalable Intervention Target Estimation in Linear Models 2021