Learning Invariant Representations with Missing Data

Mark Goldstein; Joern-Henrik Jacobsen; Olina Chau; Adriel Saporta; Aahlad Manas Puli; Rajesh Ranganath; Andrew Miller

CLeaR 2022

Abstract

Spurious correlations, or *shortcuts*, allow flexible models to predict well during training but poorly on related test populations. Recent work has shown that models satisfying particular independencies involving the correlation-inducing *nuisance* variable have guarantees on their test performance. Enforcing such independencies requires the nuisance to be observed during training, but nuisances such as demographics or image background labels are often missing, and enforcing independence on just the observed data does not imply independence on the entire population. In this work, we derive the missing-MMD estimator, a maximum mean discrepancy (MMD) estimator for invariance objectives under missing nuisances. On simulations and clinical data, missing-MMDs enable improvements in test performance similar to those achieved with fully observed data.
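To make the quantity in the abstract concrete, here is a minimal sketch of a weighted, biased (V-statistic) estimate of squared MMD between two samples, written with NumPy. It is not the estimator derived in the paper. The function names, the RBF kernel, the `bandwidth` parameter, and the idea of passing per-sample weights (for example, inverse probabilities of the nuisance being observed) are all assumptions made for illustration. With uniform weights the function reduces to the standard biased MMD estimate.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def weighted_mmd2(x, y, wx=None, wy=None, bandwidth=1.0):
    """Weighted biased (V-statistic) estimate of squared MMD.

    x, y   : arrays of shape (n, d) and (m, d), e.g. representations of
             samples from two nuisance groups (illustrative assumption).
    wx, wy : optional non-negative per-sample weights (hypothetical, e.g.
             inverse observation probabilities); uniform if omitted.
    """
    wx = np.ones(len(x)) if wx is None else np.asarray(wx, dtype=float)
    wy = np.ones(len(y)) if wy is None else np.asarray(wy, dtype=float)
    # Self-normalize so each set of weights sums to one.
    wx = wx / wx.sum()
    wy = wy / wy.sum()
    kxx = rbf_kernel(x, x, bandwidth)
    kyy = rbf_kernel(y, y, bandwidth)
    kxy = rbf_kernel(x, y, bandwidth)
    return wx @ kxx @ wx + wy @ kyy @ wy - 2.0 * (wx @ kxy @ wy)
```

In an invariance objective of the kind the abstract describes, a term like this would be computed between representations grouped by nuisance value and added as a penalty to the predictive loss. When nuisances are missing for part of the data, an unweighted estimate on the observed rows alone targets the observed subpopulation rather than the full population, which is the gap the paper's estimator is meant to close.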

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Application Areas > Domain Generalization

Keywords

domain generalization, spurious correlation, invariant representation, missing data, test performance, nuisance variable
