Editing a classifier by rewriting its prediction rules

Shibani Santurkar; Dimitris Tsipras; Mahalaxmi Elango; David Bau; Antonio Torralba; Aleksander Madry

NeurIPS 2021

Abstract

We propose a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules. Our method requires virtually no additional data collection and can be applied to a variety of settings, including adapting a model to new environments and modifying it to ignore spurious features.

Authors

Shibani Santurkar, Dimitris Tsipras, Mahalaxmi Elango, David Bau, Antonio Torralba, Aleksander Madry

Topics

Artificial Intelligence > Core AI > Interpretability

Machine Learning > Application Areas > Domain Adaptation

Keywords

domain adaptation, spurious feature, prediction rule, classifier editing
