Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability

Atticus Geiger; Duligur Ibeling; Amir Zur; Maheep Chaudhary; Sonakshi Chauhan; Jing Huang; Aryaman Arora; Zhengxuan Wu; Noah Goodman; Christopher Potts; Thomas Icard

JMLR 2025

Citation

Atticus Geiger, Duligur Ibeling, Amir Zur, Maheep Chaudhary, Sonakshi Chauhan, Jing Huang, Aryaman Arora, Zhengxuan Wu, Noah Goodman, Christopher Potts, Thomas Icard; JMLR 26(83):1-64, 2025.

Topics

Artificial Intelligence > Core AI > Causal Inference
Artificial Intelligence > Core AI > Interpretability

Keywords

causal inference, mechanistic interpretability, causal abstraction
