The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

Ziqian Zhong; Ziming Liu; Max Tegmark; Jacob Andreas

NeurIPS 2023

Abstract

Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms? Several recent studies, on tasks ranging from group operations to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex: small changes to model hyperparameters and initializations can induce discovery of qualitatively different algorithms from a fixed training set, and even learning of multiple different solutions in parallel. In modular addition, we specifically show that models learn a known Clock algorithm, a previously undescribed, less intuitive, but comprehensible procedure we term the Pizza algorithm, and a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for mechanistically characterizing the behavior of neural networks across the algorithmic phase space.
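As a rough illustration of what a Clock-style solution to modular addition can look like, the sketch below places each input on a circle, adds the two angles with trigonometric identities, and scores every candidate answer by how closely its angle matches the sum. It is an idealized hand-written version, not the trained network from the paper; the names `clock_logits`, `clock_add`, and `freqs` are hypothetical and chosen only for this example. The Pizza algorithm is not sketched here because the abstract does not describe its mechanics.

```python
import math


def clock_logits(a, b, p, freqs=(1,)):
    """Score each candidate c in [0, p) for (a + b) mod p, Clock-style.

    Assumption: an idealized construction with hand-picked frequencies,
    not weights recovered from a trained model.
    """
    logits = []
    for c in range(p):
        score = 0.0
        for k in freqs:
            w = 2 * math.pi * k / p
            # Embed a and b as points on the unit circle, then add their
            # angles using products of the embedding coordinates.
            cos_ab = math.cos(w * a) * math.cos(w * b) - math.sin(w * a) * math.sin(w * b)
            sin_ab = math.sin(w * a) * math.cos(w * b) + math.cos(w * a) * math.sin(w * b)
            # This equals cos(w * (a + b - c)), which is largest when
            # c is congruent to a + b modulo p.
            score += cos_ab * math.cos(w * c) + sin_ab * math.sin(w * c)
        logits.append(score)
    return logits


def clock_add(a, b, p, freqs=(1,)):
    """Return the candidate with the highest Clock-style score."""
    logits = clock_logits(a, b, p, freqs)
    return max(range(p), key=logits.__getitem__)
```

Calling `clock_add(a, b, p)` should agree with `(a + b) % p`; the modulus and frequency set are free choices in this sketch rather than values taken from the paper.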

Topics

Artificial Intelligence > Core AI > Interpretability; Machine Learning > Optimization & Theory > Theory; Deep Learning > Architectures > Neural Networks; Machine Learning > Learning Types > Representation Learning; Deep Learning > Optimization & Theory > Theory

Keywords

in-context learning; mechanistic interpretability; algorithm discovery; neural network; mechanistic explanation; modular addition
