Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models

Yifan Hou; Jiaoda Li; Yu Fei; Alessandro Stolfo; Wangchunshu Zhou; Guangtao Zeng; Antoine Bosselut; Mrinmaya Sachan

EMNLP 2023

Abstract

Recent work has shown that language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities. However, it is unclear whether LMs perform these tasks by cheating with answers memorized from the pretraining corpus or via a multi-step reasoning mechanism. In this paper, we try to answer this question by exploring a mechanistic interpretation of LMs for multi-step reasoning tasks. Concretely, we hypothesize that the LM implicitly embeds within it a reasoning tree resembling the correct reasoning process. We test this hypothesis by introducing a new probing approach, called MechanisticProbe, that recovers the reasoning tree from the model's attention patterns. We use our probe to analyze two LMs: GPT-2 on a synthetic task (finding the k-th smallest element), and LLaMA on two simple language-based reasoning tasks (ProofWriter and the AI2 Reasoning Challenge). We show that MechanisticProbe is able to detect information about the reasoning tree from the model's attention patterns for most examples. This suggests that, in many cases, the LM does indeed carry out a process of multi-step reasoning within its architecture.
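To make the idea of reading reasoning structure off attention patterns more concrete, here is a toy Python sketch. It is not MechanisticProbe. For illustration only, it assumes that the "reasoning tree" for the k-th smallest element task reduces to the set of the k smallest input numbers. It also assumes we have a single, hypothetical attention vector from the answer position over the input numbers. The helper names (kth_smallest, supporting_positions, top_attended, toy_probe_score) and the attention values are hypothetical.

```python
from typing import Sequence, Set


def kth_smallest(nums: Sequence[int], k: int) -> int:
    """Answer to the synthetic task: the k-th smallest element (1-indexed)."""
    return sorted(nums)[k - 1]


def supporting_positions(nums: Sequence[int], k: int) -> Set[int]:
    """Positions of the k smallest elements (ties broken by position).

    Toy assumption: these are the inputs a model 'needs' before it can emit the answer.
    """
    order = sorted(range(len(nums)), key=lambda i: (nums[i], i))
    return set(order[:k])


def top_attended(attn: Sequence[float], k: int) -> Set[int]:
    """Positions receiving the k largest attention weights (ties broken by position)."""
    order = sorted(range(len(attn)), key=lambda i: (-attn[i], i))
    return set(order[:k])


def toy_probe_score(nums: Sequence[int], attn: Sequence[float], k: int) -> float:
    """Fraction of the k supporting positions that are recovered from attention alone."""
    gold = supporting_positions(nums, k)
    pred = top_attended(attn, k)
    return len(gold & pred) / k


if __name__ == "__main__":
    nums = [7, 2, 9, 4, 5]
    k = 2
    # Hypothetical attention weights from the answer position over the five inputs.
    focused = [0.05, 0.40, 0.05, 0.35, 0.15]
    diffuse = [0.30, 0.40, 0.05, 0.10, 0.15]
    print(kth_smallest(nums, k))                # 4
    print(toy_probe_score(nums, focused, k))    # 1.0
    print(toy_probe_score(nums, diffuse, k))    # 0.5
```

A score of 1.0 means that, in this toy setting, the attention pattern alone singles out exactly the inputs the answer depends on. The actual probe described in the paper is more involved than this top-k overlap check.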

Keywords

mechanistic interpretability, multi-step reasoning, attention pattern, reasoning tree, probing approach
