Training in Step-by-Step Formal Reasoning Improves Pronominal Reasoning in Language Models

Vagrant Gautam

EACL 2026

Abstract

Large reasoning models are trained to solve problems by decomposing them into steps. While they show impressive progress on reasoning tasks, "reasoning" here is typically limited to formal reasoning, i.e., math, code, and logic. An open question is whether these abilities transfer to _pronominal reasoning_, where step-by-step thinking in non-reasoning models worsens performance, but code pre-training may help. I answer this question by evaluating six pairs of original and DeepSeek-distilled models (1.5B-70B parameters) on six challenging datasets for English pronoun resolution (identifying whom a pronoun refers to) and pronoun fidelity (learning and applying a pronoun mapping correctly). Performance improves statistically significantly on all datasets (31% relative increase), indicating that distilling step-by-step formal reasoning does in fact help with pronominal reasoning, in part by improving instruction-following. With a qualitative evaluation of 720 generations, I show that improvements occur across granular error types, and come from plausible-looking reasoning chains employing a variety of reasoning strategies. However, the gains put models just above random performance on these datasets, leaving plenty of room for improvement.
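To make the "relative increase" figure concrete, here is a minimal sketch of how such a number is computed from a pair of scores. The function name `relative_increase` and both accuracy values are hypothetical, chosen only so the arithmetic is easy to follow. They are not results from the paper, and the paper's exact aggregation across models and datasets is not specified in the abstract.

```python
def relative_increase(baseline: float, improved: float) -> float:
    """Return the relative change (improved - baseline) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical accuracies for one original/distilled model pair.
# These are illustrative assumptions, NOT numbers reported in the paper.
original_acc = 0.40
distilled_acc = 0.524

print(f"{relative_increase(original_acc, distilled_acc):.0%}")  # prints 31%
```

Note that a relative increase is measured against the baseline score, so the same absolute gain yields a larger relative figure when the baseline is low.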

Topics

Natural Language Processing > Understanding > Coreference Resolution
Natural Language Processing > Resources & Methods > Large Language Models

Keywords

model distillation, coreference resolution, formal reasoning, pronominal reasoning
