From Sentences to Proof Trees: Leveraging Language Models for Structured Reasoning

Aayushee Gupta

2026 EACL EACL 2026

From Sentences to Proof Trees: Leveraging Language Models for Structured Reasoning

Abstract

AbstractThe ability of AI systems to not only answer complex natural language questions, but also transparently justify their reasoning, is crucial for building trust and enabling effective human-AI collaboration. In domains requiring multi-hop reasoning, answers must often be constructed by combining multiple relevant sentences from a knowledge base to build an inferential path from the question toward the answer. We tackle this challenge by exploring a neuro-symbolic approach to reasoning through the generation of entailment trees – structured, step-by-step proof trees – using Large Language Models (LLMs). These trees provide interpretable justifications for the inference process. Using the EntailmentBank (CITATION) data set, we evaluated a diverse set of prompting strategies across multiple models, along with a proposal of an inference-guided prompting approach that performs well. We also fine-tuned LLMs trained specifically for proof generation by applying several data augmentation, curriculum learning, and reinforcement-guided optimization strategies. Our results show that the fine-tuned model outperforms all prompting strategies, achieving superior performance across multiple structural and semantic metrics. We also provide a detailed evaluation of which training strategies are helpful towards proof generation. Our findings highlight the importance of proof tree generation as a benchmark for evaluating structured reasoning in LLMs.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing

🧭 Keyword Pioneer — proof tree generation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Aayushee Gupta

Topics

Artificial Intelligence > Core AI > Causal Inference Artificial Intelligence > Core AI > Interpretability Natural Language Processing > Generation > Language Modeling

Keywords

multi-hop reasoning neuro-symbolic approach structured reasoning entailment tree large language model proof tree generation

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026