Evaluating Humanities Theory Alignment in Large Language Models: Incremental Prompting and Statistical Assessment

Axel Pichler; Janis Pagel

2026 EACL EACL 2026

Evaluating Humanities Theory Alignment in Large Language Models: Incremental Prompting and Statistical Assessment

Abstract

AbstractWe propose a method to evaluate the extent to which an LLM’s observable input–output behavior aligns with established theories in the humanities and cultural studies. We instantiate the framework on three humanities theories—Davidson’s truth-conditional semantics, Lewis’s truth in fiction, and Iser’s concept of textual gaps—using a top-down, theory-driven black-box framework. Core assumptions of these theories are reconstructed into testable behavioral rules and assessed via controlled classification tasks with systematic prompt comparisons and significance testing. Our experiments show that theory-uninformed classification prompts generally outperform theory-enriched prompts in Lewis and Iser settings, while theory-informed prompts help in the Davidson task. Gemini Flash consistently achieves the highest scores across tasks and corpora, while the Iser gap detection task remains substantially harder than binary truth-conditional judgments. Statistical tests confirm robust prompt effects and the failure of basic prompts. However, model behavior under incremental theory exposure is unstable and architecture-dependent.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing

🧭 Keyword Pioneer — theory alignment

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Axel Pichler , Janis Pagel

Topics

Artificial Intelligence > Core AI > Foundation Models Artificial Intelligence > Core AI > Interpretability Natural Language Processing > Resources & Methods > Large Language Models

Keywords

prompt engineering classification task behavioral analysis theory alignment statistical assessment

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026