Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties

Zhenglin Wang; Jialong Wu; Pengfei Li; Yong Jiang; Deyu Zhou

2026 EACL EACL 2026

Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties

Abstract

AbstractTemporal reasoning is fundamental to human cognition and is crucial for various real-world applications. While recent advances in Large Language Models have demonstrated promising capabilities in temporal reasoning, existing benchmarks primarily rely on rule-based construction, lack contextual depth, and involve a limited range of temporal entities. To address these limitations, we introduce Chinese Time Reasoning (CTM), a benchmark designed to evaluate LLMs on temporal reasoning within the extensive scope of Chinese dynastic chronology. CTM emphasizes cross-entity relationships, pairwise temporal alignment, and contextualized and culturally-grounded reasoning, providing a comprehensive evaluation. Extensive experimental results reveal the challenges posed by CTM and highlight potential avenues for improvement.

🧭 Keyword Pioneer — chinese history

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Zhenglin Wang , Jialong Wu , Pengfei Li , Yong Jiang , Deyu Zhou

Topics

Natural Language Processing > Applications > Question Answering Natural Language Processing > Resources & Methods > Large Language Models

Keywords

benchmark evaluation temporal reasoning cultural knowledge chinese history cross-entity relationship

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026