AAAI 2026
Measuring the Unmeasurable: Unveiling Latent Cognitive Capabilities of LLM
Abstract
As large language models (LLMs) are increasingly deployed in high-stakes domains such as education, healthcare, and law, accurately evaluating their nuanced reasoning processes becomes essential to ensuring their safety, reliability, and trustworthiness. However, most existing benchmarks evaluate LLMs at a coarse granularity. They lack a unified framework and rely on single-task datasets, overlooking the intermediate steps of complex reasoning. This results in redundant overlap across benchmarks, poor generalization to multifaceted real-world tasks, and underutilization of the rich reasoning traces generated by advanced LLMs.
Interdisciplinary Bridge: Artificial Intelligence and Natural Language Processing
Keyword Pioneer: cognitive capabilities
Cross-Pollinator: Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio
Authors
Cui Danxin, Sihang Jiang, Keyi Wang, Zhiyi Duan, Yanghua Xiao, Bi Yude, Jiaqing Liang, Minggui He, Shimin Tao, Yilun Liu