Pushing the Frontiers of Scientific Fact-Checking: The SCINLP Dataset

Iffat Maab; Junichi Yamagishi

2026 EACL EACL 2026

Pushing the Frontiers of Scientific Fact-Checking: The SCINLP Dataset

Abstract

AbstractLarge Language Models (LLMs) are increasingly being used to understand how scientific research evolves, drawing growing interest from the research community. However, limited work has explored the scientific fact-checking of research questions and claims from manuscripts, particularly within the NLP domain, an emerging direction for advancing scientific integrity and knowledge validation. In this work, we propose a novel scientific fact-checking dataset, SCINLP, tailored to the NLP domain. Our proposed framework on SCINLP systematically verifies the veracity of complex scientific research questions across varying rationale contexts, while also assessing their temporal positioning. SCINLP includes supporting and refuting research questions from a curated collection of influential and reputable NLP papers published between 2000 and 2024. In our framework, we use multiple LLMs and diverse rationale contexts from our dataset to examine scientific claims and research focus, complemented by feasibility judgments for deeper insight into scientific reasoning in NLP.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Iffat Maab , Junichi Yamagishi

Topics

Natural Language Processing > Applications > Fact-Checking Natural Language Processing > Resources & Methods > Large Language Models Artificial Intelligence > Core AI > Knowledge Editing

Keywords

temporal reasoning question answering language model fact checking knowledge verification scientific literature

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026