Thunder-NUBench: A Benchmark for LLMs’ Sentence-Level Negation Understanding

Yeonkyoung So; Gyuseong Lee; Sungmok Jung; Joonhak Lee; Jia Kang; Sangho Kim; Jaejin Lee

2026 EACL EACL 2026

Thunder-NUBench: A Benchmark for LLMs’ Sentence-Level Negation Understanding

Abstract

AbstractNegation is a fundamental linguistic phenomenon that poses ongoing challenges for Large Language Models (LLMs), particularly in tasks requiring deep semantic understanding. Current benchmarks often treat negation as a minor detail within broader tasks, such as natural language inference. Consequently, there is a lack of benchmarks specifically designed to evaluate comprehension of negation. In this work, we introduce *Thunder-NUBench* — a novel benchmark explicitly created to assess sentence-level understanding of negation in LLMs. Thunder-NUBench goes beyond identifying surface-level cues by contrasting standard negation with structurally diverse alternatives, such as local negation, contradiction, and paraphrase. This benchmark includes manually created sentence-negation pairs and a multiple-choice dataset, allowing for a comprehensive evaluation of models’ understanding of negation.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yeonkyoung So , Gyuseong Lee , Sungmok Jung , Joonhak Lee , Jia Kang , Sangho Kim , Jaejin Lee

Topics

Natural Language Processing > Understanding > Semantic Analysis Natural Language Processing > Resources & Methods > Natural Language Inference

Keywords

natural language inference semantic analysis negation understanding large language model sentence understanding

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026