Natural Language Inference

918 directly classified papers

Papers

SEA-HELM: Southeast Asian Holistic Evaluation of Language Models ACL 2025

Natural Logic at the Core: Dynamic Rewards for Entailment Tree Generation ACL 2025

ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models ACL 2025

Exploiting the English Vocabulary Profile for L2 word-level vocabulary assessment with LLMs ACL 2025

On Reference (In-)Determinacy in Natural Language Inference NAACL 2025

Lost in Variation? Evaluating NLI Performance in Basque and Spanish Geographical Variants ACL 2025

What does memory retrieval leave on the table? Modelling the Cost of Semi-Compositionality with MINERVA2 and sBERT ACL 2025

GG-BBQ: German Gender Bias Benchmark for Question Answering ACL 2025

JBBQ: Japanese Bias Benchmark for Analyzing Social Biases in Large Language Models ACL 2025

Big Escape Benchmark: Evaluating Human-Like Reasoning in Language Models via Real-World Escape Room Challenges ACL 2025

Zero-shot Slot Filling in the Age of LLMs for Dialogue Systems COLING 2025

CoMeDi Shared Task: Median Judgment Classification & Mean Disagreement Ranking with Ordinal Word-in-Context Judgments COLING 2025

Deep-change at CoMeDi: the Cross-Entropy Loss is not All You Need COLING 2025

Predicting Median, Disagreement and Noise Label in Ordinal Word-in-Context Data COLING 2025

Funzac at CoMeDi Shared Task: Modeling Annotator Disagreement from Word-In-Context Perspectives COLING 2025

MMLabUIT at CoMeDiShared Task: Text Embedding Techniques versus Generation-Based NLI for Median Judgment Classification COLING 2025

Exploiting Task Reversibility of DRS Parsing and Generation: Challenges and Insights from a Multi-lingual Perspective COLING 2025

Detecting Inconsistencies in Narrative Elements of Cross Lingual Nakba Texts COLING 2025

Linking language model predictions to human behaviour on scalar implicatures COLING 2025

Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning COLING 2025

Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study COLING 2025

ArabicSense: A Benchmark for Evaluating Commonsense Reasoning in Arabic with Large Language Models COLING 2025

CARE: A Disagreement Detection Framework with Concept Alignment and Reasoning Enhancement EMNLP 2025

Am I eligible? Natural Language Inference for Clinical Trial Patient Recruitment: the Patient’s Point of View NAACL 2025

Extractive Fact Decomposition for Interpretable Natural Language Inference in one Forward Pass EMNLP 2025