Token-Context Attention for NLI: An Alternative to Self-Attention

Xin Zhang; Victor S. Sheng

AAAI 2026


Abstract

Despite the rapid progress in large language models (LLMs), sub-billion-parameter systems still perform at chance level on challenging natural language inference (NLI) benchmarks such as Adversarial Natural Language Inference (ANLI), while training larger models is often impractical due to limited computational resources. We address this parameter-efficiency bottleneck in NLI with a Complex-Vector Token Representation that explicitly decouples each token from its context, and a Token-Context Attention mechanism that updates each token based on the most informative contextual semantics. On ANLI, a 0.8B-parameter Token-Context Attention model achieves higher parameter efficiency (accuracy per parameter) than all 1B-parameter self-attention baselines and comparable 0.8B-parameter ones. It also suffers smaller performance degradation under Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks, and it achieves the largest few-shot gains on SNLI and MNLI with no significant degradation in ANLI accuracy after adaptation. These results suggest that explicitly disentangling token and context offers a viable alternative to standard self-attention for NLI tasks.
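The abstract measures robustness as the performance degradation under FGSM and PGD attacks. As a hedged illustration of what those two attacks compute, the sketch below applies both to a toy linear softmax classifier over hypothetical sentence-pair embeddings and reports degradation as clean accuracy minus adversarial accuracy. The classifier, the embedding size, `eps`, the step size `alpha`, and the step count are all assumptions chosen for illustration; nothing here reflects the paper's actual attack configuration or its Token-Context Attention model.

```python
import numpy as np

# Three-way NLI label set; the linear classifier below is a hypothetical stand-in,
# not the paper's Token-Context Attention model.
LABELS = ("entailment", "neutral", "contradiction")


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ce_loss(W, b, x, y):
    """Mean cross-entropy of a linear softmax classifier with logits x @ W + b."""
    p = softmax(x @ W + b)
    return float(-np.mean(np.log(p[np.arange(len(y)), y])))


def input_grad(W, b, x, y):
    """Per-example gradient of the cross-entropy w.r.t. the input embeddings x."""
    g = softmax(x @ W + b)
    g[np.arange(len(y)), y] -= 1.0  # dL/dlogits = p - onehot(y)
    return g @ W.T                  # chain rule through logits = x @ W + b


def fgsm(W, b, x, y, eps):
    """FGSM: one step of size eps along the sign of the input gradient."""
    return x + eps * np.sign(input_grad(W, b, x, y))


def pgd(W, b, x, y, eps, alpha, steps):
    """PGD: repeated signed-gradient steps, projected back into the L-inf ball around x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(input_grad(W, b, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection onto ||x_adv - x||_inf <= eps
    return x_adv


def accuracy(W, b, x, y):
    return float(np.mean(np.argmax(x @ W + b, axis=1) == y))


# Hypothetical demo data: random weights and random "pooled premise-hypothesis" embeddings.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, len(LABELS)))
b = np.zeros(len(LABELS))
x = rng.normal(size=(200, 16))
y = np.argmax(x @ W + b, axis=1)  # labels the clean model gets right by construction

eps = 0.1
clean = accuracy(W, b, x, y)
for name, x_adv in [("FGSM", fgsm(W, b, x, y, eps)),
                    ("PGD", pgd(W, b, x, y, eps, alpha=0.025, steps=10))]:
    print(f"{name}: degradation = {clean - accuracy(W, b, x_adv, y):.3f}")
```

This PGD variant starts from the clean input, without the random initialisation many implementations add. Both attacks perturb continuous embeddings, because discrete tokens cannot be nudged along a gradient sign directly.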



Topics

Machine Learning > Core Methods > Representation Learning
Natural Language Processing > Understanding > Semantic Analysis

Keywords

adversarial robustness, few-shot learning, natural language inference, parameter efficiency, token representation


Related papers

Hi-EF: Benchmarking Emotion Forecasting in Human-interaction 2026

MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding 2026

Sparse3DPR: Training-Free 3D Hierarchical Scene Parsing and Task-Adaptive Subgraph Reasoning from Sparse RGB Views 2026

LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning 2026

HDGS: Hierarchical Dynamic Gaussian Splatting for Urban Driving Scenes 2026