AAAI 2026

Token-Context Attention for NLI: An Alternative to Self-Attention

Abstract

Despite rapid progress in large language models (LLMs), even sub-billion-scale systems perform at chance level on challenging natural language inference (NLI) benchmarks such as Adversarial Natural Language Inference (ANLI), while training larger models is often impractical due to limited computational resources. We address this parameter-efficiency bottleneck in NLI with a Complex-Vector Token Representation that explicitly decouples each token from its context, and a Token-Context Attention mechanism that updates each token based on the most informative contextual semantics. On ANLI, a 0.8B-parameter Token-Context Attention model achieves higher parameter efficiency (accuracy per parameter) than all 1B and comparable 0.8B self-attention baselines. It also suffers smaller performance degradation under Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks, and it achieves the largest few-shot gains on SNLI and MNLI while exhibiting no significant degradation in ANLI accuracy after adaptation. These results suggest that explicitly disentangling token and context offers a viable alternative to standard self-attention for NLI tasks.
