Robust Native Language Identification through Agentic Decomposition

Ahmet Yavuz Uluslu; Tannon Kew; Tilia Ellendorff; Gerold Schneider; Rico Sennrich

EMNLP 2025

Abstract

Large language models (LLMs) often achieve high performance on native language identification (NLI) benchmarks by leveraging superficial contextual clues such as names, locations, and cultural stereotypes, rather than the underlying linguistic patterns indicative of native language (L1) influence. To improve robustness, previous work has instructed LLMs to disregard such clues. In this work, we demonstrate that such a strategy is unreliable and that model predictions can be easily altered by misleading hints. To address this problem, we introduce an agentic NLI pipeline inspired by forensic linguistics, in which specialized agents accumulate and categorize diverse linguistic evidence before an independent final assessment. In this final assessment, a goal-aware coordinating agent synthesizes all evidence to make the NLI prediction. On two benchmark datasets, our approach significantly improves both robustness to misleading contextual clues and performance consistency compared to standard prompting methods.
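The decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name here (`Evidence`, `collect_evidence`, `coordinate`, and the toy specialists) is hypothetical, the agents are plain functions standing in for LLM calls, and the choice to pass the coordinator only categorized evidence rather than the raw text is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: a specialist is any callable mapping text to findings.
# In a real pipeline each would wrap an LLM call; here they are plain functions.
Specialist = Callable[[str], List[str]]


@dataclass
class Evidence:
    category: str        # illustrative label, e.g. "grammar"
    findings: List[str]  # observations reported by one specialist


def collect_evidence(text: str, specialists: Dict[str, Specialist]) -> List[Evidence]:
    """Run every specialist on the text and keep findings grouped by category."""
    return [Evidence(name, agent(text)) for name, agent in specialists.items()]


def coordinate(evidence: List[Evidence],
               assess: Callable[[List[Evidence]], str]) -> str:
    """Final assessment over categorized evidence only (an assumption of this sketch)."""
    return assess(evidence)


# Toy stand-ins for specialist agents: each flags one hard-coded pattern.
def article_specialist(text: str) -> List[str]:
    return ["possible article omission: 'went to shop'"] if "went to shop" in text else []


def spelling_specialist(text: str) -> List[str]:
    return ["misspelling: 'becouse'"] if "becouse" in text else []


# Toy coordinator: only counts cues; a real one would weigh them to predict an L1.
def toy_assessor(evidence: List[Evidence]) -> str:
    total = sum(len(e.findings) for e in evidence)
    return "insufficient evidence" if total == 0 else f"{total} linguistic cue(s) to weigh"


specialists = {"grammar": article_specialist, "orthography": spelling_specialist}
text = "My name is Hans and I went to shop becouse it rained."
evidence = collect_evidence(text, specialists)
verdict = coordinate(evidence, toy_assessor)
print(verdict)
```

In this toy run the name in the input is a contextual clue, but no specialist reports it, so it never reaches the final assessment. That is the intuition behind separating evidence collection from the goal-aware prediction step.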

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Interpretability
Artificial Intelligence > Core AI > Multi-Agent Systems
Natural Language Processing > Applications > Text Classification
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Applications > Natural Language Inference

Keywords

forensic linguistics; linguistic analysis; native language identification; large language model; multi-agent system; llm robustness; linguistic evidence; agentic decomposition
