AdvERSEM: Adversarial Robustness Testing and Training of LLM-based Groundedness Evaluators via Semantic Structure Manipulation

Kaustubh Dhole; Ramraj Chandradevan; Eugene Agichtein

EMNLP 2025

Abstract

Evaluating outputs from large language models (LLMs) presents significant challenges, especially as hallucinations and adversarial manipulations are often difficult to detect. Existing evaluation methods lack robustness against subtle yet intentional linguistic alterations, necessitating novel techniques for reliably assessing model-generated content. Training accurate and robust groundedness evaluators is key for mitigating hallucinations and ensuring the alignment of model- or human-generated claims to real-world evidence. However, as we show, many models, while optimized for accuracy, lack robustness to subtle variations of claims. This makes them brittle and unsuitable in real-world settings, where adversaries use purposeful, deceitful tactics such as hedging that go beyond surface-level variations. To address this problem, we propose AdvERSem, a controllable adversarial approach that manipulates LLM output via Abstract Meaning Representations (AMR) to generate attack claims of multiple fine-grained types, followed by automatic verification of the correct label. By systematically manipulating a single linguistic facet at a time, AdvERSem provides an interpretable testbed for gauging robustness as well as useful training data. We demonstrate that using these AMR manipulations during training across multiple fact verification datasets helps improve the accuracy and robustness of groundedness evaluation while also reducing the need for costly annotated data. To encourage further systematic evaluation, we release AdvERSem-Test, a manually verified groundedness testbed.

Topics

Artificial Intelligence > Core AI > AI Safety
Artificial Intelligence > Core AI > Interpretability
Artificial Intelligence > Core AI > Responsible AI
Natural Language Processing > Applications > Fact-Checking
Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Adversarial Learning
Deep Learning > Learning Types > Adversarial Learning

Keywords

adversarial robustness, fact verification, responsible AI, abstract meaning representation, LLM evaluation, groundedness evaluation, semantic structure, hallucination mitigation, hallucination detection
