Obedience or Vigilance? How Large Language Models React to Malicious Multiple-Choice Options (Student Abstract)

Yow-Fu Liou; Yu-Chien Tang; An-Zi Yen

AAAI 2026

Abstract

When evaluating large language models (LLMs) on question-answering tasks, a common protocol is multiple-choice question answering (MCQA), in which the model selects from a fixed set of choices. Contemporary robustness testing typically perturbs the instructions or introduces confusion into factual statements; however, model behavior also hinges on choice compliance, that is, whether the model remains within the canonical set A-D. We formalize this setting by asking whether the model continues to respect the interface's rules when the problem presents a tempting alternative. Our approach is interface-preserving: we append a single selectable option E while keeping the question and options A-D unchanged. We then introduce three types of malicious option injection to assess LLMs' robustness. Experimental results highlight the vulnerability of LLMs when the additional option E carries "contradict"-type content. Our evaluation framework serves as a low-cost audit of rule adherence on existing datasets and black-box models, surfaces off-policy items, and supports interpretable model comparison for deployment.
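
The interface-preserving setup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `MCQItem`, `inject_option_e`, `render_prompt`, and `compliance_rate` are hypothetical, the text placed in option E is left to the caller because the abstract does not spell out the three injection types, and "compliance" is assumed here to mean the fraction of answers that stay within A-D.

```python
from dataclasses import dataclass

# Canonical answer labels the model is expected to stay within.
CANONICAL = ("A", "B", "C", "D")


@dataclass(frozen=True)
class MCQItem:
    """A multiple-choice item (hypothetical container for this sketch)."""
    question: str
    choices: dict  # label -> option text, in display order


def inject_option_e(item: MCQItem, injected_text: str) -> MCQItem:
    """Return a copy of `item` with option E appended.

    The question and options A-D are left untouched, so the only
    change to the interface is the extra selectable option.
    """
    if tuple(item.choices) != CANONICAL:
        raise ValueError("expected exactly the options A, B, C, D in order")
    choices = dict(item.choices)  # copy so the original item is not mutated
    choices["E"] = injected_text
    return MCQItem(item.question, choices)


def render_prompt(item: MCQItem) -> str:
    """Render the item as plain text, one option per line."""
    lines = [item.question]
    lines += [f"{label}. {text}" for label, text in item.choices.items()]
    return "\n".join(lines)


def compliance_rate(answers: list) -> float:
    """Fraction of answers inside A-D (assumed definition of compliance)."""
    if not answers:
        raise ValueError("no answers to score")
    kept = sum(1 for a in answers if a.strip().upper() in CANONICAL)
    return kept / len(answers)
```

In use, each dataset item would be passed through `inject_option_e`, rendered with `render_prompt`, sent to the model under test, and the collected answer letters scored with `compliance_rate`; because only prompts and answer letters are involved, the same loop applies to black-box models.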

Topics

Artificial Intelligence > Core AI > AI Safety
Natural Language Processing > Resources & Methods > Large Language Models

Keywords

robustness testing; adversarial evaluation; large language model; multiple-choice question-answering; option injection
