Learning Multilingual Agentic Policy to Control Sycophancy

Leonardo Ranaldi; Giulia Pucci

EACL 2026

Abstract

Large Language Models (LLMs) are highly effective at adapting to users’ styles, preferences, and contextual signals. This property underlies much of their practical usefulness, but it can also manifest as sycophancy, i.e., alignment with user-implied beliefs even when these contradict factual accuracy or rational reasoning. Prior work treats sycophancy as a surface-level artefact addressed via inference-time or post-hoc methods. We argue that it is a policy-level failure arising from missing agentic control over agreement under pressure. To make sycophancy amenable to explicit control, we propose learning agentic policies that model LLMs’ behaviour as a decision-making problem. Our approach equips a single model with an explicit action space that includes answering directly, countering misleading signals, or asking for clarification. The policy is trained to optimise a multi-objective reward that balances task success, sycophancy resistance, and behavioural consistency via a control mechanism that operates through agentic behaviour. We evaluate the method on different benchmarks, showing that the approach reduces sycophancy, improves performance, and generalises robustly across languages. These findings suggest that mitigating sycophancy requires moving beyond compliance-oriented generation towards agreement-agentic control.
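The action space and multi-objective reward described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the three objectives are combined, so a plain weighted sum is used here, and the names `Action`, `RewardWeights`, `combined_reward`, and the weight values are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The three actions named in the abstract (identifiers are hypothetical)."""
    ANSWER = "answer_directly"
    COUNTER = "counter_misleading_signal"
    CLARIFY = "ask_for_clarification"


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the abstract gives no values.
    task: float = 1.0
    resistance: float = 1.0
    consistency: float = 0.5


def combined_reward(task_success: float,
                    sycophancy_resistance: float,
                    consistency: float,
                    weights: RewardWeights = RewardWeights()) -> float:
    """Scalarise the three objectives with a weighted sum (an assumption)."""
    return (weights.task * task_success
            + weights.resistance * sycophancy_resistance
            + weights.consistency * consistency)


# Two hypothetical rollouts on a prompt where the user hints at a wrong answer.
# Each score is assumed to lie in [0, 1].
sycophantic = combined_reward(task_success=0.0, sycophancy_resistance=0.0, consistency=1.0)
countering = combined_reward(task_success=1.0, sycophancy_resistance=1.0, consistency=1.0)
```

Under these assumed weights, the rollout that counters the misleading hint scores higher than the one that agrees with it, which is the kind of preference a policy trained on such a reward would be pushed towards.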

Topics

Artificial Intelligence > Core AI > AI Safety
Artificial Intelligence > Core AI > Human-AI Interaction

Keywords

reward modeling, multi-objective optimization, language model, human-AI interaction, agentic policy
