Evaluating the Prompt Steerability of Large Language Models

Erik Miehling; Michael Desmond; Karthikeyan Natesan Ramamurthy; Elizabeth M. Daly; Kush R. Varshney; Eitan Farchi; Pierre Dognin; Jesus Rios; Djallel Bouneffouf; Miao Liu; Prasanna Sattigeri

NAACL 2025

Abstract

Building pluralistic AI requires designing models that are able to be shaped to represent a wide range of value systems and cultures. Achieving this requires first being able to evaluate the degree to which a given model is capable of reflecting various personas. To this end, we propose a benchmark for evaluating the steerability of model personas as a function of prompting. Our design is based on a formal definition of prompt steerability, which analyzes the degree to which a model’s joint behavioral distribution can be shifted from its baseline. By defining steerability indices and inspecting how these indices change as a function of steering effort, we can estimate the steerability of a model across various persona dimensions and directions. Our benchmark reveals that the steerability of many current models is limited, due to both a skew in their baseline behavior and an asymmetry in their steerability across many persona dimensions. We release an implementation of our benchmark at https://github.com/IBM/prompt-steering.
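
To make the idea of a steerability index concrete, here is a minimal sketch. It is not the paper's definition and not the API of the released repository: the function names, the assumption that each response is scored in [0, 1] along a single persona dimension, and the normalization by available headroom are all hypothetical choices made for illustration. The sketch measures how far the mean score moves from the baseline in the intended direction, as a fraction of how far it could have moved, and then tracks that value as steering effort grows.

```python
from statistics import mean


def steerability_index(baseline, steered, direction):
    """Toy index in [-1, 1] (hypothetical, not the paper's formula).

    baseline, steered: lists of scores in [0, 1] along one persona dimension.
    direction: +1 to steer toward 1.0, -1 to steer toward 0.0.
    Returns the shift of the mean in the intended direction, divided by
    the headroom that was available from the baseline mean.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    b, s = mean(baseline), mean(steered)
    shift = (s - b) * direction          # positive if moved the intended way
    headroom = (1.0 - b) if direction > 0 else b
    if headroom == 0:
        return 0.0                       # baseline already at the extreme
    return max(-1.0, min(1.0, shift / headroom))


def steerability_curve(baseline, steered_by_effort, direction):
    """Index as a function of steering effort.

    steered_by_effort maps an effort level (assumed here to be the number
    of steering statements in the prompt) to the scores observed at it.
    """
    return {
        effort: steerability_index(baseline, scores, direction)
        for effort, scores in sorted(steered_by_effort.items())
    }


if __name__ == "__main__":
    baseline = [0.75, 0.75]              # skewed baseline: little room upward
    up = {1: [0.8125, 0.8125], 3: [0.875, 0.875]}
    down = {1: [0.625, 0.625], 3: [0.375, 0.375]}
    print(steerability_curve(baseline, up, +1))
    print(steerability_curve(baseline, down, -1))
```

Under this toy normalization, a skewed baseline leaves less absolute room in one direction than the other, which is one simple way to see how baseline skew and directional asymmetry can interact.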

Topics

Artificial Intelligence > Core AI > Human-AI Interaction
Natural Language Processing > Resources & Methods > Large Language Models

Keywords

value system, large language model, prompt steering, persona steering, behavioral distribution

Related papers

Few-shot Personalization of LLMs with Mis-aligned Responses 2025

NLI under the Microscope: What Atomic Hypothesis Decomposition Reveals 2025

Understanding Figurative Meaning through Explainable Visual Entailment 2025

CogLM: Tracking Cognitive Development of Large Language Models 2025