The Art of Saying No: Contextual Noncompliance in Language Models

Faeze Brahman; Sachin Kumar; Vidhisha Balachandran; Pradeep Dasigi; Valentina Pyatkin; Abhilasha Ravichander; Sarah Wiegreffe; Nouha Dziri; Khyathi Chandu; Jack Hessel; Yulia Tsvetkov; Noah A. Smith; Yejin Choi; Hannaneh Hajishirzi

NeurIPS 2024

Abstract

Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. Our taxonomy spans a wide range of categories including incomplete, unsupported, indeterminate, and humanizing requests (in addition to unsafe requests). To test noncompliance capabilities of language models, we use this taxonomy to develop a new evaluation suite of 1000 noncompliance prompts. We find that most existing models show significantly high compliance rates in certain previously understudied categories, with models like GPT-4 incorrectly complying with as many as 30% of requests. To address these gaps, we explore different training strategies using a synthetically generated training set of requests and expected noncompliant responses. Our experiments demonstrate that while direct finetuning of instruction-tuned models can lead to both over-refusal and a decline in general capabilities, using parameter-efficient methods like low-rank adapters helps to strike a good balance between appropriate noncompliance and other capabilities.
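The per-category compliance rate mentioned in the abstract can be illustrated with a minimal sketch. It assumes each evaluated prompt has already been judged as complied or not complied; the function name `compliance_rates`, the record layout, and the toy judgments are hypothetical and are not taken from the paper or its evaluation suite.

```python
from collections import defaultdict


def compliance_rates(records):
    """Return {category: fraction of prompts the model complied with}.

    `records` is an iterable of (category, did_comply) pairs, where
    `did_comply` is True when the model fulfilled a request that it
    should have declined.
    """
    totals = defaultdict(int)
    complied = defaultdict(int)
    for category, did_comply in records:
        totals[category] += 1
        if did_comply:
            complied[category] += 1
    return {category: complied[category] / totals[category] for category in totals}


# Hypothetical judgments for illustration only; not data from the paper.
toy_records = [
    ("incomplete", True),
    ("incomplete", False),
    ("unsupported", False),
    ("unsupported", False),
    ("humanizing", True),
    ("humanizing", True),
    ("humanizing", False),
    ("humanizing", False),
]

rates = compliance_rates(toy_records)
for category, rate in rates.items():
    print(f"{category}: {rate:.0%} compliance")
```

On a noncompliance benchmark of this kind, a lower rate is better, since every prompt is one the model is expected to decline or qualify.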

Topics

Artificial Intelligence > Core AI > AI Safety; Machine Learning > Application Areas > Model Merging; Artificial Intelligence > Core AI > Large Language Models; Deep Learning > Learning Types > Fine-Tuning; Artificial Intelligence > Core AI > Safety

Keywords

instruction tuning; language model; low-rank adaptation; safety training; low rank adapter
