Using In-Context Learning to Improve Dialogue Safety

Nicholas Meade; Spandana Gella; Devamanyu Hazarika; Prakhar Gupta; Di Jin; Siva Reddy; Yang Liu; Dilek Hakkani-Tur

EMNLP 2023

Abstract

While large neural-based conversational models have become increasingly proficient dialogue agents, recent work has highlighted safety issues with these systems. For example, these systems can be goaded into generating toxic content, often perpetuating social biases or stereotypes. We investigate a retrieval-based approach for reducing bias and toxicity in responses from chatbots. It uses in-context learning to steer a model towards safer generations. Concretely, to generate a response to an unsafe dialogue context, we retrieve demonstrations of safe responses to similar dialogue contexts. We find our method performs competitively with existing approaches to dialogue safety without requiring training. We also show, using automatic and human evaluation, that reductions in toxicity obtained using our approach do not come at the cost of engagingness or coherency. Finally, we note our method can be used to complement existing dialogue safety approaches, such as RLHF.
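The retrieve-then-prompt idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the demonstration pool, the bag-of-words cosine retriever, the "User:"/"Bot:" prompt layout, and all function names below are assumptions made for illustration only. The abstract does not say which retriever or prompt format the paper uses.

```python
import math
import re
from collections import Counter

# Hypothetical demonstration pool of (dialogue context, safe response) pairs.
# These examples are invented for illustration and are not from the paper.
DEMONSTRATIONS = [
    ("Women are terrible drivers, right?",
     "I don't think driving skill depends on gender."),
    ("Tell me a joke about old people being useless.",
     "I'd rather not mock people for their age."),
    ("What is your favorite pizza topping?",
     "I like mushrooms. How about you?"),
]


def tokenize(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(context, pool, k=2):
    """Return the k demonstrations whose contexts are most similar to `context`.

    Assumption: a simple bag-of-words retriever stands in for whatever
    retriever a real system would use.
    """
    query = tokenize(context)
    ranked = sorted(pool,
                    key=lambda pair: cosine(query, tokenize(pair[0])),
                    reverse=True)
    return ranked[:k]


def build_prompt(context, pool, k=2):
    """Prepend retrieved safe demonstrations to the target context."""
    blocks = [f"User: {c}\nBot: {r}" for c, r in retrieve(context, pool, k)]
    blocks.append(f"User: {context}\nBot:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_prompt("Are women bad drivers?", DEMONSTRATIONS, k=1))
```

The resulting prompt would then be passed to a language model, which is expected to continue after the final "Bot:" in the style of the retrieved safe responses. No model weights are updated, which matches the abstract's point that the method requires no training.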

Authors

Nicholas Meade, Spandana Gella, Devamanyu Hazarika, Prakhar Gupta, Di Jin, Siva Reddy, Yang Liu, Dilek Hakkani-Tur

Topics

Artificial Intelligence > Core AI > Human-AI Interaction
Machine Learning > Learning Types > Self-Supervised Learning
Natural Language Processing > Generation > Dialogue Systems
Artificial Intelligence > Core AI > Fairness
Machine Learning > Learning Types > In-Context Learning
Natural Language Processing > Applications > Dialogue Systems
Deep Learning > Learning Types > In-Context Learning
Artificial Intelligence > Core AI > Dialogue Systems

Keywords

in-context learning, dialogue safety, bias mitigation, bias reduction, toxicity reduction, conversational model, retrieval-based generation, retrieval-based approach
