GrounDial: Human-norm Grounded Safe Dialog Response Generation

Siwon Kim; Shuyang Dai; Mohammad Kachuee; Shayan Ray; Tara Taghavi; Sungroh Yoon

EACL 2024

Abstract

Current conversational AI systems based on large language models (LLMs) are known to generate unsafe responses, either agreeing with offensive user input or including toxic content. Previous research aimed to alleviate this toxicity by fine-tuning LLMs on manually annotated safe dialogue histories. However, this dependency on additional tuning incurs substantial costs. To remove the dependency, we propose GrounDial, where response safety is achieved by grounding responses to commonsense social rules without requiring fine-tuning. GrounDial's hybrid approach of in-context learning and human-norm-guided decoding makes responses quantitatively and qualitatively safer, even without additional data or tuning.
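The abstract names two ingredients: in-context grounding and norm-guided decoding. The toy sketch below illustrates the general idea only; it is not the authors' implementation. The prompt template, the function names, the `alpha` bonus, and the whitespace tokenization are all hypothetical choices made for illustration.

```python
import math
from typing import Dict


def build_grounded_prompt(norm: str, user_utterance: str) -> str:
    """In-context grounding (assumed template): put the social rule in the prompt."""
    return f"Social rule: {norm}\nUser: {user_utterance}\nAssistant:"


def norm_guided_logits(logits: Dict[str, float], norm: str,
                       alpha: float = 2.0) -> Dict[str, float]:
    """Toy guided decoding: add a bonus `alpha` to every candidate token
    that also appears in the norm (hypothetical scoring rule)."""
    norm_tokens = set(norm.lower().split())
    return {
        tok: score + (alpha if tok.lower() in norm_tokens else 0.0)
        for tok, score in logits.items()
    }


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


if __name__ == "__main__":
    norm = "It is wrong to insult people"
    print(build_grounded_prompt(norm, "My coworker is an idiot, right?"))
    # Made-up next-token logits from a hypothetical language model.
    candidate_logits = {"agree": 1.5, "wrong": 1.0, "sure": 1.2}
    print(softmax(norm_guided_logits(candidate_logits, norm)))
```

In this sketch the bonus shifts probability mass toward tokens that overlap with the norm, so no model weights are changed. A real system would apply a comparable adjustment to the model's logits at each decoding step.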

Topics

Artificial Intelligence > Core AI > Responsible AI
Natural Language Processing > Generation > Dialogue Systems

Keywords

in-context learning; commonsense reasoning; dialogue system; toxicity reduction; large language model; safe response
