Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models

Zachary Horvitz; Jingru Chen; Rahul Aditya; Harshvardhan Srivastava; Robert West; Zhou Yu; Kathleen McKeown

ACL 2024

Abstract

Humor is a fundamental facet of human cognition and interaction. Yet, despite recent advances in natural language processing, humor detection remains a challenging task that is complicated by the scarcity of datasets that pair humorous texts with similar non-humorous counterparts. We investigate whether large language models (LLMs) can generate synthetic data for humor detection via editing texts. We benchmark LLMs on an existing human dataset and show that current LLMs display an impressive ability to “unfun” jokes, as judged by humans and as measured on the downstream task of humor detection. We extend our approach to a code-mixed English-Hindi humor dataset where we find that GPT-4’s synthetic data is highly rated by bilingual annotators and provides challenging adversarial examples for humor classifiers.
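The central data idea in the abstract, pairing each humorous text with a minimally edited non-humorous counterpart, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: `unfun` is a hypothetical placeholder for an LLM editing call, and the `difflib` similarity filter with its 0.5 threshold is an assumption standing in for "similar" counterparts.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


@dataclass
class HumorPair:
    """A humorous text paired with an edited, non-humorous counterpart."""
    funny: str
    unfunny: str


def build_pairs(
    jokes: List[str],
    unfun: Callable[[str], str],
    min_similarity: float = 0.5,
) -> List[HumorPair]:
    """Edit each joke with `unfun` and keep pairs whose texts stay similar.

    `unfun` is a hypothetical stand-in for an LLM editing call; the
    similarity filter and its threshold are illustrative assumptions.
    """
    pairs = []
    for joke in jokes:
        edited = unfun(joke)
        # Character-level similarity in [0, 1]; drops unchanged texts and
        # edits that rewrite too much of the original.
        similarity = SequenceMatcher(None, joke, edited).ratio()
        if edited != joke and similarity >= min_similarity:
            pairs.append(HumorPair(funny=joke, unfunny=edited))
    return pairs


def to_classification_data(pairs: List[HumorPair]) -> List[Tuple[str, int]]:
    """Flatten pairs into (text, label) examples: 1 = humorous, 0 = not."""
    data = []
    for pair in pairs:
        data.append((pair.funny, 1))
        data.append((pair.unfunny, 0))
    return data
```

For example, a toy `unfun` such as `lambda t: t.replace("looked surprised", "disagreed")` turns "I told my wife she was drawing her eyebrows too high. She looked surprised." into a non-humorous counterpart that differs only in the punchline. Each kept pair then contributes one humorous and one non-humorous example that differ only by the edit, which is the kind of paired data the abstract describes as scarce.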

Topics

Machine Learning > Application Areas > Data Augmentation
Natural Language Processing > Understanding > Sentiment Analysis
Deep Learning > Models > Large Language Models
Artificial Intelligence > Core AI > Natural Language Processing
Deep Learning > Learning Types > Generative Models

Keywords

dataset creation, text editing, data augmentation, humor detection, synthetic data generation, adversarial examples, large language models
