Euphemistic Abuse – A New Dataset and Classification Experiments for Implicitly Abusive Language

Michael Wiegand; Jana Kampfmeier; Elisabeth Eder; Josef Ruppenhofer

2023 EMNLP EMNLP 2023

Euphemistic Abuse – A New Dataset and Classification Experiments for Implicitly Abusive Language

Abstract

AbstractWe address the task of identifying euphemistic abuse (e.g. “You inspire me to fall asleep”) paraphrasing simple explicitly abusive utterances (e.g. “You are boring”). For this task, we introduce a novel dataset that has been created via crowdsourcing. Special attention has been paid to the generation of appropriate negative (non-abusive) data. We report on classification experiments showing that classifiers trained on previous datasets are less capable of detecting such abuse. Best automatic results are obtained by a classifier that augments training data from our new dataset with automatically-generated GPT-3 completions. We also present a classifier that combines a few manually extracted features that exemplify the major linguistic phenomena constituting euphemistic abuse.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Michael Wiegand , Jana Kampfmeier , Elisabeth Eder , Josef Ruppenhofer

Topics

Machine Learning > Core Methods > Classification Machine Learning > Learning Types > Weakly Supervised Learning Deep Learning > Models > Generative Models

Keywords

text classification data augmentation abusive language detection generative model

Download PDF

Related papers

Exploring Linguistic Probes for Morphological Generalization 2023

NameGuess: Column Name Expansion for Tabular Data 2023

Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning 2023

Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation 2023

On the Calibration of Large Language Models and Alignment 2023