Arabic Offensive Language on Twitter: Analysis and Experiments

Hamdy Mubarak; Ammar Rashed; Kareem Darwish; Younes Samih; Ahmed Abdelali

EACL 2021

Abstract

Detecting offensive language on Twitter has many applications, ranging from detecting and predicting bullying to measuring polarization. In this paper, we focus on building a large Arabic offensive tweet dataset. We introduce a method for building a dataset that is not biased by topic, dialect, or target. We produce the largest Arabic dataset to date with special tags for vulgarity and hate speech. We thoroughly analyze the dataset to determine which topics, dialects, and genders are most associated with offensive tweets and how Arabic speakers use offensive language. Lastly, we conduct many experiments to produce strong results (F1 = 83.2) on the dataset using state-of-the-art techniques.

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Fairness
Natural Language Processing > Applications > Text Classification
Machine Learning > Learning Types > Supervised Learning
Natural Language Processing > Applications > Sentiment Analysis

Keywords

sentiment analysis, natural language processing, text classification, social media analysis, offensive language detection, hate speech detection, Twitter analysis, Arabic language processing
