fBERT: A Neural Transformer for Identifying Offensive Content

Diptanu Sarkar; Marcos Zampieri; Tharindu Ranasinghe; Alexander Ororbia

EMNLP 2021

Abstract

Transformer-based models such as BERT, XLNet, and XLM-R have achieved state-of-the-art performance across various NLP tasks, including the identification of offensive language and hate speech, an important problem in social media. In this paper, we present fBERT, a BERT model retrained on SOLID, the largest English offensive language identification corpus available, with over 1.4 million offensive instances. We evaluate fBERT's performance on identifying offensive content on multiple English datasets, and we test several thresholds for selecting instances from SOLID. The fBERT model will be made freely available to the community.
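The abstract mentions testing several thresholds for selecting instances from SOLID but does not say how the selection works. The sketch below shows one plausible reading, purely as an illustration: it assumes each instance carries a numeric score in [0, 1] and that instances at or above a threshold are kept for retraining. The names `ScoredInstance`, `select_instances`, and `sweep_thresholds`, as well as the score field itself, are hypothetical and are not taken from the paper or from the SOLID release.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ScoredInstance:
    """A text paired with a score (hypothetical schema, assumed for illustration)."""
    text: str
    score: float  # assumed to lie in [0, 1]; higher means more likely offensive


def select_instances(instances: Iterable[ScoredInstance],
                     threshold: float) -> List[ScoredInstance]:
    """Keep only the instances whose score is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return [inst for inst in instances if inst.score >= threshold]


def sweep_thresholds(instances: Iterable[ScoredInstance],
                     thresholds: Iterable[float]) -> Dict[float, int]:
    """Report how many instances survive each candidate threshold."""
    pool = list(instances)  # materialize once so the pool can be reused
    return {t: len(select_instances(pool, t)) for t in thresholds}
```

Under these assumptions, a higher threshold gives a smaller but more confidently labeled retraining set, so a sweep like this makes the trade-off between data volume and label confidence visible before any retraining is run.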

Topics

Natural Language Processing > Applications > Text Classification
Machine Learning > Learning Types > Supervised Learning
Deep Learning > Models > Transformers
Artificial Intelligence > Core AI > Natural Language Processing
Machine Learning > Application Areas > Text Classification

Keywords

text classification, offensive language detection, BERT model, hate speech detection, offensive content detection, transformer model
