Neural Word Decomposition Models for Abusive Language Detection

Sravan Bodapati; Spandana Gella; Kasturi Bhattacharjee; Yaser Al-Onaizan

ACL 2019

Abstract

The text we see on social media suffers from many undesirable characteristics such as hate speech, abusive language, and insults. This text also differs markedly from the traditional text found in news, containing many obfuscated words and intentional typos. This poses several robustness challenges for many natural language processing (NLP) techniques developed for traditional text. Many recently proposed techniques, such as character encoding models, subword models, and byte pair encoding to extract subwords, can help deal with some of these nuances. In our work, we analyze the effectiveness of each of these techniques and compare and contrast various word decomposition techniques when used in combination with others. We experiment with recent advances in fine-tuning pretrained language models and demonstrate their robustness to domain shift. We also show that our approaches achieve state-of-the-art performance on the Wikipedia attack and toxicity datasets and on a Twitter hate speech dataset.
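As a rough illustration of one of the word decomposition techniques named above, the following is a minimal sketch of byte pair encoding on a toy word list. It is not the authors' implementation: the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the tie-breaking rule are assumptions made for this example only.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules, starting from single characters."""
    # Each word is a tuple of symbols plus a hypothetical end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken lexicographically (an assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into subwords by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    merges = learn_bpe(["idiot", "idiots", "idiotic", "stupid"], num_merges=10)
    # An obfuscated spelling still decomposes into known pieces plus single characters.
    print(segment("idi0t", merges))
```

Because an unseen or deliberately misspelled word falls back to smaller units rather than to a single unknown token, a classifier built on such subwords can still see fragments it was trained on, which is the kind of robustness the abstract refers to.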

Authors

Sravan Bodapati, Spandana Gella, Kasturi Bhattacharjee, Yaser Al-Onaizan

Topics

Machine Learning > Core Methods > Representation Learning
Deep Learning > Architectures > Neural Networks
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Text Representation
Machine Learning > Learning Types > Transfer Learning
Machine Learning > Learning Types > Deep Learning
Deep Learning > Learning Types > Representation Learning

Keywords

domain adaptation, abusive language detection, domain shift, pretrained language model, social media, word decomposition, subword model, byte pair encoding, character encoding, subword encoding
