Cross-Domain Detection of Abusive Language Online

Vanja Mladen Karan; Jan Šnajder

EMNLP 2018

Abstract

We investigate to what extent models trained to detect general abusive language generalize between datasets labeled with different types of abusive language. To this end, we compare the cross-domain performance of simple classification models on nine different datasets, finding that the models fail to generalize to out-of-domain datasets and that having at least some in-domain data is important. We also show that frustratingly simple domain adaptation (Daumé III, 2007) in most cases improves the results over in-domain training, especially when used to augment a smaller dataset with a larger one.
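The feature-augmentation idea behind frustratingly simple domain adaptation (Daumé III, 2007) can be illustrated with a minimal sketch. Each feature vector is copied into a shared block and into a block reserved for its own domain, with zeros elsewhere, so a linear classifier can learn both general and domain-specific weights. The function name `augment_features`, its parameters, and the toy vectors are hypothetical and chosen for illustration only; this is not the authors' implementation, and the features and classifiers used in the paper are not shown on this page.

```python
import numpy as np


def augment_features(X, domain_index, num_domains=2):
    """Map each row x to [x, 0, ..., x, ..., 0].

    The first block is a copy shared by all domains; the block at
    position (domain_index + 1) is the domain-specific copy. All other
    blocks stay zero. Names and layout are illustrative assumptions.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    out = np.zeros((n_samples, n_features * (num_domains + 1)))
    out[:, :n_features] = X  # shared copy
    start = n_features * (domain_index + 1)
    out[:, start:start + n_features] = X  # domain-specific copy
    return out


# Toy example: one source-domain vector and one target-domain vector.
X_source = np.array([[1.0, 2.0]])
X_target = np.array([[3.0, 4.0]])

aug_source = augment_features(X_source, domain_index=0)  # <x, x, 0>
aug_target = augment_features(X_target, domain_index=1)  # <x, 0, x>

# The augmented rows from both domains can be stacked and passed to any
# standard classifier in place of the original features.
X_train = np.vstack([aug_source, aug_target])
```

In the setting the abstract describes, the larger dataset would presumably play the role of the source domain and the smaller one the target, so the smaller dataset benefits from the shared block while keeping its own domain-specific weights.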

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Applications > Text Classification
Machine Learning > Learning Types > Transfer Learning
Machine Learning > Learning Types > Domain Adaptation
Deep Learning > Learning Types > Domain Adaptation

Keywords

transfer learning, domain adaptation, text classification, abusive language detection, cross-domain generalization, cross-domain classification

Related papers

Speeding Up Neural Machine Translation Decoding by Cube Pruning 2018

Limitations in learning an interpreted language with recurrent models 2018

Results of the sixth edition of the BioASQ Challenge 2018

Neural Segmental Hypergraphs for Overlapping Mention Recognition 2018

Hybrid Neural Attention for Agreement/Disagreement Inference in Online Debates 2018