Challenges for Toxic Comment Classification: An In-Depth Error Analysis

Betty van Aken; Julian Risch; Ralf Krestel; Alexander Löser

EMNLP 2018

Abstract

Toxic comment classification has become an active research field with many recently proposed approaches. However, while these approaches address some of the task’s challenges, others remain unsolved, and directions for further research are needed. To this end, we compare different deep learning and shallow approaches on a new, large comment dataset and propose an ensemble that outperforms all individual models. Further, we validate our findings on a second dataset. The results of the ensemble enable us to perform an extensive error analysis, which reveals open challenges for state-of-the-art methods and directions for future research. These challenges include missing paradigmatic context and inconsistent dataset labels.

Topics

Machine Learning > Core Methods > Classification
Natural Language Processing > Applications > Text Classification
Deep Learning > Learning Types > Deep Learning

Keywords

ensemble learning, text classification, toxic comment classification, deep learning, ensemble method, error analysis, dataset label
