SRPOL DIALOGUE SYSTEMS at SemEval-2021 Task 5: Automatic Generation of Training Data for Toxic Spans Detection

Michał Satława; Katarzyna Zamłyńska; Jarosław Piersa; Joanna Kolis; Klaudia Firląg; Katarzyna Beksa; Zuzanna Bordzicka; Christian Goltz; Paweł Bujnowski; Piotr Andruszkiewicz

ACL 2021

Abstract

This paper presents a system used for SemEval-2021 Task 5: Toxic Spans Detection. Our system is an ensemble of BERT-based models for binary word classification, trained on a dataset extended by toxic comments modified and generated by two language models. For the toxic word classification, the prediction threshold value was optimized separately for every comment, in order to maximize the expected F1 value.
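The per-comment threshold selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption: the function name `best_threshold` is hypothetical, scoring is done at word level rather than on character offsets, and the expected F1 is approximated by the plug-in ratio 2·E[TP] / (number predicted + E[number toxic]), treating each word's predicted probability as its chance of being toxic.

```python
from typing import List, Tuple


def best_threshold(probs: List[float]) -> Tuple[float, float]:
    """Pick a threshold for ONE comment from its per-word toxicity probabilities.

    Returns (threshold, approximate expected F1). A word is predicted toxic
    when its probability is >= threshold. A threshold of +inf means
    "predict no toxic words".

    Assumption: expected F1 is approximated as
        2 * sum(p_i for selected words) / (k + sum(p_i for all words)),
    where k is the number of selected words. This is an illustrative
    approximation, not necessarily the estimator used by the authors.
    """
    expected_positives = sum(probs)        # E[number of truly toxic words]
    ranked = sorted(probs, reverse=True)   # candidates: take the top-k words

    best_thr, best_f1 = float("inf"), 0.0
    running_tp = 0.0                       # E[true positives] among the top-k
    for k, p in enumerate(ranked, start=1):
        running_tp += p
        # Tied probabilities cannot be separated by a threshold,
        # so only evaluate at the end of each group of equal values.
        if k < len(ranked) and ranked[k] == p:
            continue
        f1 = 2.0 * running_tp / (k + expected_positives)
        if f1 > best_f1:
            best_thr, best_f1 = p, f1
    return best_thr, best_f1


if __name__ == "__main__":
    # Two comments with different probability profiles get different thresholds.
    for word_probs in ([0.9, 0.8, 0.2, 0.1], [0.3, 0.2, 0.1]):
        thr, f1 = best_threshold(word_probs)
        print(f"probs={word_probs} threshold={thr} approx_expected_f1={f1:.3f}")
```

Under these assumptions, a comment with confident predictions ends up with a high threshold, while a comment where all probabilities are low gets a lower one, which a single global cut-off such as 0.5 would not allow.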

Topics

Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Data Augmentation
Natural Language Processing > Applications > Text Classification
Machine Learning > Learning Types > Ensemble Learning

Keywords

binary classification, ensemble learning, text classification, data augmentation, BERT model, language model, ensemble model, threshold optimization, toxic span detection, toxic span, toxic word classification, training data generation, span detection, BERT ensemble, binary word classification, toxic comment detection
