ACL 2021
IITK@Detox at SemEval-2021 Task 5: Semi-Supervised Learning and Dice Loss for Toxic Spans Detection
Abstract
In this work, we present our approach and findings for SemEval-2021 Task 5: Toxic Spans Detection. The task's main aim was to identify the spans to which a given text's toxicity could be attributed. The task is challenging mainly because of two constraints: a small training dataset and an imbalanced class distribution. Our paper investigates two techniques for tackling these challenges: semi-supervised learning and learning with Self-Adjusting Dice Loss. Our submitted system (ranked ninth on the leaderboard) was an ensemble of pre-trained Transformer language models, each trained using one of these two techniques.
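The abstract names Self-Adjusting Dice Loss without giving its form. The sketch below illustrates one commonly cited per-token formulation, in which the predicted probability p is scaled by (1 - p)^alpha so that easy, confidently classified tokens contribute less. The function name, the alpha and gamma defaults, and the plain-Python style are assumptions made for illustration; this is not the authors' implementation.

```python
def self_adjusting_dice_loss(probs, labels, alpha=1.0, gamma=1.0):
    """Mean self-adjusting Dice loss over a sequence of tokens.

    probs  -- predicted probability of the positive (toxic) class per token
    labels -- gold labels per token, 0 or 1
    alpha  -- exponent of the (1 - p) down-weighting factor (assumed default)
    gamma  -- smoothing constant added to numerator and denominator (assumed default)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one token is required")

    total = 0.0
    for p, y in zip(probs, labels):
        # Down-weighted probability: shrinks as the prediction becomes confident.
        weighted = ((1.0 - p) ** alpha) * p
        # Smoothed Dice coefficient for this token.
        dsc = (2.0 * weighted * y + gamma) / (weighted + y + gamma)
        total += 1.0 - dsc
    return total / len(probs)


if __name__ == "__main__":
    # Hypothetical token probabilities and labels, for illustration only.
    print(self_adjusting_dice_loss([0.9, 0.2, 0.6], [1, 0, 1]))
```

Because the loss is built from an overlap ratio rather than a sum of per-token log-likelihoods, the many non-toxic tokens do not dominate it in the way they dominate cross-entropy, which is the usual motivation for using Dice-style losses on imbalanced tagging tasks.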
Topics
Machine Learning > Learning Types > Semi-Supervised Learning
Machine Learning > Optimization & Theory > Loss Functions
Natural Language Processing > Understanding > Named Entity Recognition
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Large Language Models