A Position Paper on Toxic Reasoning: Grounding Categories of Toxic Language in Implications and Attitudes
Abstract
Automatic detection of toxic language has the potential to considerably improve engagement with online spaces. Previous work has characterized toxic language detection as a classification problem, often using fine-grained classes for increased explainability. In this position paper, we argue for a particular way of operationalizing categories of toxic language. Our approach focuses on what is expressed or implied, and breaks down implications based on two traits: (i) the core content of what was expressed, and (ii) relevant stakeholders’ attitudes towards that content. We argue for an approach, which we call toxic reasoning, in which such distinctions are made explicit. We point out the benefits of such an approach and develop a toxic reasoning schema that can explain categories of toxic language from diverse sources. We demonstrate this by mapping the classes of existing toxic language datasets to the schema. Toxic reasoning promises to improve the understanding of implicit toxicity while increasing explainability.