Extractive Adversarial Networks: High-Recall Explanations for Identifying Personal Attacks in Social Media Posts

Samuel Carton; Qiaozhu Mei; Paul Resnick

2018 EMNLP EMNLP 2018

Extractive Adversarial Networks: High-Recall Explanations for Identifying Personal Attacks in Social Media Posts

Abstract

AbstractWe introduce an adversarial method for producing high-recall explanations of neural text classifier decisions. Building on an existing architecture for extractive explanations via hard attention, we add an adversarial layer which scans the residual of the attention for remaining predictive signal. Motivated by the important domain of detecting personal attacks in social media comments, we additionally demonstrate the importance of manually setting a semantically appropriate “default” behavior for the model by explicitly manipulating its bias term. We develop a validation set of human-annotated personal attacks to evaluate the impact of these changes.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — extractive explanation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Samuel Carton , Qiaozhu Mei , Paul Resnick

Topics

Artificial Intelligence > Core AI > Interpretability Machine Learning > Learning Types > Adversarial Learning

Keywords

adversarial learning text classification hard attention extractive explanation personal attack detection

Download PDF

Related papers

Speeding Up Neural Machine Translation Decoding by Cube Pruning 2018

Limitations in learning an interpreted language with recurrent models 2018

Results of the sixth edition of the BioASQ Challenge 2018

Neural Segmental Hypergraphs for Overlapping Mention Recognition 2018

Hybrid Neural Attention for Agreement/Disagreement Inference in Online Debates 2018