Character-Based Models for Adversarial Phone Extraction: Preventing Human Sex Trafficking

Nathanael Chambers; Timothy Forman; Catherine Griswold; Kevin Lu; Yogaish Khastgir; Stephen Steckler

EMNLP 2019

Abstract

Illicit activity on the Web often uses noisy text to obscure information between client and seller, such as the seller’s phone number. This presents an interesting challenge to language understanding systems: how do we model adversarial noise in a text extraction system? This paper addresses the sex trafficking domain and proposes some of the first neural network architectures to learn and extract phone numbers from noisy text. We create a new adversarial advertisement dataset, propose several RNN-based models to solve the problem, and most notably propose a visual character language model to interpret unseen Unicode characters. We train a CRF jointly with a CNN to improve number recognition by 89% over just a CRF. Through data augmentation in this unique model, we present the first results on characters never seen in training.
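To make the extraction challenge concrete, the following is a minimal illustrative sketch of a rule-based baseline, not the paper's method. The `WORD_DIGITS` table and the `naive_extract_digits` function are hypothetical names introduced here. The sketch handles spelled-out digits and Unicode characters that carry a digit value, but it silently drops look-alike substitutions (such as the letter "l" standing in for "1"), which is the kind of adversarial noise the abstract's learned character models are meant to address.

```python
import unicodedata

# Hypothetical word-to-digit table for illustration only;
# real advertisements use far more varied obfuscations.
WORD_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def naive_extract_digits(text):
    """Collect digits from text using fixed rules (assumed baseline)."""
    digits = []
    word = []

    def flush():
        # Convert the pending alphabetic run if it spells a digit word.
        candidate = "".join(word)
        word.clear()
        if candidate in WORD_DIGITS:
            digits.append(WORD_DIGITS[candidate])

    for ch in text.lower():
        # unicodedata.digit returns the digit value of a character,
        # or the supplied default (None) when it has none.
        value = unicodedata.digit(ch, None)
        if value is not None:
            flush()
            digits.append(str(value))
        elif ch.isalpha():
            word.append(ch)
        else:
            flush()
    flush()
    return "".join(digits)


# Spelled-out and full-width digits are recovered by the rules.
print(naive_extract_digits("call 4 one 5 - five５５ 0123"))
# A look-alike letter is dropped, so the rules miss a digit.
print(naive_extract_digits("4l5"))
```

A fixed table like this only covers the substitutions someone thought to list, which is one way to see why the abstract emphasizes handling characters never seen in training.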

Topics

Machine Learning > Learning Types > Adversarial Learning
Deep Learning > Architectures > Neural Networks
Natural Language Processing > Applications > Information Extraction
Natural Language Processing > Applications > Named Entity Recognition
Machine Learning > Core Methods > Sequence Labeling
Deep Learning > Architectures > Recurrent Neural Networks

Keywords

adversarial learning, sequence labeling, data augmentation, information extraction, named entity recognition, convolutional neural network, recurrent neural network, conditional random field, adversarial noise, adversarial text, character model, phone number extraction, phone extraction

Related papers

Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation 2019

Chains-of-Reasoning at TextGraphs 2019 Shared Task: Reasoning over Chains of Facts for Explainable Multi-hop Inference 2019

A Boundary-aware Neural Model for Nested Named Entity Recognition 2019

Iterative Dual Domain Adaptation for Neural Machine Translation 2019

A Multi-Pairwise Extension of Procrustes Analysis for Multilingual Word Translation 2019