Demographic Word Embeddings for Racism Detection on Twitter

Mohammed Hasanuzzaman; Gaël Dias; Andy Way

IJCNLP 2017

Abstract

Most social media platforms grant users freedom of speech by allowing them to freely express their thoughts, beliefs, and opinions. Although this represents incredible and unique communication opportunities, it also presents important challenges. Online racism is one such example. In this study, we present a supervised learning strategy to detect racist language on Twitter based on word embeddings that incorporate demographic (Age, Gender, and Location) information. Our methodology achieves reasonable classification accuracy over a gold standard dataset (F1=76.3%) and significantly improves over the classification performance of demographic-agnostic models.
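For readers unfamiliar with the F1 measure reported above, the following is a minimal sketch of how it is computed from classification counts. It assumes a binary F1 on the positive (racist) class; the abstract does not state which averaging scheme was used. The function name `f1_from_counts` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Binary F1: harmonic mean of precision and recall.

    tp, fp, fn are true-positive, false-positive and false-negative
    counts for the positive class. Returns 0.0 when F1 is undefined.
    """
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not the paper's data).
example_f1 = f1_from_counts(tp=8, fp=2, fn=4)
print(f"F1 = {example_f1:.3f}")
```

Because F1 ignores true negatives, it is a common choice when the positive class is rare, as is often the case for abusive-language datasets.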

Topics

Artificial Intelligence > Core AI > Responsible AI
Machine Learning > Core Methods > Classification
Machine Learning > Core Methods > Embedding Learning
Interdisciplinary > Social > Social Media Analysis
Machine Learning > Learning Types > Supervised Learning
Natural Language Processing > Applications > Sentiment Analysis

Keywords

text classification, supervised learning, word embedding, social media, demographic information, racism detection
