Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information

Dirk Hovy; Tommaso Fornaciari

2018 EMNLP EMNLP 2018

Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information

Abstract

AbstractMost text-classification approaches represent the input based on textual features, either feature-based or continuous. However, this ignores strong non-linguistic similarities like homophily: people within a demographic group use language more similar to each other than to non-group members. We use homophily cues to retrofit text-based author representations with non-linguistic information, and introduce a trade-off parameter. This approach increases in-class similarity between authors, and improves classification performance by making classes more linearly separable. We evaluate the effect of our method on two author-attribute prediction tasks with various training-set sizes and parameter settings. We find that our method can significantly improve classification performance, especially when the number of labels is large and limited labeled data is available. It is potentially applicable as preprocessing step to any text-classification task.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — author representation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Dirk Hovy , Tommaso Fornaciari

Topics

Machine Learning > Core Methods > Classification Machine Learning > Core Methods > Representation Learning Machine Learning > Core Methods > Embedding Learning Natural Language Processing > Applications > Text Classification Machine Learning > Learning Types > Representation Learning

Keywords

representation learning feature extraction text classification linear separability author representation embedding retrofitting demographic information author attribution

Download PDF

Related papers

Speeding Up Neural Machine Translation Decoding by Cube Pruning 2018

Limitations in learning an interpreted language with recurrent models 2018

Results of the sixth edition of the BioASQ Challenge 2018

Neural Segmental Hypergraphs for Overlapping Mention Recognition 2018

Hybrid Neural Attention for Agreement/Disagreement Inference in Online Debates 2018