Comparison of Representations of Named Entities for Document Classification

Lidia Pivovarova; Roman Yangarber

ACL 2018

Abstract

We explore representations for multi-word names in text classification tasks, on Reuters (RCV1) topic and sector classification. We find that: the best way to treat names is to split them into tokens and use each token as a separate feature; NEs have more impact on sector classification than topic classification; replacing NEs with entity types is not an effective strategy; representing tokens by different embeddings for proper names vs. common nouns does not improve results. We highlight the improvements over state-of-the-art results that our CNN models yield.
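The three treatments of multi-word names that the abstract compares can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `represent`, the span format, and the underscore joining are assumptions made for illustration, and the entity spans are taken as given rather than produced by a real NER system.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, entity_type).
# Spans are assumed to be sorted and non-overlapping.
Span = Tuple[int, int, str]


def represent(tokens: List[str], spans: List[Span], mode: str) -> List[str]:
    """Return the feature sequence for one document under a given name treatment.

    mode="split"  : each token of a name is kept as a separate feature
    mode="joined" : a multi-word name becomes a single feature
    mode="type"   : a name is replaced by its entity type
    """
    if mode not in {"split", "joined", "type"}:
        raise ValueError(f"unknown mode: {mode}")

    features: List[str] = []
    cursor = 0
    for start, end, entity_type in spans:
        # Copy the ordinary tokens that precede this name.
        features.extend(tokens[cursor:start])
        name = tokens[start:end]
        if mode == "split":
            features.extend(name)
        elif mode == "joined":
            features.append("_".join(name))
        else:
            features.append(entity_type)
        cursor = end
    # Copy whatever follows the last name.
    features.extend(tokens[cursor:])
    return features


tokens = ["Bank", "of", "America", "shares", "rose", "in", "New", "York"]
spans = [(0, 3, "ORG"), (6, 8, "LOC")]

for mode in ("split", "joined", "type"):
    print(mode, represent(tokens, spans, mode))
```

In this sketch the "split" mode leaves the token sequence unchanged, which corresponds to the treatment the abstract reports as most effective; the other two modes shrink the sequence by collapsing each name into one feature.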

Topics

Machine Learning > Core Methods > Embedding Learning
Natural Language Processing > Understanding > Named Entity Recognition
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Text Representation
Deep Learning > Architectures > Convolutional Neural Networks

Keywords

text classification, named entity recognition, text representation, document classification, convolutional neural network, word embedding, named entity, named entity representation, proper noun, topic classification
