ACL 2018

Comparison of Representations of Named Entities for Document Classification

Abstract

We explore representations for multi-word names in text classification tasks, using topic and sector classification on Reuters (RCV1). We find that: the best way to treat names is to split them into tokens and use each token as a separate feature; named entities (NEs) have more impact on sector classification than on topic classification; replacing NEs with entity types is not an effective strategy; and representing tokens by different embeddings for proper names vs. common nouns does not improve results. We highlight the improvements over state-of-the-art results that our CNN models yield.
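To make the compared treatments concrete, the following is a minimal sketch of how a tokenized sentence might be rewritten under three name representations: each name token kept as a separate feature, the whole name as a single feature, and the name replaced by its entity type. The function `represent`, the mode names, the underscore-joined form, the example sentence, and the `ORG` label are all assumptions made for illustration; they are not taken from the paper, and the sketch assumes entity spans do not overlap.

```python
from typing import List, Tuple

# Hypothetical span format: (start index, end index exclusive, entity type).
Span = Tuple[int, int, str]


def represent(tokens: List[str], spans: List[Span], mode: str) -> List[str]:
    """Rewrite a token list under one of three illustrative name representations.

    mode == "split":  keep every token of a name as its own feature
    mode == "joined": collapse a multi-word name into one feature
    mode == "type":   replace the name with its entity type
    Spans are assumed to be non-overlapping.
    """
    if mode not in {"split", "joined", "type"}:
        raise ValueError(f"unknown mode: {mode}")

    out: List[str] = []
    cursor = 0
    for start, end, entity_type in sorted(spans):
        out.extend(tokens[cursor:start])  # tokens before the name are unchanged
        name = tokens[start:end]
        if mode == "split":
            out.extend(name)
        elif mode == "joined":
            out.append("_".join(name))
        else:
            out.append(entity_type)
        cursor = end
    out.extend(tokens[cursor:])  # tokens after the last name
    return out


# Made-up example sentence with two organization names.
tokens = ["Goldman", "Sachs", "raised", "its", "forecast", "for", "General", "Motors"]
spans = [(0, 2, "ORG"), (6, 8, "ORG")]

for mode in ("split", "joined", "type"):
    print(mode, represent(tokens, spans, mode))
```

In this sketch the "split" mode leaves the token sequence unchanged, which corresponds to the treatment the abstract reports as working best, while the "type" mode discards the identity of the name entirely.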
