Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

Dinghan Shen; Guoyin Wang; Wenlin Wang; Martin Renqiang Min; Qinliang Su; Yizhe Zhang; Chunyuan Li; Ricardo Henao; Lawrence Carin

2018 ACL ACL 2018

Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

Abstract

AbstractMany deep learning architectures have been proposed to model the compositionality in text sequences, requiring substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, relative to word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing

📈 Trend Setter — Feature Learning

🧭 Keyword Pioneer — hierarchical pooling

🐣 Hot Topic Early Bird — document classification

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Dinghan Shen , Guoyin Wang , Wenlin Wang , Martin Renqiang Min , Qinliang Su , Yizhe Zhang , Chunyuan Li , Ricardo Henao , Lawrence Carin

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Core Methods > Embedding Learning Deep Learning > Architectures > Neural Networks Natural Language Processing > Applications > Text Classification Deep Learning > Learning Types > Representation Learning Machine Learning > Learning Types > Feature Learning

Keywords

text classification document classification text matching word embedding hierarchical pooling pooling operation sequence matching parameter-free pooling

Download PDF

Related papers

Economic Event Detection in Company-Specific News Text 2018

Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus 2018

SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment 2018

Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer 2018

Affordances in Grounded Language Learning 2018