Pretraining Sentiment Classifiers with Unlabeled Dialog Data

Toru Shimizu; Nobuyuki Shimizu; Hayato Kobayashi

ACL 2018

Abstract

The huge cost of creating labeled training data is a common problem for supervised learning tasks such as sentiment classification. Recent studies have shown that pretraining with unlabeled data via a language model can improve the performance of classification models. In this paper, we take this concept a step further by using a conditional language model instead of a plain language model. Specifically, we address a sentiment classification task for a tweet analysis service as a case study and propose a pretraining strategy that uses unlabeled dialog data (tweet-reply pairs) via an encoder-decoder model. Experimental results show that our strategy can improve the performance of sentiment classifiers and outperform several state-of-the-art strategies, including language model pretraining.

Keywords

sentiment classification, language model, unlabeled data, encoder-decoder model, dialog data
