Incorporating Subword Information into Matrix Factorization Word Embeddings

Alexandre Salle; Aline Villavicencio

NAACL 2018

Abstract

The positive effect of adding subword information to word embeddings has been demonstrated for predictive models. In this paper we investigate whether similar benefits can also be derived from incorporating subwords into counting models. We evaluate the impact of different types of subwords (n-grams and unsupervised morphemes), with results confirming the importance of subword information in learning representations of rare and out-of-vocabulary words.
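To make the n-gram subword idea concrete, here is a minimal sketch in Python. It is not taken from the paper: the boundary markers `<` and `>`, the n-gram length range, the function names `char_ngrams` and `oov_vector`, and the choice to build an out-of-vocabulary word vector by summing the vectors of its known n-grams are all assumptions made for illustration.

```python
from typing import Dict, List


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Return the character n-grams of a word, shortest lengths first.

    The word is wrapped in '<' and '>' (an assumed convention) so that
    prefixes and suffixes yield n-grams distinct from word-internal ones.
    """
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]


def oov_vector(
    word: str,
    subword_vectors: Dict[str, List[float]],
    dim: int,
    n_min: int = 3,
    n_max: int = 6,
) -> List[float]:
    """Build a vector for an unseen word by summing its known n-gram vectors.

    `subword_vectors` is a hypothetical lookup table from n-gram to vector.
    N-grams missing from the table are skipped; if none are found, the
    result is the zero vector of length `dim`.
    """
    total = [0.0] * dim
    for gram in char_ngrams(word, n_min, n_max):
        vec = subword_vectors.get(gram)
        if vec is not None:
            for k, value in enumerate(vec):
                total[k] += value
    return total


if __name__ == "__main__":
    # Toy two-dimensional table, invented purely for this example.
    table = {"<ca": [1.0, 0.0], "at>": [0.0, 2.0]}
    print(char_ngrams("cat", 3, 3))
    print(oov_vector("cat", table, dim=2, n_min=3, n_max=3))
```

Under these assumptions, a word never seen in training still receives a representation as long as some of its n-grams were seen, which is the property the abstract highlights for rare and out-of-vocabulary words.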

Authors

Alexandre Salle, Aline Villavicencio

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Core Methods > Embedding Learning

Keywords

matrix factorization, word embedding, out-of-vocabulary word, subword information, unsupervised morpheme, counting model
