Word Embeddings based on Fixed-Size Ordinally Forgetting Encoding

Joseph Sanu; MingBin Xu; Hui Jiang; Quan Liu

EMNLP 2017

Abstract

In this paper, we propose to learn word embeddings based on the recent fixed-size ordinally forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence into a fixed-size representation. We use FOFE to fully encode the left and right context of each word in a corpus to construct a novel word-context matrix, which is further weighted and factorized using truncated SVD to generate low-dimensional word embedding vectors. We evaluate this alternative method of encoding word-context statistics and show that the new FOFE method has a notable effect on the resulting word embeddings. Experimental results on several popular word similarity tasks demonstrate that the proposed method outperforms other SVD models that use canonical count-based techniques to generate word-context matrices.
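The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard FOFE recursion z_t = alpha * z_(t-1) + e_t, where e_t is the one-hot vector of the t-th word. The forgetting factor value, the positive PMI weighting (the abstract says only that the matrix is "weighted"), the square-root scaling of singular values, and all function names are assumptions made for this example.

```python
import numpy as np

def fofe_context_matrix(sentences, vocab, alpha=0.7):
    """Accumulate FOFE codes of the left and right context of every word.

    Returns a |V| x 2|V| matrix: columns [0, V) hold left-context mass,
    columns [V, 2V) hold right-context mass. alpha is an assumed value.
    """
    V = len(vocab)
    index = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((V, 2 * V))
    for sent in sentences:
        ids = [index[w] for w in sent]
        # Left context: scan left to right with z_t = alpha * z_{t-1} + e_t.
        z = np.zeros(V)
        for w in ids:
            M[w, :V] += z      # FOFE code of everything before this word
            z = alpha * z
            z[w] += 1.0
        # Right context: the same recursion, scanned right to left.
        z = np.zeros(V)
        for w in reversed(ids):
            M[w, V:] += z      # FOFE code of everything after this word
            z = alpha * z
            z[w] += 1.0
    return M

def ppmi(M):
    """Positive PMI weighting (an assumed choice of weighting scheme)."""
    total = M.sum()
    row = M.sum(axis=1, keepdims=True)
    col = M.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(M * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0   # zero cells and empty rows/columns
    return np.maximum(pmi, 0.0)

def truncated_svd_embeddings(W, dim):
    """Keep the top `dim` singular directions as word vectors."""
    U, S, _ = np.linalg.svd(W, full_matrices=False)
    # Scaling by sqrt(S) is a common heuristic, assumed here.
    return U[:, :dim] * np.sqrt(S[:dim])

# Toy usage on a two-sentence corpus.
vocab = ["the", "cat", "sat", "dog"]
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
M = fofe_context_matrix(corpus, vocab, alpha=0.7)
embeddings = truncated_svd_embeddings(ppmi(M), dim=2)
```

Unlike a plain co-occurrence count, each context word contributes alpha raised to its distance from the focus word (minus one), so nearer words weigh more and no fixed window size is needed.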

Authors

Joseph Sanu, MingBin Xu, Hui Jiang, Quan Liu

Topics

Machine Learning > Core Methods > Embedding Learning
Machine Learning > Optimization & Theory > Optimization
Natural Language Processing > Resources & Methods > Text Representation
Machine Learning > Core Methods > Dimensionality Reduction
Deep Learning > Learning Types > Representation Learning

Keywords

dimensionality reduction, word embedding, word similarity, fixed-size encoding, context encoding, ordinal forgetting
