Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics

Zhe Zhao; Tao Liu; Shen Li; Bofang Li; Xiaoyong Du

EMNLP 2017

Abstract

Existing word representation methods mostly limit their information source to word co-occurrence statistics. In this paper, we introduce ngrams into four representation methods: SGNS, GloVe, the PPMI matrix, and its SVD factorization. Comprehensive experiments are conducted on word analogy and similarity tasks. The results show that improved word representations are learned from ngram co-occurrence statistics. We also demonstrate that the trained ngram representations are useful in other ways, such as finding antonyms and collocations. In addition, a novel approach to building the co-occurrence matrix is proposed to alleviate the hardware burden brought by ngrams.
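To make "ngram co-occurrence statistics" concrete, the following is a minimal Python sketch, not the authors' implementation. It extracts unigrams and bigrams from a token list, counts (center, context) pairs within a window, and converts the counts to PPMI scores. The function names, the `_` joiner for bigrams, the decision to skip overlapping ngrams, and the distance definition (gap between the nearest edges of two ngrams) are all assumptions made for illustration; the paper may define contexts differently.

```python
import math
from collections import Counter


def extract_ngrams(tokens, max_n=2):
    """Return (start, order, ngram) for every ngram of order 1..max_n."""
    items = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            items.append((i, n, "_".join(tokens[i:i + n])))
    return items


def cooccurrence_counts(tokens, max_n=2, window=2):
    """Count (center, context) ngram pairs that lie within `window`.

    Assumption: overlapping ngrams are skipped, and distance is measured
    between the nearest edges of the two ngrams (adjacent ngrams have
    distance 1).
    """
    items = extract_ngrams(tokens, max_n)
    pairs = Counter()
    for ci, cn, center in items:
        for xi, xn, context in items:
            if xi + xn <= ci:        # context lies entirely to the left
                dist = ci - (xi + xn) + 1
            elif ci + cn <= xi:      # context lies entirely to the right
                dist = xi - (ci + cn) + 1
            else:                    # the two ngrams overlap
                continue
            if dist <= window:
                pairs[(center, context)] += 1
    return pairs


def ppmi(pairs):
    """Positive PMI for each observed pair; non-positive scores are dropped."""
    total = sum(pairs.values())
    row, col = Counter(), Counter()
    for (w, c), n in pairs.items():
        row[w] += n
        col[c] += n
    scores = {}
    for (w, c), n in pairs.items():
        pmi = math.log(n * total / (row[w] * col[c]))
        if pmi > 0:
            scores[(w, c)] = pmi
    return scores


tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, max_n=2, window=2)
scores = ppmi(counts)
```

Setting `max_n=1` reduces this to ordinary word-word counting, which is the baseline the abstract contrasts against. Because every bigram adds rows and columns, the pair table grows quickly with `max_n`, which hints at the hardware burden the abstract mentions.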

Topics

Machine Learning > Core Methods > Embedding Learning
Natural Language Processing > Resources & Methods > Text Representation

Keywords

co-occurrence statistics; semantic similarity; word analogy; word representation; ngram embedding
