Syntax Encoding with Application in Authorship Attribution

Richong Zhang; Zhiyuan Hu; Hongyu Guo; Yongyi Mao

EMNLP 2018

Abstract

We propose a novel strategy to encode the syntax parse tree of a sentence into a learnable distributed representation. The proposed syntax encoding scheme is provably information-lossless. Specifically, an embedding vector is constructed for each word in the sentence, encoding the path in the syntax tree corresponding to the word. The one-to-one correspondence between these “syntax-embedding” vectors and the words (hence their embedding vectors) in the sentence makes it easy to integrate such a representation with all word-level NLP models. We empirically show the benefits of the syntax embeddings in the authorship attribution domain, where our approach improves upon the prior art and achieves new performance records on five benchmark data sets.
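As a rough illustration of the per-word path idea in the abstract, the sketch below collects, for each word of a toy sentence, the sequence of constituent labels from the root of its parse tree down to that word. The tree format, the `word_paths` helper, and the example tree are all assumptions made for illustration; the sketch covers only path extraction, not the paper's actual embedding construction, and it makes no claim to reproduce the information-lossless property.

```python
from typing import List, Tuple, Union

# Hypothetical tree format (not from the paper): a node is either a word
# (str) or a (label, children) pair.
Tree = Union[str, Tuple[str, list]]


def word_paths(tree: Tree, prefix: Tuple[str, ...] = ()) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return (word, root-to-word label path) pairs in sentence order."""
    if isinstance(tree, str):
        # Leaf: the accumulated labels are the path for this word.
        return [(tree, prefix)]
    label, children = tree
    pairs: List[Tuple[str, Tuple[str, ...]]] = []
    for child in children:
        # Extend the path with the current node's label and recurse.
        pairs.extend(word_paths(child, prefix + (label,)))
    return pairs


# Toy parse of "the cat sat", hand-written for this example.
toy_tree: Tree = ("S", [
    ("NP", [("DT", ["the"]), ("NN", ["cat"])]),
    ("VP", [("VBD", ["sat"])]),
])

paths = word_paths(toy_tree)
for word, path in paths:
    print(word, "/".join(path))
```

Because the function returns exactly one path per word, in sentence order, the paths line up one-to-one with the words. That alignment is what would allow a path representation to be paired with per-word embeddings in a word-level model.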

Topics

Machine Learning > Core Methods > Embedding Learning; Natural Language Processing > Understanding > Syntax; Natural Language Processing > Applications > Text Classification; Natural Language Processing > Resources & Methods > Text Representation; Deep Learning > Techniques > Representation Learning

Keywords

text classification; syntax parsing; authorship attribution; distributed representation; parse tree; syntax encoding; syntax embedding; tree encoding; syntax parse tree
