Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons

Jie Hao; Xing Wang; Shuming Shi; Jinfeng Zhang; Zhaopeng Tu

IJCNLP 2019


Abstract

Recent studies have shown that a hybrid of self-attention networks (SANs) and recurrent neural networks (RNNs) outperforms both individual architectures, but little is known about why the hybrid models work. Based on the belief that modeling hierarchical structure is an essential complementarity between SANs and RNNs, we propose to further enhance the strength of hybrid models with an advanced variant of RNNs, the Ordered Neurons LSTM (ON-LSTM), which introduces a syntax-oriented inductive bias to perform tree-like composition. Experimental results on a benchmark machine translation task show that the proposed approach outperforms both individual architectures and a standard hybrid model. Further analyses on targeted linguistic evaluation and logical inference tasks demonstrate that the proposed approach indeed benefits from better modeling of hierarchical structure.
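The tree-like inductive bias comes from the ON-LSTM cell (Shen et al., 2019), whose "master" forget and input gates are built with a cumulative softmax (cumax), so that neurons are ordered: high-ranked neurons keep long-lived, higher-level information while low-ranked neurons are overwritten more often. The following is a minimal NumPy sketch of one ON-LSTM step under stated assumptions: no chunking of the master gates, a stacked parameter layout of my own choosing, and a hypothetical hybrid in which a single parameter-free self-attention layer sits on top of the ON-LSTM states. The names `on_lstm_step`, `encode`, and `self_attention` are illustrative and are not the authors' implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cumax(z):
    """Cumulative softmax: a non-decreasing vector whose last entry is 1."""
    return np.cumsum(softmax(z), axis=-1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def on_lstm_step(x, h_prev, c_prev, W, U, b):
    """One ON-LSTM step without chunking.

    Assumed (hypothetical) layout: W is (6*d, d_in), U is (6*d, d), b is (6*d,),
    stacked for the gates [f, i, o, c_hat, master_f, master_i].
    """
    z = W @ x + U @ h_prev + b
    f, i, o, c_hat, mf, mi = np.split(z, 6)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    c_hat = np.tanh(c_hat)
    master_f = cumax(mf)          # non-decreasing: high-ranked neurons are retained
    master_i = 1.0 - cumax(mi)    # non-increasing: low-ranked neurons are written
    omega = master_f * master_i   # overlap where the ordinary LSTM gates act
    f_hat = f * omega + (master_f - omega)
    i_hat = i * omega + (master_i - omega)
    c = f_hat * c_prev + i_hat * c_hat
    h = o * np.tanh(c)
    return h, c

def encode(xs, params, d):
    """Run the ON-LSTM over a sequence and return all hidden states, shape (T, d)."""
    h, c = np.zeros(d), np.zeros(d)
    hs = []
    for x in xs:
        h, c = on_lstm_step(x, h, c, *params)
        hs.append(h)
    return np.stack(hs)

def self_attention(H):
    """Single-head scaled dot-product self-attention; learned projections omitted (simplification)."""
    scores = H @ H.T / np.sqrt(H.shape[-1])
    return softmax(scores) @ H

# Hypothetical hybrid: ON-LSTM states feed a self-attention layer.
rng = np.random.default_rng(0)
d_in, d, T = 8, 16, 5
W = rng.normal(scale=0.1, size=(6 * d, d_in))
U = rng.normal(scale=0.1, size=(6 * d, d))
b = np.zeros(6 * d)
xs = rng.normal(size=(T, d_in))
H = encode(xs, (W, U, b), d)
out = self_attention(H)
print(H.shape, out.shape)  # (5, 16) (5, 16)
```

Because `cumax` is monotonic, the master gates split the hidden units into a region that is mostly copied forward and a region that is mostly rewritten, which approximates closing and opening constituents of different depths. Which layers are ON-LSTM and which are self-attention, and how many of each, is an assumption in this sketch.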

Keywords

machine translation, hierarchical structure, recurrent neural network, self-attention network, ordered neuron
