Not All Attention Is Needed: Gated Attention Network for Sequence Data

Lanqing Xue; Xiaopeng Li; Nevin L. Zhang

2020 AAAI AAAI 2020

Not All Attention Is Needed: Gated Attention Network for Sequence Data

Abstract

Abstract Although deep neural networks generally have fixed network structures, the concept of dynamic mechanism has drawn more and more attention in recent years. Attention mechanisms compute input-dependent dynamic attention weights for aggregating a sequence of hidden states. Dynamic network configuration in convolutional neural networks (CNNs) selectively activates only part of the network at a time for different inputs. In this paper, we combine the two dynamic mechanisms for text classification tasks. Traditional attention mechanisms attend to the whole sequence of hidden states for an input sentence, while in most cases not all attention is needed especially for long sequences. We propose a novel method called Gated Attention Network (GA-Net) to dynamically select a subset of elements to attend to using an auxiliary network, and compute attention weights to aggregate the selected elements. It avoids a significant amount of unnecessary computation on unattended elements, and allows the model to pay attention to important parts of the sequence. Experiments in various datasets show that the proposed method achieves better performance compared with all baseline models with global or local attention while requiring less computation and achieving better interpretability. It is also promising to extend the idea to more complex attention-based models, such as transformers and seq-to-seq models.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — dynamic attention

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Lanqing Xue , Xiaopeng Li , Nevin L. Zhang

Topics

Deep Learning > Architectures > Transformers Deep Learning > Architectures > Neural Networks Natural Language Processing > Applications > Text Classification Machine Learning > Learning Types > Attention Artificial Intelligence > Core AI > Attention

Keywords

text classification attention mechanism dynamic network dynamic attention gated attention neural network sequence datum

Download PDF

Related papers

Enhancing Pointer Network for Sentence Ordering with Pairwise Ordering Predictions 2020

CopyMTL: Copy Mechanism for Joint Extraction of Entities and Relations with Multi-Task Learning 2020

Neural Simile Recognition with Cyclic Multitask Learning and Local Attention 2020

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy 2020

Multi-Point Semantic Representation for Intent Classification 2020