Different Ways to Forget: Linguistic Gates in Recurrent Neural Networks

Cristiano Chesi; Veronica Bressan; Matilde Barbini; Achille Fusco; Maria Letizia Piccini Bianchessi; Sofia Neri; Sarah Rossi; Tommaso Sgrizzi

CoNLL 2024

Abstract

This work explores alternative gating systems in simple Recurrent Neural Networks (RNNs) that induce linguistically motivated biases during training, ultimately affecting models’ performance on the BLiMP task. We focus exclusively on the BabyLM 10M training corpus (Strict-Small Track). Our experiments reveal that: (i) standard RNN variants (LSTMs and GRUs) are insufficient for properly learning the relevant set of linguistic constraints; (ii) the quality or size of the training corpus has little impact on these networks, as demonstrated by the comparable performance of LSTMs trained exclusively on the child-directed speech portion of the corpus; (iii) increasing the size of the embedding and hidden layers does not significantly improve performance. In contrast, specifically gated RNNs (eMG-RNNs), inspired by certain Minimalist Grammar intuitions, exhibit advantages in both training loss and BLiMP accuracy.
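For readers unfamiliar with the gating that the abstract takes as its baseline, the following is a minimal sketch of one step of a standard LSTM cell, reduced to a single scalar unit. It is not the eMG-RNN gating proposed in the paper, which the abstract does not specify. The names `lstm_step` and `make_params`, the parameter keys, and the toy values are assumptions made purely for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def make_params(forget_bias: float) -> dict:
    """Hypothetical helper: all weights and biases zero except the forget-gate bias."""
    keys = ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg"]
    params = dict.fromkeys(keys, 0.0)
    params["bf"] = forget_bias
    return params


def lstm_step(x: float, h_prev: float, c_prev: float, p: dict):
    """One step of a standard LSTM cell with a single hidden unit.

    Returns the new hidden state, the new cell state, and the forget-gate value.
    """
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate update
    c = f * c_prev + i * g   # the forget gate scales how much old memory survives
    h = o * math.tanh(c)
    return h, c, f


# Toy comparison: the same previous cell state under an open and a closed forget gate.
_, c_keep, f_keep = lstm_step(0.0, 0.0, 1.0, make_params(10.0))
_, c_drop, f_drop = lstm_step(0.0, 0.0, 1.0, make_params(-10.0))
print(f"open gate:   f={f_keep:.4f}, c={c_keep:.4f}")
print(f"closed gate: f={f_drop:.4f}, c={c_drop:.4f}")
```

In this toy setting the candidate update is zero, so the new cell state is simply the old one scaled by the forget gate: a gate near 1 retains the memory and a gate near 0 erases it. The abstract's claim is that replacing this generic mechanism with linguistically motivated gates changes what the network learns.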

Topics

Artificial Intelligence > Core AI > Interpretability
Deep Learning > Architectures > Neural Networks

Keywords

recurrent neural network, gating mechanism, linguistic constraint, forget gate, minimalist grammar
