Lexical Generalization Improves with Larger Models and Longer Training

Elron Bandel; Yoav Goldberg; Yanai Elazar

2022 EMNLP EMNLP 2022

Lexical Generalization Improves with Larger Models and Longer Training

Abstract

AbstractWhile fine-tuned language models perform well on many language tasks, they were also shown to rely on superficial surface features such as lexical overlap. Excessive utilization of such heuristics can lead to failure on challenging inputs. We analyze the use of lexical overlap heuristics in natural language inference, paraphrase detection, and reading comprehension (using a novel contrastive dataset),and find that larger models are much less susceptible to adopting lexical overlap heuristics. We also find that longer training leads models to abandon lexical overlap heuristics. Finally, We provide evidence that the disparity between models size has its source in the pre-trained model.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — surface feature

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio