In-context Learning and Gradient Descent Revisited

Gilad Deutch; Nadav Magar; Tomer Natan; Guy Dar

2024 NAACL NAACL 2024

In-context Learning and Gradient Descent Revisited

Abstract

AbstractIn-context learning (ICL) has shown impressive results in few-shot learning tasks, yet its underlying mechanism is still not fully understood. A recent line of work suggests that ICL performs gradient descent (GD)-based optimization implicitly. While appealing, much of the research focuses on simplified settings, where the parameters of a shallow model are optimized. In this work, we revisit evidence for ICL-GD correspondence on realistic NLP tasks and models. We find gaps in evaluation, both in terms of problematic metrics and insufficient baselines. We show that surprisingly, even untrained models achieve comparable ICL-GD similarity scores despite not exhibiting ICL.Next, we explore a major discrepancy in the flow of information throughout the model between ICL and GD, which we term Layer Causality. We propose a simple GD-based optimization procedure that respects layer causality, and show it improves similarity scores significantly.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🧭 Keyword Pioneer — layer causality

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Gilad Deutch , Nadav Magar , Tomer Natan , Guy Dar

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Deep Learning > Architectures > Transformers Machine Learning > Learning Types > Few-Shot Learning

Keywords

few-shot learning in-context learning neural network optimization gradient descent layer causality

Download PDF

Related papers

Working Alliance Transformer for Psychotherapy Dialogue Classification 2024

Named Entity Recognition Under Domain Shift via Metric Learning for Life Sciences 2024

Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study 2024

TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation 2024

Extractive Summarization with Text Generator 2024