Iterative Forward Tuning Boosts In-Context Learning in Language Models

Jiaxi Yang; Binyuan Hui; Min Yang; bailin wang; Bowen Li; Binhua Li; Fei Huang; Yongbin Li

2024 ACL ACL 2024

Iterative Forward Tuning Boosts In-Context Learning in Language Models

Abstract

AbstractDespite the advancements in in-context learning (ICL) for large language models (LLMs), current research centers on specific prompt engineering, such as demonstration selection, with the expectation that a single iteration of demonstrations processing can generalize effectively to a given test sample. However, this perspective overlooks the potential benefits derived from multiple iterations involving demonstrations, a practice aligning more closely with the iterative decision-making process exhibited by humans, who often learn through analogy. In this study, we introduce a novel two-stage framework to boost ICL in LLMs. Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages. The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation. This mechanism operates by manipulating the Key-Value matrices without training, fostering enhanced understanding capabilities in LLMs by thinking demonstrations multiple times. We evaluated Deep-Thinking across a range of benchmarks and LLMs, showing its superior performance over vanilla ICL methods and its effectiveness in challenging tasks where demonstration selection is infeasible.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Natural Language Processing

🧭 Keyword Pioneer — key-value matrix

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Jiaxi Yang , Binyuan Hui , Min Yang , bailin wang , Bowen Li , Binhua Li , Fei Huang , Yongbin Li

Topics

Artificial Intelligence > Learning Paradigms > Few-Shot Learning Artificial Intelligence > Learning Paradigms > Meta-Learning Natural Language Processing > Resources & Methods > Large Language Models Deep Learning > Models > Large Language Models Deep Learning > Learning Types > In-Context Learning

Keywords

few-shot learning attention mechanism in-context learning iterative learning demonstration selection large language model iterative attention key-value matrix

Download PDF

Related papers

Reinforcement Learning-Driven LLM Agent for Automated Attacks on LLMs 2024

EtymoLink: A Structured English Etymology Dataset 2024

Turkish Delights: A Dataset on Turkish Euphemisms 2024

Subjectivity Detection in English News using Large Language Models 2024

Does DetectGPT Fully Utilize Perturbation? Bridging Selective Perturbation to Fine-tuned Contrastive Learning Detector would be Better 2024