Memory Injections: Correcting Multi-Hop Reasoning Failures During Inference in Transformer-Based Language Models

Mansi Sakarvadia; Aswathy Ajith; Arham Khan; Daniel Grzenda; Nathaniel Hudson; André Bauer; Kyle Chard; Ian Foster

EMNLP 2023

Abstract

Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Large Language Models (LLMs) struggle to perform such reasoning consistently. Here we propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LLM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single and multi-hop prompts. We then propose a mechanism that allows users to inject pertinent prompt-specific information, which we refer to as “memories,” at critical LLM locations during inference. By thus enabling the LLM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We show empirically that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks by up to 424%.
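To make the idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes that a "memory" can be formed by embedding one or more memory tokens, scaling them by a magnitude `tau`, and adding the result to the output of an attention layer before the hidden state is projected back onto the vocabulary. The four-token vocabulary, the embedding values, `tau`, and every function name are hypothetical and chosen only for illustration. A real GPT-2 model has further layers after the injection point, which this sketch omits. The helper `percent_increase` reads "increase by X%" as a relative change in the probability of the desired next token.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unembed(hidden, embedding):
    """Project a hidden state onto the vocabulary: one dot product per token."""
    return [sum(h * e for h, e in zip(hidden, row)) for row in embedding]


def inject_memory(attn_output, memory_token_ids, embedding, tau):
    """Add the scaled embeddings of the memory tokens to an attention output.

    Assumption: the memory is the sum of the memory-token embeddings times tau.
    """
    memory = [0.0] * len(attn_output)
    for token_id in memory_token_ids:
        for j, value in enumerate(embedding[token_id]):
            memory[j] += tau * value
    return [a + m for a, m in zip(attn_output, memory)]


def percent_increase(p_before, p_after):
    """Relative change in probability, expressed as a percentage."""
    return 100.0 * (p_after - p_before) / p_before


# Hypothetical toy vocabulary: 4 tokens with 3-dimensional embeddings.
EMBEDDING = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]


def next_token_probs(residual, attn_output):
    """Toy forward pass: residual stream plus attention output, then unembed."""
    hidden = [r + a for r, a in zip(residual, attn_output)]
    return softmax(unembed(hidden, EMBEDDING))


residual = [0.2, 0.1, 0.0]     # made-up residual-stream state
attn_output = [0.3, 0.0, 0.1]  # made-up attention-layer output
target_token = 1               # the desired next token
memory_tokens = [1]            # memory that points toward the target

p_before = next_token_probs(residual, attn_output)[target_token]
injected = inject_memory(attn_output, memory_tokens, EMBEDDING, tau=2.0)
p_after = next_token_probs(residual, injected)[target_token]

print(f"before: {p_before:.3f}  after: {p_after:.3f}  "
      f"change: {percent_increase(p_before, p_after):.0f}%")
```

In this toy setting the injected memory pushes the hidden state toward the embedding of the target token, so its probability rises. Which layer to inject into and how large `tau` should be are exactly the kinds of choices the abstract describes as targeting a "key attention layer".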

Topics

Artificial Intelligence > Core AI > Memory

Machine Learning > Application Areas > Efficient Computing

Machine Learning > Learning Types > Multi-Task Learning

Artificial Intelligence > Core AI > Reasoning

Keywords

attention mechanism, attention head, multi-hop reasoning, transformer language model, memory injection, inference enhancement
