Recall with Reasoning: Chain-of-Thought Distillation for Mamba’s Long-Context Memory and Extrapolation

Jun-Yu Ma; Tianqing Fang; Zhisong Zhang; Hongming Zhang; Haitao Mi; Dong Yu

EMNLP 2025

Abstract

Mamba’s theoretical infinite-context potential is limited in practice when sequences far exceed training lengths. This work explores unlocking Mamba’s long-context memory ability with a simple yet effective method, Recall with Reasoning (RwR), which distills chain-of-thought (CoT) summarization from a teacher model. Specifically, RwR prepends these summaries as CoT prompts during fine-tuning, teaching Mamba to actively recall and reason over long contexts. Experiments on LONGMEMEVAL and HELMET show that RwR outperforms existing long-term memory methods on the Mamba model. Furthermore, under similar pre-training conditions, RwR improves the long-context performance of Mamba relative to comparable Transformer/hybrid baselines while preserving short-context capabilities, all without changing the architecture.
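
The data construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `RwRExample`, `build_rwr_example`, and `toy_teacher` are hypothetical, the prompt template is invented for illustration, and placing the summary directly before the answer in the training target is one reading of "prepends these summaries as CoT prompts". A real teacher would be a strong language model; here a keyword filter stands in for it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RwRExample:
    prompt: str  # text the student model reads (long context plus question)
    target: str  # text the student is trained to generate


def build_rwr_example(
    context: str,
    question: str,
    answer: str,
    teacher_summarize: Callable[[str, str], str],
) -> RwRExample:
    """Assemble one fine-tuning example with a teacher summary before the answer.

    Assumption: the summary is part of the target, so the student learns to
    recall relevant context first and only then answer.
    """
    summary = teacher_summarize(context, question)
    prompt = f"{context}\n\nQuestion: {question}\n"
    target = f"Summary: {summary}\nAnswer: {answer}"
    return RwRExample(prompt=prompt, target=target)


def toy_teacher(context: str, question: str) -> str:
    """Hypothetical stand-in for a teacher model.

    Keeps only the sentences that share a (longer) word with the question.
    """
    keywords = {w.strip("?.,").lower() for w in question.split() if len(w) > 3}
    kept = [
        s.strip()
        for s in context.split(".")
        if keywords & {w.lower() for w in s.split()}
    ]
    return (". ".join(kept) + ".") if kept else ""


example = build_rwr_example(
    context="Alice moved to Paris in 2019. The weather was mild. Bob likes tea.",
    question="Where did Alice move?",
    answer="Paris",
    teacher_summarize=toy_teacher,
)
print(example.target)
```

In this sketch the student sees the full context and question as input and is supervised on the summary followed by the answer, so the summarization step is learned rather than supplied at inference time.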

Topics

Artificial Intelligence > Core AI > Memory
Machine Learning > Application Areas > Knowledge Distillation
Machine Learning > Application Areas > Model Compression
Deep Learning > Models > Large Language Models
Machine Learning > Learning Types > Knowledge Distillation
Deep Learning > Techniques > Self-Supervised Learning
Deep Learning > Optimization & Theory > Model Compression

Keywords

sequence modeling, knowledge distillation, state space model, chain of thought, large language model, chain-of-thought distillation, long-context memory, context extrapolation, recall with reasoning, model extrapolation
