AAAI 2026

A Robust Unlearning Method with Adaptive Knowledge Guidance and Memory Preservation

Abstract

Machine unlearning has emerged as a promising approach to removing specific knowledge from large language models (LLMs), especially in safety-critical applications. However, existing representation-based methods such as RMU provide no guidance for selecting the representation locations at which to unlearn, and therefore lack precision. Probability-based methods, meanwhile, are vulnerable to fine-tuning attacks, which fine-tune the unlearned model on unrelated, benign data. To address these problems, this paper presents ALMPU (Adaptive Localized Memory Perturbation Unlearning), which combines adaptive knowledge guidance with a memory perturbation mechanism. ALMPU supplies the knowledge guidance that representation-based unlearning methods lack and mitigates the impact of fine-tuning attacks on unlearned models. Specifically, we apply scaling factors to attention heads and select the most sensitive heads as knowledge guidance. Guided by this knowledge localization, we integrate an enhanced memory perturbation, which forces the model to preserve specific knowledge, into the standard representation-based unlearning process at these sensitive positions. Through this perturbation mechanism, the model eliminates the target knowledge more thoroughly. By intervening on the selected attention heads and explicitly optimizing against fine-tuning attacks during unlearning, ALMPU induces a controlled divergence from the original model that is inherently resistant to relearning attempts. Experiments on the WMDP benchmark show that ALMPU consistently outperforms baseline methods across fine-tuning attacks of different scales.
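The head-selection step and the representation-based objective it guides can be illustrated with a small NumPy sketch. It assumes a toy setup in which each attention head's contribution to the residual stream is multiplied by a scaling factor alpha_h, and a head's sensitivity is taken to be the magnitude of the gradient of a forget-set loss with respect to alpha_h at alpha_h = 1. This gradient-based scoring follows common head-importance practice and is not necessarily the paper's exact criterion. The RMU-style loss (forget-set activations steered toward a scaled random control vector, retain-set activations anchored to a frozen copy) is restricted to the selected heads purely as an illustrative assumption. ALMPU's memory perturbation term is not modeled, and all function names (`head_sensitivity`, `select_sensitive_heads`, `make_control_vector`, `rmu_style_loss`) are hypothetical.

```python
import numpy as np


def head_sensitivity(head_outputs: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Score heads by |dL/d alpha_h| evaluated at alpha = 1 (hypothetical criterion).

    Toy setup (an assumption, not the paper's exact formulation):
      head_outputs: (H, N, D) contribution o_h of head h at N forget-set tokens
      target:       (N, D) reference representation for those tokens
      y(alpha)    = sum_h alpha_h * o_h
      L(alpha)    = (1/N) * sum_n ||y_n(alpha) - target_n||^2
    """
    num_tokens = head_outputs.shape[1]
    # Residual with every scaling factor at its initial value of 1.
    residual = head_outputs.sum(axis=0) - target  # (N, D)
    # Analytic gradient: dL/d alpha_h = (2/N) * sum_n <o_{h,n}, residual_n>.
    grads = (2.0 / num_tokens) * np.einsum("hnd,nd->h", head_outputs, residual)
    return np.abs(grads)


def select_sensitive_heads(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring heads, most sensitive first."""
    return np.argsort(-scores, kind="stable")[:k]


def make_control_vector(dim: int, coeff: float, seed: int = 0) -> np.ndarray:
    """RMU-style steering target: a random unit vector scaled by coeff."""
    rng = np.random.default_rng(seed)
    u = rng.random(dim)  # uniform samples in [0, 1)
    return coeff * u / np.linalg.norm(u)


def rmu_style_loss(forget_heads, retain_heads, frozen_retain_heads,
                   selected, control_vec, retain_weight):
    """RMU-style objective computed only on the selected heads (hypothetical restriction).

    All *_heads arrays have shape (H, N, D); control_vec has shape (D,).
    The forget term pulls forget-set activations toward control_vec;
    the retain term keeps retain-set activations close to the frozen model.
    """
    forget = forget_heads[selected]
    retain = retain_heads[selected]
    frozen = frozen_retain_heads[selected]
    forget_term = np.mean((forget - control_vec) ** 2)
    retain_term = np.mean((retain - frozen) ** 2)
    return forget_term + retain_weight * retain_term
```

In a real model the per-head outputs would be captured with forward hooks at the chosen layer, and the loss would be minimized by gradient descent on the model weights. The sketch only shows how sensitivity scores, head selection, and a localized representation-based objective fit together; it does not reproduce ALMPU's memory perturbation or its optimization against fine-tuning attacks.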
