Fine-Tuning Image Transformers Using Learnable Memory

Mark Sandler; Andrey Zhmoginov; Max Vladymyrov; Andrew Jackson

CVPR 2022

Abstract

In this paper, we propose augmenting Vision Transformer models with learnable memory tokens. Our approach allows the model to adapt to new tasks using few parameters while optionally preserving its capabilities on previously learned tasks. At each layer we introduce a set of learnable embedding vectors that provide contextual information useful for specific datasets. We call these 'memory tokens'. We show that augmenting a model with just a handful of such tokens per layer significantly improves accuracy compared with conventional head-only fine-tuning and performs only slightly below the significantly more expensive full fine-tuning. We then propose an attention-masking approach that enables models to preserve their previous capabilities while extending them to new downstream tasks. This approach, which we call 'non-destructive fine-tuning', enables computation reuse across multiple tasks while being able to learn new tasks independently.
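The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it uses one single-head attention layer, and the function name `attention_with_memory`, the token ordering, and the mask layout are assumptions made for illustration. Memory tokens are appended as extra keys and values, and a boolean mask stops the original tokens from attending to anything that was added for the new task.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_memory(tokens, memory, w_q, w_k, w_v, mask=None):
    """Single-head self-attention with learnable memory rows (hypothetical helper).

    tokens: (n, d) input tokens, memory: (m, d) learnable memory tokens,
    mask:   (n, n + m) boolean array, True where a query may attend to a key.
    Memory rows act only as keys/values, so the output has n rows.
    """
    context = np.concatenate([tokens, memory], axis=0)  # (n + m, d)
    q = tokens @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d, n_old, m = 8, 5, 3
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

old_tokens = rng.normal(size=(n_old, d))  # original class token + patches
new_cls = rng.normal(size=(1, d))         # class token for the new task (assumed)
memory = rng.normal(size=(m, d))          # learnable memory tokens

tokens = np.concatenate([old_tokens, new_cls], axis=0)

# Key columns are ordered [old tokens | new class token | memory].
mask = np.zeros((n_old + 1, n_old + 1 + m), dtype=bool)
mask[:n_old, :n_old] = True  # old tokens see only old tokens
mask[n_old, :] = True        # new class token sees everything

base = attention_with_memory(old_tokens, np.zeros((0, d)), w_q, w_k, w_v)
masked = attention_with_memory(tokens, memory, w_q, w_k, w_v, mask)
unmasked = attention_with_memory(tokens, memory, w_q, w_k, w_v)

# With the mask, the old tokens' outputs match the unmodified model.
print(np.allclose(masked[:n_old], base))
```

In this sketch only `memory` and `new_cls` would be trained, with the projection weights frozen. Because the masked outputs for the original tokens are unchanged, they could be computed once and shared between the old and new tasks, which is the kind of computation reuse the abstract describes.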

Topics

Machine Learning > Learning Types > Continual Learning
Deep Learning > Architectures > Transformers
Deep Learning > Techniques > Model Architecture

Keywords

continual learning, vision transformer, attention mask, learnable memory
