VFM-Adapter: Adapting Visual Foundation Models for Dense Prediction with Dynamic Hybrid Operation Mapping

Zheng Chen; Yu Zeng; Zehui Chen; Hongzhi Gao; Lin Chen; Jiaming Liu; Feng Zhao

AAAI 2025

Abstract

Although pre-trained large vision foundation models (VFMs) yield superior results on various downstream tasks, full fine-tuning is often impractical because of its high computational cost and storage requirements. Recent advances in parameter-efficient fine-tuning (PEFT) of VFMs for image classification show significant promise, but the application of PEFT techniques to dense prediction tasks remains largely unexplored. Our analysis of existing methods reveals that their underlying premise, the use of low-rank parameter matrices, is effective in specific applications but may not suit dense prediction tasks. To this end, we propose VFM-Adapter, a novel PEFT approach tailored for dense prediction. Specifically, VFM-Adapter introduces a hybrid operation mapping technique that integrates local information with global modeling in the adapter module, capitalizing on the distinct inductive biases of the different operations. Additionally, we dynamically generate the parameters of VFM-Adapter, which makes feature extraction flexible with respect to the specific input. To validate the efficacy of VFM-Adapter, we conduct extensive experiments on object detection, semantic segmentation, and instance segmentation. Results on multiple benchmarks consistently demonstrate the superiority of our method over previous approaches. Notably, with only three percent of the trainable parameters of the SAM-Base backbone, our approach achieves competitive or even superior performance compared to full fine-tuning. The code will be available.
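
The abstract does not give implementation details, so the following is only a minimal illustrative sketch of what an adapter combining a local operation, a global operation, and input-dependent mixing might look like. Everything in it is an assumption made for illustration and not the authors' design: the function names `make_params` and `hybrid_adapter`, the bottleneck width `r`, the choice of a 3x3 depthwise convolution as the local branch, single-head self-attention as the global branch, and a sigmoid gate computed from pooled features as the "dynamic" part.

```python
import numpy as np

def make_params(C, r, seed=0):
    """Hypothetical adapter parameters for channel width C and bottleneck r."""
    rng = np.random.default_rng(seed)
    return {
        "down": rng.normal(0.0, 0.02, (C, r)),   # C -> r projection
        "dw": rng.normal(0.0, 0.02, (3, 3, r)),  # 3x3 depthwise kernel
        "gate": rng.normal(0.0, 0.02, (r, r)),   # produces the per-input gate
        "up": np.zeros((r, C)),                  # zero init: adapter starts as identity
    }

def hybrid_adapter(x, params):
    """x: (H, W, C) token grid from a frozen backbone block; returns the same shape."""
    H, W, _ = x.shape
    z = x @ params["down"]                       # (H, W, r) bottleneck features
    r = z.shape[-1]

    # Local branch (assumed): 3x3 depthwise convolution with zero padding.
    zp = np.pad(z, ((1, 1), (1, 1), (0, 0)))
    local = np.zeros_like(z)
    for dy in range(3):
        for dx in range(3):
            local += zp[dy:dy + H, dx:dx + W, :] * params["dw"][dy, dx]

    # Global branch (assumed): single-head self-attention over all H*W tokens.
    t = z.reshape(H * W, r)
    scores = t @ t.T / np.sqrt(r)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    glob = (attn @ t).reshape(H, W, r)

    # Dynamic part (assumed): a per-channel gate computed from this input's
    # pooled features decides how much local vs. global signal to keep.
    pooled = t.mean(axis=0)                                   # (r,)
    gate = 1.0 / (1.0 + np.exp(-(pooled @ params["gate"])))   # (r,), values in (0, 1)
    mixed = gate * local + (1.0 - gate) * glob

    # Residual connection: the frozen feature plus the adapter's correction.
    return x + mixed @ params["up"]

def adapter_param_count(params):
    """Number of trainable scalars in the hypothetical adapter."""
    return sum(p.size for p in params.values())
```

In this sketch only the entries of `params` would be trained while the backbone stays frozen, and the zero-initialized `up` projection means the adapted model initially reproduces the backbone output exactly. `adapter_param_count` gives the trainable-parameter budget for a chosen `C` and `r`, which can then be compared with the backbone size.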

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Application Areas > Efficient Computing
Computer Vision > Processing > Image Segmentation
Computer Vision > Core AI > Computer Vision
Deep Learning > Optimization & Theory > Model Compression
Computer Vision > Core AI > Efficient Computing
Deep Learning > Techniques > Transfer Learning
Deep Learning > Learning Types > Transfer Learning

Keywords

semantic segmentation; object detection; parameter-efficient fine-tuning; dense prediction; vision foundation model; visual foundation model; adapter learning
