Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation

Jin Wang; Bingfeng Zhang; Jian Pang; Honglong Chen; Weifeng Liu

CVPR 2024

Abstract

Few-shot segmentation remains challenging because labeled information for unseen classes is limited. Most previous approaches extract high-level feature maps from a frozen visual encoder and compute pixel-wise similarity as key prior guidance for the decoder. However, such a prior representation suffers from coarse granularity and poor generalization to new classes, since these high-level feature maps have obvious category bias. In this work, we propose to replace the visual prior representation with the visual-text alignment capacity to capture more reliable guidance and enhance model generalization. Specifically, we design two training-free prior information generation strategies that use the semantic alignment capability of the Contrastive Language-Image Pre-training (CLIP) model to locate the target class. In addition, to acquire more accurate prior guidance, we build a high-order relationship of attention maps and use it to refine the initial prior information. Experiments on both the PASCAL-5i and COCO-20i datasets show that our method obtains a substantial improvement and reaches new state-of-the-art performance. The code is available on the project website.
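
For readers unfamiliar with the conventional visual prior that the abstract contrasts against, the following is a minimal sketch of a pixel-wise similarity prior computed from frozen encoder features. It is not the method proposed in the paper. The function name `similarity_prior`, the tensor layout, and the use of max cosine similarity followed by min-max normalization are assumptions chosen for illustration.

```python
import numpy as np

def similarity_prior(query_feat, support_feat, support_mask, eps=1e-7):
    """Illustrative pixel-wise similarity prior (hypothetical helper).

    query_feat:   (C, Hq, Wq) high-level features of the query image
    support_feat: (C, Hs, Ws) high-level features of the support image
    support_mask: (Hs, Ws) binary foreground mask, already resized to the
                  feature resolution, with at least one foreground pixel
    Returns an (Hq, Wq) map in [0, 1]; larger means more target-like.
    """
    c, hq, wq = query_feat.shape
    q = query_feat.reshape(c, -1)                       # (C, Nq)
    fg = support_mask.reshape(-1) > 0                   # foreground selector
    s = support_feat.reshape(c, -1)[:, fg]              # (C, Nfg)

    # L2-normalize every pixel vector so dot products are cosine similarities.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)

    sim = q.T @ s                                       # (Nq, Nfg)
    prior = sim.max(axis=1)                             # best match per query pixel

    # Min-max normalize over the query image.
    prior = (prior - prior.min()) / (prior.max() - prior.min() + eps)
    return prior.reshape(hq, wq)
```

Because such a map depends only on features from an encoder trained on base classes, it inherits that encoder's category bias, which is the limitation the abstract identifies.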

Topics

Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Zero-Shot Learning
Computer Vision > Analysis > Semantic Segmentation
Computer Vision > Processing > Semantic Segmentation
Machine Learning > Learning Paradigms > Few-Shot Learning
Deep Learning > Learning Types > Multi-Modal Learning
Deep Learning > Learning Types > Transfer Learning
Deep Learning > Learning Types > Few-Shot Learning

Keywords

semantic segmentation, image segmentation, few-shot learning, transfer learning, prior information, few-shot segmentation, visual-text alignment, prior information generation, semantic alignment capability, attention map refinement
