Training-Free Few-Shot Segmentation via Vision-Language Guided Prompting

Euihyun Yoon; Taejin Park; Jaekoo Lee

WACV 2026

Abstract

Object segmentation relies heavily on costly pixel-level annotations and struggles to generalize to unseen domains. The Segment Anything Model (SAM), a recently introduced foundation model for segmentation, offers prompt-driven zero-shot segmentation. It has been applied in various domains (e.g., autonomous driving, satellite imagery, medical imaging) and extended to Few-Shot Segmentation (FSS). However, existing SAM-based FSS methods typically generate prompts by using a vision encoder to measure support-query image similarity. This approach is often biased toward the support images and fails when the context shifts significantly between support and query images. To address this limitation, we propose a training-free FSS approach that combines visual and textual cues to generate effective prompts for the target class. By leveraging both vision and language information, our approach bridges the support-query gap and guides SAM to segment novel objects more reliably. Without any additional training, our method outperforms previous state-of-the-art FSS methods on established benchmarks (COCO-20^i, PASCAL-5^i), demonstrating its effectiveness and robust generalization. Our code is publicly available on GitHub.
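The abstract does not specify how the visual and textual cues are combined. As a rough illustration only, the Python sketch below shows one generic way a training-free pipeline could turn support-query similarity and text-query similarity into point prompts for SAM. It is not the authors' method: the masked-average support prototype, the weighted fusion controlled by `alpha`, the top-`k` point selection, and every function name are hypothetical. It also assumes that the query patch features, support patch features, and class-name text embedding already live in a shared embedding space (as with a CLIP-style vision-language encoder).

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Grid = List[List[Vector]]  # H x W grid of patch feature vectors


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def masked_average(features: Grid, mask: List[List[int]]) -> List[float]:
    """Hypothetical support prototype: mean of support patch features inside the binary mask."""
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for row_f, row_m in zip(features, mask):
        for vec, m in zip(row_f, row_m):
            if m:
                total = [t + v for t, v in zip(total, vec)]
                count += 1
    if count == 0:
        raise ValueError("support mask is empty")
    return [t / count for t in total]


def fused_point_prompts(
    query_feats: Grid,
    support_feats: Grid,
    support_mask: List[List[int]],
    text_emb: Vector,
    alpha: float = 0.5,
    k: int = 1,
) -> List[Tuple[int, int]]:
    """Return the k query grid cells with the highest fused score.

    Assumption: score = alpha * visual + (1 - alpha) * textual, where
    visual compares each query patch with the support prototype and
    textual compares it with the class-name text embedding.
    """
    prototype = masked_average(support_feats, support_mask)
    scored = []
    for r, row in enumerate(query_feats):
        for c, vec in enumerate(row):
            visual = cosine(vec, prototype)   # support-query similarity
            textual = cosine(vec, text_emb)   # text-query similarity
            score = alpha * visual + (1.0 - alpha) * textual
            scored.append((score, (r, c)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [loc for _, loc in scored[:k]]


# Toy usage with a 2x2 grid of 2-D features.
query = [[[1.0, 0.0], [0.0, 1.0]], [[0.7, 0.7], [-1.0, 0.0]]]
support = [[[1.0, 0.1], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
mask = [[1, 0], [0, 0]]
print(fused_point_prompts(query, support, mask, [1.0, 0.0], alpha=0.5, k=2))
# -> [(0, 0), (1, 0)]
```

With `alpha=1.0` the score reduces to the vision-only support-query similarity that the abstract identifies as biased toward the support images; lowering `alpha` lets the text embedding pull prompts toward query regions that match the class name even when the query context differs from the support images. The returned (row, col) grid cells would still need to be scaled to pixel coordinates before being passed to SAM as point prompts.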