Sea-CLIP: Mining Semantic-Aware Representations for Few-Shot Anomaly Detection with CLIP

Xiao Guo; Zhimin Chen; Carlos D. Castillo; Hongcheng Wang; Xiaoming Liu

WACV 2026


Abstract

Few-shot Anomaly Detection (FSAD) is a classic computer vision task, and recent FSAD methods build on the pre-trained vision-language model CLIP to achieve remarkable performance. However, existing CLIP-based approaches disregard object semantics, a crucial factor that can enhance FSAD by guiding comparisons between semantically corresponding patches. To address this limitation, we propose Sea-CLIP, a novel method that integrates semantic-aware representations from DINOv2 to enhance FSAD representation learning. Specifically, Sea-CLIP first applies a Patch Matching module that uses semantic-aware representations to obtain coarse anomaly segmentation masks. These masks then guide a lightweight Anomaly Matching Decoder (AMD), which jointly exploits CLIP and DINOv2 features and formulates FSAD as a feature-matching task. Unlike prior patch-matching works that compute anomaly scores directly, our method uses the AMD to refine the coarse predictions into a precise anomaly mask. Sea-CLIP achieves state-of-the-art FSAD performance on the MVTec and VisA datasets, and we provide a detailed analysis of how semantic-aware representations contribute to identifying anomaly patterns.
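
To make the patch-matching idea concrete, the following is a minimal sketch of how a coarse anomaly mask could be obtained by nearest-neighbor matching between patch features of a query image and patch features of the few normal reference images. It is an illustration only and not the Patch Matching module of Sea-CLIP: the function names `coarse_anomaly_map` and `coarse_mask`, the use of cosine similarity, the score scaling, and the threshold value are all assumptions, and the feature arrays stand in for patch embeddings that a backbone such as DINOv2 would produce.

```python
import numpy as np


def coarse_anomaly_map(query_feats, ref_feats, grid_hw):
    """Score each query patch by its distance to the closest reference patch.

    query_feats: (N, D) patch features of the test image (hypothetical input).
    ref_feats:   (M, D) patch features pooled from the normal reference images.
    grid_hw:     (H, W) patch grid of the test image, with H * W == N.
    Returns an (H, W) array in [0, 1]; larger values mean "more anomalous".
    """
    h, w = grid_hw
    if query_feats.shape[0] != h * w:
        raise ValueError("number of query patches must equal H * W")

    # L2-normalize so that the dot product is the cosine similarity.
    eps = 1e-8
    q = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + eps)
    r = ref_feats / (np.linalg.norm(ref_feats, axis=1, keepdims=True) + eps)

    # (N, M) similarity matrix; keep the best match for every query patch.
    nearest_sim = (q @ r.T).max(axis=1)

    # Assumed scoring rule: map cosine distance from [0, 2] to [0, 1].
    score = np.clip(0.5 * (1.0 - nearest_sim), 0.0, 1.0)
    return score.reshape(h, w)


def coarse_mask(anomaly_map, threshold=0.25):
    """Binarize the coarse map; the threshold is an illustrative assumption."""
    return anomaly_map > threshold
```

In this sketch, a patch whose best match in the reference set is poor receives a high score, which yields the kind of coarse mask that a learned decoder could subsequently refine.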
