Region-Based Representations Revisited

Michal Shlapentokh-Rothman; Ansel Blume; Yao Xiao; Yuqun Wu; Sethuraman TV; Heyi Tao; Jae Yong Lee; Wilfredo Torres; Yu-Xiong Wang; Derek Hoiem

CVPR 2024

Abstract

We investigate whether region-based representations are effective for recognition. Regions were once a mainstay in recognition approaches, but pixel- and patch-based features are now used almost exclusively. We show that recent class-agnostic segmenters like SAM can be effectively combined with strong unsupervised representations like DINOv2 and used for a wide variety of tasks, including semantic segmentation, object-based image retrieval, and multi-image analysis. Once the masks and features are extracted, these representations, even with linear decoders, enable competitive performance, making them well suited to applications that require custom queries. The compactness of the representation also makes it well suited to video analysis and other problems requiring inference across many images.
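As a rough illustration of the idea in the abstract, the sketch below pools dense features inside each region mask and scores the pooled vectors with a linear decoder. It is a toy under stated assumptions, not the authors' implementation: mean pooling is an assumed aggregation choice, the names `pool_region_features` and `linear_decode` are hypothetical, and the small hand-written grids stand in for real SAM masks and DINOv2 features.

```python
def pool_region_features(features, masks):
    """Average the feature vectors that fall inside each region mask.

    features: H x W grid (nested lists) of D-dimensional feature vectors.
    masks:    list of H x W grids of booleans, one grid per region.
    Returns one D-dimensional mean vector per region.
    Mean pooling is an assumption made for this sketch.
    """
    dim = len(features[0][0])
    pooled = []
    for mask in masks:
        total = [0.0] * dim
        area = 0
        for feat_row, mask_row in zip(features, mask):
            for feat, inside in zip(feat_row, mask_row):
                if inside:
                    area += 1
                    for d in range(dim):
                        total[d] += feat[d]
        # Guard against empty masks to avoid division by zero.
        pooled.append([t / max(area, 1) for t in total])
    return pooled


def linear_decode(region_feature, weights, bias):
    """Score one region with a linear layer (weights: C x D, bias: C)."""
    return [
        sum(w * x for w, x in zip(row, region_feature)) + b
        for row, b in zip(weights, bias)
    ]


# Toy stand-ins: a 2 x 2 grid of 2-D features and two region masks.
features = [[[1.0, 0.0], [3.0, 0.0]],
            [[0.0, 2.0], [0.0, 4.0]]]
masks = [[[True, True], [False, False]],   # region 0: top row
         [[False, False], [True, True]]]   # region 1: bottom row

region_feats = pool_region_features(features, masks)

# Hypothetical linear decoder with two classes (identity weights).
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
scores = [linear_decode(f, weights, bias) for f in region_feats]
labels = [max(range(len(s)), key=s.__getitem__) for s in scores]
```

Each region is reduced to a single vector, so the per-image representation scales with the number of regions rather than the number of pixels or patches, which is the compactness the abstract points to for multi-image and video settings.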

Authors

Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao, Yuqun Wu, Sethuraman TV, Heyi Tao, Jae Yong Lee, Wilfredo Torres, Yu-Xiong Wang, Derek Hoiem

Topics

Machine Learning > Core Methods > Representation Learning

Computer Vision > Analysis > Scene Understanding

Computer Vision > Analysis > Semantic Segmentation

Keywords

unsupervised learning, representation learning, semantic segmentation, image retrieval, class-agnostic segmentation
