Matching Anything by Segmenting Anything

Siyuan Li; Lei Ke; Martin Danelljan; Luigi Piccinelli; Mattia Segu; Luc Van Gool; Fisher Yu

CVPR 2024

Abstract

Robust association of the same objects across video frames in complex scenes is crucial for many applications, especially object tracking. Current methods predominantly rely on labeled, domain-specific video datasets, which limits the cross-domain generalization of learned similarity embeddings. We propose MASA, a novel method for robust instance association learning, capable of matching any objects within videos across diverse domains without tracking labels. Leveraging the rich object segmentation from the Segment Anything Model (SAM), MASA learns instance-level correspondence through exhaustive data transformations. We treat the SAM outputs as dense object region proposals and learn to match those regions from a vast image collection. We further design a universal MASA adapter, which can work in tandem with foundational segmentation or detection models and enable them to track any detected objects. Those combinations present strong zero-shot tracking ability in complex domains. Extensive tests on multiple challenging MOT and MOTS benchmarks indicate that the proposed method, using only unlabeled static images, achieves even better performance in zero-shot association than state-of-the-art methods trained with fully annotated in-domain video sequences. Our code is available at https://github.com/siyuanliii/masa.
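
The idea of learning instance-level correspondence from transformed views of a static image can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the loss, so an InfoNCE-style contrastive loss is assumed here, random vectors stand in for region embeddings, and the names `contrastive_loss` and `match_regions` are hypothetical. Row i of `view_a` is taken to be the same region as some row of `view_b` after a second transformation.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def contrastive_loss(emb_a, emb_b, temperature=0.1):
    # Assumed InfoNCE-style loss: row i of emb_a and row i of emb_b are treated
    # as the same region under two transformations (the positive pair);
    # every other row of emb_b serves as a negative.
    logits = l2_normalize(emb_a) @ l2_normalize(emb_b).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def match_regions(emb_a, emb_b):
    # Associate each region in view A with its most similar region in view B.
    sim = l2_normalize(emb_a) @ l2_normalize(emb_b).T
    return sim.argmax(axis=1)

# Toy data: five "regions" with 16-dimensional embeddings (random stand-ins).
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 16))
view_a = base + 0.01 * rng.normal(size=base.shape)

# The second view sees the same regions in a shuffled order, slightly perturbed.
perm = np.array([2, 0, 4, 1, 3])
view_b = base[perm] + 0.01 * rng.normal(size=base.shape)

matches = match_regions(view_a, view_b)
aligned_loss = contrastive_loss(view_a, view_b[matches])
shuffled_loss = contrastive_loss(view_a, view_b)
```

In this toy setup the correspondence between the two views is known by construction, which mirrors why transformed copies of one image can supply association supervision without tracking labels. The loss is low when matched rows are aligned and high when they are not.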

Authors

Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu

Topics

Machine Learning > Core Methods > Metric Learning
Machine Learning > Learning Types > Zero-Shot Learning
Computer Vision > Analysis > Object Tracking
Artificial Intelligence > Learning Paradigms > Zero-Shot Learning
Computer Vision > Core AI > Multimodal Learning
Artificial Intelligence > Core AI > Computer Vision
Computer Vision > Analysis > Object Segmentation
Deep Learning > Learning Types > Zero-Shot Learning

Keywords

zero-shot learning, metric learning, video understanding, object tracking, instance segmentation, cross-domain generalization, segment anything model, instance association, video matching, zero-shot tracking, dense object region proposal
