MESA: Matching Everything by Segmenting Anything

Yesheng Zhang; Xu Zhao

2024 CVPR CVPR 2024

MESA: Matching Everything by Segmenting Anything

Abstract

Feature matching is a crucial task in the field of computer vision which involves finding correspondences between images. Previous studies achieve remarkable performance using learning-based feature comparison. However the pervasive presence of matching redundancy between images gives rise to unnecessary and error-prone computations in these methods imposing limitations on their accuracy. To address this issue we propose MESA a novel approach to establish precise area (or region) matches for efficient matching redundancy reduction. MESA first leverages the advanced image understanding capability of SAM a state-of-the-art foundation model for image segmentation to obtain image areas with implicit semantic. Then a multi-relational graph is proposed to model the spatial structure of these areas and construct their scale hierarchy. Based on graphical models derived from the graph the area matching is reformulated as an energy minimization task and effectively resolved. Extensive experiments demonstrate that MESA yields substantial precision improvement for multiple point matchers in indoor and outdoor downstream tasks e.g. +13.61% for DKM in indoor pose estimation.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yesheng Zhang , Xu Zhao

Topics

Machine Learning > Core Methods > Representation Learning Deep Learning > Architectures > Graph Neural Networks Computer Vision > Analysis > 3D Vision Computer Vision > Analysis > Scene Understanding Computer Vision > Core AI > Computer Vision Deep Learning > Models > Foundation Models Computer Vision > Processing > Image Retrieval

Keywords

semantic segmentation image segmentation pose estimation feature matching energy minimization semantic matching foundation model graph model graph neural network region matching

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024