2024 CVPR CVPR 2024

RoMa: Robust Dense Feature Matching

Abstract

Feature matching is an important computer vision task that involves estimating correspondences between two images of a 3D scene and dense methods estimate all such correspondences. The aim is to learn a robust model i.e. a model able to match under challenging real-world changes. In this work we propose such a model leveraging frozen pretrained features from the foundation model DINOv2. Although these features are significantly more robust than local features trained from scratch they are inherently coarse. We therefore combine them with specialized ConvNet fine features creating a precisely localizable feature pyramid. To further improve robustness we propose a tailored transformer match decoder that predicts anchor probabilities which enables it to express multimodality. Finally we propose an improved loss formulation through regression-by-classification with subsequent robust regression. We conduct a comprehensive set of experiments that show that our method RoMa achieves significant gains setting a new state-of-the-art. In particular we achieve a 36% improvement on the extremely challenging WxBS benchmark. Code is provided at github.com/Parskatt/RoMa.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning
🧭 Keyword Pioneer — anchor probability
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio