FG^2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching

Zimin Xia; Alexandre Alahi

CVPR 2025

Abstract

We propose a novel fine-grained cross-view localization method that estimates the 3 Degrees of Freedom pose of a ground-level image in an aerial image of the surroundings by matching fine-grained features between the two images. The pose is estimated by aligning a point plane generated from the ground image with a point plane sampled from the aerial image. To generate the ground points, we first map ground image features to a 3D point cloud. Our method then learns to select features along the height dimension to pool the 3D points to a Bird's-Eye-View (BEV) plane. This selection enables us to trace which feature in the ground image contributes to the BEV representation. Next, we sample a set of sparse matches from computed point correspondences between the two point planes and compute their relative pose using Procrustes alignment. Compared to the previous state-of-the-art, our method reduces the mean localization error by 28% on the VIGOR cross-area test set. Qualitative results show that our method learns semantically consistent matches across ground and aerial views through weakly supervised learning from the camera pose.
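The final step described in the abstract, recovering a relative pose from matched points via Procrustes alignment, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `procrustes_2d`, the optional `weights` argument, and the toy points are all assumptions made for illustration. It solves the standard least-squares rigid alignment (rotation plus translation) between two sets of matched 2D points using an SVD.

```python
import numpy as np

def procrustes_2d(src, dst, weights=None):
    """Least-squares rigid alignment of matched 2D points (hypothetical helper).

    Returns a rotation R (2x2) and translation t (2,) such that
    R @ src[i] + t approximates dst[i] for every matched pair i.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if weights is None:
        weights = np.ones(len(src))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()

    # Weighted centroids of both point sets.
    src_mean = w @ src
    dst_mean = w @ dst
    src_c = src - src_mean
    dst_c = dst - dst_mean

    # Weighted cross-covariance matrix and its SVD.
    H = (src_c * w[:, None]).T @ dst_c
    U, _, Vt = np.linalg.svd(H)

    # Guard against a reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Toy example: "ground" BEV points and their matches in the "aerial" plane,
# related by a known 30 degree rotation and a known translation.
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([2.0, -1.0])
ground_pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]])
aerial_pts = ground_pts @ R_true.T + t_true

R_est, t_est = procrustes_2d(ground_pts, aerial_pts)
yaw_deg = np.degrees(np.arctan2(R_est[1, 0], R_est[0, 0]))
print(yaw_deg, t_est)
```

In this sketch the recovered rotation angle plays the role of the orientation and the translation plays the role of the 2D location, which together make up a 3 Degrees of Freedom pose. How the paper weights or samples the matches before this step is not specified in the abstract, so the uniform default weights here are only a placeholder.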

Authors

Zimin Xia, Alexandre Alahi

Topics

Machine Learning > Core Methods > Metric Learning
Machine Learning > Core Methods > Feature Learning
Computer Vision > Analysis > 3D Vision
Computer Vision > Analysis > Scene Understanding
Computer Vision > Analysis > Computer Vision
Computer Vision > Domain-Specific > Autonomous Driving
Deep Learning > Learning Types > Multi-View Learning

Keywords

pose estimation, feature matching, bird's-eye view, point cloud processing, 3D point cloud, fine-grained feature, aerial imagery, cross-view localization, Procrustes alignment
