Multimodal Image Matching Based on Cross-Modality Completion Pre-training

Meng Yang; Fan Fan; Jun Huang; Yong Ma; Xiaoguang Mei; Zhanchuan Cai; Jiayi Ma

2025 IJCAI IJCAI 2025

Multimodal Image Matching Based on Cross-Modality Completion Pre-training

Abstract

The differences in imaging devices cause multimodal images to have modal differences and geometric distortions, complicating the matching task. Deep learning-based matching methods struggle with multimodal images due to the lack of large annotated multimodal datasets. To address these challenges, we propose XCP-Match based on cross-modality completion pre-training. XCP-Match has two phases. (1) Self-supervised cross-modality completion pre-training based on real multimodal image dataset. We develop a novel pre-training model to learn cross-modal semantic features. The pre-training uses masked image modeling method for cross-modality completion, and introduces an attention-weighted contrastive loss to emphasize matching in overlapping areas. (2) Supervised fine-tuning for multimodal image matching based on the augmented MegaDepth dataset. XCP-Match constructs a complete matching framework to overcome geometric distortions and achieve precise matching. Two-phase training encourages the model to learn deep cross-modal semantic information, improving adaptation to modal differences without needing large annotated datasets. Experiments demonstrate that XCP-Match outperforms existing algorithms on public datasets.

🧭 Keyword Pioneer — multimodal image matching

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Meng Yang , Fan Fan , Jun Huang , Yong Ma , Xiaoguang Mei , Zhanchuan Cai , Jiayi Ma

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Learning Types > Contrastive Learning Machine Learning > Learning Types > Self-Supervised Learning

Keywords

contrastive learning self-supervised learning masked image modeling multimodal image matching cross-modality completion

Download PDF

Related papers

Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain 2025

Responsibility Anticipation and Attribution in LTLf 2025

Argument-based Multi-Issue Negotiation 2025

Online Resource Sharing: Better Robust Guarantees via Randomized Strategies 2025

Equitable Mechanism Design for Facility Location 2025