Multi-Scale Matching Networks for Semantic Correspondence
Abstract
Previous work has shown that deep features are powerful for building accurate dense semantic correspondences. However, the multi-scale, pyramidal hierarchy of convolutional neural networks has not been well studied as a means of learning discriminative pixel-level features for semantic correspondence. In this paper, we propose a multi-scale matching network that is sensitive to tiny semantic differences between neighboring pixels. We follow the coarse-to-fine matching strategy and build a top-down feature and matching enhancement scheme that is coupled with the multi-scale hierarchy of deep convolutional neural networks. During feature enhancement, intra-scale enhancement fuses same-resolution feature maps from multiple layers via local self-attention, and cross-scale enhancement hallucinates higher-resolution feature maps along the top-down hierarchy. In addition, we learn complementary matching details at different scales, so that the overall matching score is gradually refined by features at different semantic levels. Our multi-scale matching network can be trained end-to-end easily and adds few learnable parameters. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on three popular benchmarks with high computational efficiency.
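
To make the coarse-to-fine idea concrete, the following is a minimal NumPy sketch of refining a matching score along a feature pyramid. It is an illustration under stated assumptions, not the network proposed here: the function names (`cosine_correlation`, `upsample_correlation`, `coarse_to_fine_scores`) are hypothetical, the features are taken as given rather than learned, upsampling is nearest-neighbor, and scores from different scales are combined by simple addition. The learned intra-scale and cross-scale enhancement modules are omitted.

```python
import numpy as np


def cosine_correlation(feat_src, feat_tgt):
    """Cosine similarity between every source and target pixel.

    feat_src: (C, H, W) array, feat_tgt: (C, Ht, Wt) array.
    Returns a 4D score tensor of shape (H, W, Ht, Wt).
    """
    c, h, w = feat_src.shape
    _, ht, wt = feat_tgt.shape
    src = feat_src.reshape(c, -1)
    tgt = feat_tgt.reshape(c, -1)
    # L2-normalize each pixel's feature vector (small epsilon avoids division by zero).
    src = src / (np.linalg.norm(src, axis=0, keepdims=True) + 1e-8)
    tgt = tgt / (np.linalg.norm(tgt, axis=0, keepdims=True) + 1e-8)
    return (src.T @ tgt).reshape(h, w, ht, wt)


def upsample_correlation(corr, factor):
    """Nearest-neighbor upsampling of all four spatial axes (an assumption)."""
    for axis in range(4):
        corr = np.repeat(corr, factor, axis=axis)
    return corr


def coarse_to_fine_scores(pyramid_src, pyramid_tgt):
    """Accumulate matching scores from the coarsest to the finest level.

    Both pyramids are lists of (C, H, W) feature maps ordered coarse to fine,
    where each level's resolution is an integer multiple of the previous one.
    Combining scales by addition is an illustrative choice, not the paper's rule.
    """
    score = None
    for f_src, f_tgt in zip(pyramid_src, pyramid_tgt):
        corr = cosine_correlation(f_src, f_tgt)
        if score is None:
            score = corr
        else:
            factor = corr.shape[0] // score.shape[0]
            score = corr + upsample_correlation(score, factor)
    return score
```

In this sketch the coarse level narrows a match down to a region, and the finer level disambiguates pixels within that region, which mirrors the gradual refinement of the overall matching score described above.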