Universal Weighting Metric Learning for Cross-Modal Matching

Jiwei Wei; Xing Xu; Yang Yang; Yanli Ji; Zheng Wang; Heng Tao Shen

CVPR 2020

Abstract

Cross-modal matching has been a prominent research topic in both the vision and language areas. Learning an appropriate mining strategy to sample and weight informative pairs is crucial for cross-modal matching performance. However, most existing metric learning methods are developed for unimodal matching and are unsuitable for cross-modal matching on multimodal data with heterogeneous features. To address this problem, we propose a simple and interpretable universal weighting framework for cross-modal matching, which provides a tool for analyzing the interpretability of various loss functions. Furthermore, we introduce a new polynomial loss under the universal weighting framework, which defines separate weight functions for the positive and negative informative pairs. Experimental results on two image-text matching benchmarks and two video-text matching benchmarks validate the efficacy of the proposed method.

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Core Methods > Metric Learning
Machine Learning > Learning Types > Multi-Modal Learning
Deep Learning > Learning Types > Multi-Modal Learning
Deep Learning > Learning Types > Metric Learning
Computer Vision > Applications > Image Retrieval

Keywords

representation learning, metric learning, image-text matching, cross-modal matching, polynomial loss, universal weighting, video-text matching, weighting strategy
