Deep Correlation for Matching Images and Text

Fei Yan; Krystian Mikolajczyk

CVPR 2015

Abstract

This paper addresses the problem of matching images and captions in a joint latent space learnt with deep canonical correlation analysis (DCCA). The image and caption data are represented by the outputs of vision-based and text-based deep neural networks. The high dimensionality of these features presents a great challenge in terms of memory and speed when they are used in the DCCA framework. We address these problems with a GPU implementation and propose methods to deal with overfitting. This makes it possible to evaluate the DCCA approach on popular caption-image matching benchmarks. We compare our approach to other recently proposed techniques and present state-of-the-art results on three datasets.
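To make the idea of a joint latent space concrete, the following is a minimal sketch of plain linear CCA in NumPy, not the authors' DCCA or their GPU implementation. It assumes image and caption features are already available as matrices with one row per image-caption pair. The function names (`fit_cca`, `project`, `match`) and the ridge term `reg` are hypothetical choices made for illustration; `reg` only stands in loosely for the overfitting control the abstract mentions.

```python
import numpy as np

def inv_sqrt(S):
    # Inverse square root of a symmetric positive-definite matrix.
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

def fit_cca(X, Y, k, reg=1e-3):
    # X: (n, dx) image features, Y: (n, dy) caption features, row i paired.
    # reg is an assumed ridge term that keeps the covariances invertible.
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the
    # (regularised) canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    A = Kx @ U[:, :k]      # projection for the image view
    B = Ky @ Vt.T[:, :k]   # projection for the caption view
    return (mx, A), (my, B), s[:k]

def project(Z, model):
    # Map features of one view into the shared k-dimensional space.
    mean, W = model
    return (Z - mean) @ W

def match(img_emb, txt_emb):
    # For each image, index of the caption with highest cosine similarity.
    a = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    b = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)
```

In this sketch, both views are projected with `project` and captions are retrieved for images with `match`. DCCA, as described in the abstract, learns the correlated space with deep networks rather than with the two linear maps used here.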

Authors

Fei Yan, Krystian Mikolajczyk

Topics

Machine Learning > Core Methods > Metric Learning

Machine Learning > Core Methods > Embedding Learning

Computer Vision > Generation > Image Captioning

Deep Learning > Models > Multi-Modal Learning

Computer Vision > Applications > Computer Vision

Keywords

multi-modal learning, cross-modal retrieval, deep canonical correlation analysis, image-text matching, joint latent space
