Adaptive Cross-Modal Embeddings for Image-Text Alignment

Jonatas Wehrmann; Camila Kolling; Rodrigo C Barros

AAAI 2020

Abstract

[…] a using an embedding vector of an instance from modality b. Such an adaptation is designed to filter and enhance important information across internal features, allowing for guided vector representations, which resembles the working of attention modules while being far more computationally efficient. Experimental results on two large-scale image-text alignment datasets show that ADAPT models outperform all the baseline approaches by large margins. In particular, a single ADAPT model outperforms the state-of-the-art approach on the Flickr30k dataset by a relative improvement of R@1 ≈ 24% for Image Retrieval and R@1 ≈ 8% for Image Annotation. On MS COCO, it provides an improvement of R@1 ≈ 12% for Image Retrieval and R@1 ≈ 7% for Image Annotation. Code is available at https://github.com/jwehrmann/retrieval.pytorch.
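The adaptation idea in the abstract can be illustrated with a minimal sketch. It assumes a feature-wise scale-and-shift formulation: the embedding from modality b is projected to a scale vector gamma and a shift vector beta, which then modulate every feature vector of modality a. The abstract does not state the exact form, so this formulation is an assumption, as are all names (`adapt`, `linear`, `w_gamma`, `w_beta`), the random weights, the mean pooling, and the cosine scoring. None of it is taken from the linked repository.

```python
import math
import random


def linear(vec, weight, bias):
    """Affine map; `weight` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def adapt(features, query, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift each feature vector of modality a using one
    embedding vector from modality b (assumed formulation)."""
    gamma = linear(query, w_gamma, b_gamma)  # per-dimension scale
    beta = linear(query, w_beta, b_beta)     # per-dimension shift
    return [[g * f + b for f, g, b in zip(row, gamma, beta)]
            for row in features]


def mean_pool(rows):
    """Average a list of equal-length vectors into a single vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


rng = random.Random(0)
dim = 4


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy data: 3 image-region features (modality a), 1 caption embedding (modality b).
regions = rand_matrix(3, dim)
caption = [rng.uniform(-1, 1) for _ in range(dim)]

# Hypothetical projection parameters; in a trained model these would be learned.
w_gamma, w_beta = rand_matrix(dim, dim), rand_matrix(dim, dim)
b_gamma, b_beta = [1.0] * dim, [0.0] * dim

adapted = adapt(regions, caption, w_gamma, b_gamma, w_beta, b_beta)
score = cosine(mean_pool(adapted), caption)  # image-text similarity
```

In this sketch each region feature is touched once per caption, so the cost grows linearly with the number of regions and no pairwise region-word score matrix is built.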

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Core Methods > Embedding Learning
Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Learning Types > Metric Learning
Deep Learning > Learning Types > Multi-Modal Learning
Computer Vision > Analysis > Image Retrieval

Keywords

image retrieval, attention mechanism, image annotation, cross-modal embedding, image-text alignment, feature adaptation
