Semantic Compositional Networks for Visual Captioning

Zhe Gan; Chuang Gan; Xiaodong He; Yunchen Pu; Kenneth Tran; Jianfeng Gao; Lawrence Carin; Li Deng

2017 CVPR CVPR 2017

Semantic Compositional Networks for Visual Captioning

Abstract

A Semantic Compositional Network (SCN) is developed for image captioning, in which semantic concepts (i.e., tags) are detected from the image, and the probability of each tag is used to compose the parameters in a long short-term memory (LSTM) network. The SCN extends each weight matrix of the LSTM to an ensemble of tag-dependent weight matrices. The degree to which each member of the ensemble is used to generate an image caption is tied to the image-dependent probability of the corresponding tag. In addition to captioning images, we also extend the SCN to generate captions for video clips. We qualitatively analyze semantic composition in SCNs, and quantitatively evaluate the algorithm on three benchmark datasets: COCO, Flickr30k, and Youtube2Text. Experimental results show that the proposed method significantly outperforms prior state-of-the-art approaches, across multiple evaluation metrics.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning

🧭 Keyword Pioneer — visual captioning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Zhe Gan , Chuang Gan , Xiaodong He , Yunchen Pu , Kenneth Tran , Jianfeng Gao , Lawrence Carin , Li Deng

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Computer Vision > Generation > Image Captioning Deep Learning > Learning Types > Representation Learning Deep Learning > Architectures > Recurrent Neural Networks

Keywords

video captioning image captioning long short-term memory semantic composition visual captioning tag detection

Download PDF

Related papers

Deep Outdoor Illumination Estimation 2017

SRN: Side-output Residual Network for Object Symmetry Detection in the Wild 2017

Weakly Supervised Semantic Segmentation Using Web-Crawled Videos 2017

FASON: First and Second Order Information Fusion Network for Texture Recognition 2017

Recurrent Convolutional Neural Networks for Continuous Sign Language Recognition by Staged Optimization 2017