Expressing an Image Stream with a Sequence of Natural Sentences

Cesc C Park; Gunhee Kim

NeurIPS (NIPS) 2015

Abstract

We propose an approach for generating a sequence of natural sentences for an image stream. Since general users usually take a series of pictures of their special moments, much online visual information exists in the form of image streams, for which it is better to take the whole set into consideration when generating natural language descriptions. While almost all previous studies have dealt with the relation between a single image and a single natural sentence, our work extends both the input and output dimensions to a sequence of images and a sequence of sentences. To this end, we design a novel architecture called the coherent recurrent convolutional network (CRCN), which consists of convolutional networks, bidirectional recurrent networks, and an entity-based local coherence model. Our approach learns directly from the vast user-generated resource of blog posts as text-image parallel training data. We demonstrate that our approach outperforms other state-of-the-art candidate methods, using both quantitative measures (e.g., BLEU and top-K recall) and user studies via Amazon Mechanical Turk.
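The abstract names top-K recall as one of its quantitative measures without defining it. As a rough illustration only, the sketch below uses the common retrieval definition: the fraction of queries whose ground-truth item appears among the top K ranked candidates. The function name `recall_at_k`, the integer candidate IDs, and the toy data are all assumptions made for this example; they are not taken from the paper, whose exact evaluation protocol may differ.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[Sequence[int]], truth: Sequence[int], k: int) -> float:
    """Fraction of queries whose ground-truth ID is in the top-k ranked candidates.

    ranked[i] is a candidate list for query i, best first (hypothetical IDs).
    truth[i] is the single ground-truth candidate ID for query i.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        return 0.0
    # A query counts as a hit when its ground truth is within the first k candidates.
    hits = sum(1 for cands, t in zip(ranked, truth) if t in cands[:k])
    return hits / len(truth)


# Toy data (made up): three queries, three candidates each.
ranked = [[3, 1, 2], [0, 2, 1], [2, 0, 1]]
truth = [1, 1, 2]

r1 = recall_at_k(ranked, truth, 1)  # only the third query hits at rank 1
r2 = recall_at_k(ranked, truth, 2)  # first and third queries hit within rank 2
r3 = recall_at_k(ranked, truth, 3)  # every ground truth is within rank 3
print(r1, r2, r3)
```

In a setting like the one described, each query could be an image stream and each candidate a sentence sequence, but that mapping is an assumption made here for illustration.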

Topics

Computer Vision > Generation > Image Captioning
Deep Learning > Learning Types > Multi-Modal Learning
Natural Language Processing > Generation > Image Captioning

Keywords

natural language generation, image captioning, convolutional neural network, recurrent neural network, visual language, image stream
