Show Me a Story: Towards Coherent Neural Story Illustration

Hareesh Ravi; Lezi Wang; Carlos Muniz; Leonid Sigal; Dimitris Metaxas; Mubbasir Kapadia

CVPR 2018

Abstract

We propose an end-to-end network for the visual illustration of a sequence of sentences forming a story. At the core of our model is the ability to model the inter-related nature of the sentences within a story, as well as the ability to learn coherence to support reference resolution. The framework takes the form of an encoder-decoder architecture: sentences are encoded using a hierarchical two-level sentence-story GRU, combined with an encoding of coherence, and then sequentially decoded, via predicted feature representations, into a consistent illustrative image sequence. We optimize all parameters of our network end-to-end with respect to an order-embedding loss that encodes entailment between images and sentences. Experiments on the VIST storytelling dataset [vist] highlight the importance of our algorithmic choices and the efficacy of our overall model.
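A minimal NumPy sketch of the two mechanisms the abstract names, the hierarchical two-level (word-level sentence GRU, then story GRU) encoder and an order-embedding loss, might look as follows. It is an illustration under stated assumptions, not the authors' implementation: the loss follows the order-violation penalty E(x, y) = ||max(0, y - x)||^2 and bidirectional hinge ranking loss of Vendrov et al. ("Order-Embeddings of Images and Language", ICLR 2016), and the orientation (image more specific than its sentence) is assumed; the coherence encoding is omitted because the abstract does not specify it; decoding is approximated here by retrieving the least-violating candidate image for each sentence. All names (`GRU`, `encode_story`, `order_violation`, `order_embedding_loss`, `illustrate`), dimensions, and the `margin` value are hypothetical, and the weights are random and untrained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRU:
    """Minimal single-layer GRU (standard gate equations), randomly initialised, untrained."""

    def __init__(self, in_dim, hid_dim, rng):
        s = 1.0 / np.sqrt(hid_dim)
        # Index 0: update gate z, 1: reset gate r, 2: candidate state.
        self.W = rng.uniform(-s, s, (3, hid_dim, in_dim))
        self.U = rng.uniform(-s, s, (3, hid_dim, hid_dim))
        self.b = np.zeros((3, hid_dim))
        self.hid_dim = hid_dim

    def run(self, xs):
        """Run over xs of shape (T, in_dim); return every hidden state, shape (T, hid_dim)."""
        h = np.zeros(self.hid_dim)
        states = []
        for x in xs:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            cand = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1.0 - z) * h + z * cand
            states.append(h)
        return np.stack(states)


def encode_story(story, word_gru, story_gru, proj):
    """Hierarchical encoding: story is a list of (num_words, word_dim) arrays, one per sentence."""
    # Level 1 (sentence GRU): summarise each sentence by its last word-level hidden state.
    sent_vecs = np.stack([word_gru.run(words)[-1] for words in story])
    # Level 2 (story GRU): re-read the sentence vectors so each one carries story context.
    ctx = story_gru.run(sent_vecs)
    # Project to the joint space; abs() keeps coordinates non-negative, as order embeddings assume.
    return np.abs(ctx @ proj.T)


def order_violation(img, sent):
    """E(img, sent) = ||max(0, sent - img)||^2; zero exactly when sent <= img coordinate-wise."""
    return np.sum(np.maximum(0.0, sent - img) ** 2, axis=-1)


def order_embedding_loss(imgs, sents, margin=0.05):
    """Bidirectional hinge loss; row i of imgs is the match for row i of sents."""
    E = order_violation(imgs[:, None, :], sents[None, :, :])  # E[i, j]: image i vs sentence j
    pos = np.diag(E)
    # Each mismatched pair should violate the order by at least `margin` more than the true pair.
    cost_sent = np.maximum(0.0, margin + pos[:, None] - E)  # image i with wrong sentence j
    cost_img = np.maximum(0.0, margin + pos[None, :] - E)   # sentence j with wrong image i
    off_diag = 1.0 - np.eye(len(E))
    return float(((cost_sent + cost_img) * off_diag).sum())


def illustrate(sent_embs, candidate_imgs):
    """Retrieval-style decoding: for each sentence, the index of the least-violating image."""
    E = order_violation(candidate_imgs[None, :, :], sent_embs[:, None, :])  # (sentences, images)
    return E.argmin(axis=1)


# Toy usage with random, untrained weights and random "word vectors".
rng = np.random.default_rng(0)
word_dim, hid_dim, emb_dim = 8, 16, 10
word_gru = GRU(word_dim, hid_dim, rng)
story_gru = GRU(hid_dim, hid_dim, rng)
proj = rng.normal(size=(emb_dim, hid_dim))
story = [rng.normal(size=(n, word_dim)) for n in (4, 6, 5, 7, 3)]  # five sentences
sent_embs = encode_story(story, word_gru, story_gru, proj)
imgs = np.abs(rng.normal(size=(5, emb_dim)))  # stand-in for image features
print(sent_embs.shape)  # (5, 10)
print(order_embedding_loss(imgs, sent_embs))
print(illustrate(sent_embs, imgs))
```

With random weights the retrieved indices carry no meaning; training would minimise `order_embedding_loss` over matched (image, sentence) pairs. The story-level GRU is what lets each sentence embedding depend on its neighbours, which is the inter-sentence dependence the abstract emphasises.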

Topics

Computer Vision > Generation > Image Generation
Deep Learning > Learning Types > Multi-Modal Learning
Natural Language Processing > Applications > Image Captioning

Keywords

sequence modeling, image generation, encoder-decoder architecture, hierarchical encoder, hierarchical encoding, coherence modeling, order embedding, story illustration
