Good News, Everyone! Context Driven Entity-Aware Captioning for News Images

Ali Furkan Biten; Lluis Gomez; Marcal Rusinol; Dimosthenis Karatzas

CVPR 2019

Abstract

Current image captioning systems perform at a merely descriptive level, essentially enumerating the objects in the scene and their relations. Humans, by contrast, interpret images by integrating several sources of prior knowledge of the world. In this work, we aim to take a step closer to producing captions that offer a plausible interpretation of the scene by integrating such contextual information into the captioning pipeline. To this end, we focus on captioning images used to illustrate news articles. We propose a novel captioning method that leverages the contextual information provided by the text of the news article associated with an image. Our model selectively draws information from the article, guided by visual cues, and dynamically extends the output dictionary to out-of-vocabulary named entities that appear in the context source. Furthermore, we introduce "GoodNews", the largest news image captioning dataset in the literature, and demonstrate state-of-the-art results.
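
The idea of extending the output dictionary with named entities from the article can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that the captioner emits a template containing category placeholders such as `<PERSON>`, and that each article sentence comes with a relevance weight (for example, derived from visual cues) and a list of entities per category. The function name `fill_placeholders`, the placeholder syntax, and the data layout are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (relevance weight, {entity category: [entity strings]}).
Sentence = Tuple[float, Dict[str, List[str]]]


def fill_placeholders(template: List[str], sentences: List[Sentence]) -> List[str]:
    """Replace <CATEGORY> tokens with entities from the most relevant sentences."""
    # Visit sentences from most to least relevant.
    ranked = sorted(sentences, key=lambda s: s[0], reverse=True)
    used = set()  # avoid inserting the same entity twice
    caption = []
    for token in template:
        if not (token.startswith("<") and token.endswith(">")):
            caption.append(token)  # ordinary vocabulary word
            continue
        category = token[1:-1]
        replacement = token  # keep the placeholder if no entity matches
        for _, entities in ranked:
            candidates = [e for e in entities.get(category, []) if e not in used]
            if candidates:
                replacement = candidates[0]
                used.add(replacement)
                break
        caption.append(replacement)
    return caption


template = ["<PERSON>", "speaks", "in", "<GPE>", "on", "<DATE>"]
sentences = [
    (0.2, {"PERSON": ["Jane Doe"], "GPE": ["Paris"]}),
    (0.7, {"PERSON": ["John Smith"], "GPE": ["Madrid"]}),
]
print(" ".join(fill_placeholders(template, sentences)))
```

Because the entities are copied from the article at inference time, names that never appeared in the training vocabulary can still end up in the caption; a placeholder with no matching entity is simply left in place in this sketch.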

Topics

Computer Vision > Generation > Image Captioning
Interdisciplinary > Linguistics > Semantics
Natural Language Processing > Applications > Named Entity Recognition
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

named entity recognition, multimodal learning, image captioning, contextual information, news image, news image captioning, contextual information integration, entity-aware captioning, out-of-vocabulary named entity, visual-semantic grounding
