StyleNet: Generating Attractive Visual Captions With Styles

Chuang Gan; Zhe Gan; Xiaodong He; Jianfeng Gao; Li Deng

CVPR 2017

Abstract

We propose a novel framework named StyleNet to address the task of generating attractive captions for images and videos with different styles. To this end, we devise a novel model component, named factored LSTM, which automatically distills the style factors from a monolingual text corpus. At runtime, we can then explicitly control the style in the caption generation process to produce attractive visual captions with the desired style. Our approach achieves this goal by leveraging two sets of data: 1) factual image/video-caption paired data, and 2) stylized monolingual text data (e.g., romantic and humorous sentences). We show experimentally that StyleNet outperforms existing approaches for generating visual captions with different styles, as measured by both automatic metrics and human evaluation on the newly collected FlickrStyle10K image caption dataset, which contains 10K Flickr images with corresponding humorous and romantic captions.
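The abstract does not spell out how the factored LSTM separates style from content, so the following is only a minimal sketch under stated assumptions. It assumes that each gate's input-to-hidden weight matrix is factored as a product U · S · V, where U and V are shared across all styles and only the small middle matrix S is specific to a style. The class name `FactoredLSTMCell`, the parameter `factor_dim`, and the style labels are hypothetical names chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class FactoredLSTMCell:
    """Hypothetical sketch of one LSTM step with factored input weights.

    Assumption (not stated in the abstract): for each gate, the
    input-to-hidden matrix is U @ S[style] @ V, with U and V shared
    across styles and S chosen per style.
    """

    def __init__(self, input_dim, hidden_dim, factor_dim, styles, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, size=shape)

        # Leading axis of size 4 stacks the input, forget, output and
        # candidate gates.
        self.U = init(4, hidden_dim, factor_dim)    # shared across styles
        self.V = init(4, factor_dim, input_dim)     # shared across styles
        self.W_h = init(4, hidden_dim, hidden_dim)  # shared recurrent weights
        self.b = np.zeros((4, hidden_dim))
        # One small style-specific factor per style and gate.
        self.S = {s: init(4, factor_dim, factor_dim) for s in styles}

    def step(self, x, h, c, style):
        S = self.S[style]  # selecting S is the explicit style control
        Vx = np.einsum("gfi,i->gf", self.V, x)
        SVx = np.einsum("gef,gf->ge", S, Vx)
        USVx = np.einsum("ghe,ge->gh", self.U, SVx)
        Wh = np.einsum("ghk,k->gh", self.W_h, h)
        pre = USVx + Wh + self.b
        i, f, o = sigmoid(pre[0]), sigmoid(pre[1]), sigmoid(pre[2])
        g = np.tanh(pre[3])
        c_next = f * c + i * g
        h_next = o * np.tanh(c_next)
        return h_next, c_next


cell = FactoredLSTMCell(
    input_dim=8, hidden_dim=16, factor_dim=4,
    styles=["factual", "romantic", "humorous"],
)
x, h0, c0 = np.ones(8), np.zeros(16), np.zeros(16)
h_rom, _ = cell.step(x, h0, c0, "romantic")
h_hum, _ = cell.step(x, h0, c0, "humorous")
```

In this sketch, the same input and the same shared weights yield different hidden states depending only on which S is selected, which is one way to read the abstract's statement that style can be controlled explicitly at runtime.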

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Core Methods > Representation Learning
Computer Vision > Generation > Image Captioning
Artificial Intelligence > Core AI > Natural Language Generation

Keywords

style transfer; video captioning; multimodal learning; image captioning; language generation; neural network; style generation; factored long short-term memory
