ManiGAN: Text-Guided Image Manipulation

Bowen Li; Xiaojuan Qi; Thomas Lukasiewicz; Philip H.S. Torr

CVPR 2020

Abstract

The goal of our paper is to semantically edit the parts of an image that match a given text describing desired attributes (e.g., texture, colour, and background), while preserving other contents that are irrelevant to the text. To achieve this, we propose a novel generative adversarial network (ManiGAN), which contains two key components: a text-image affine combination module (ACM) and a detail correction module (DCM). The ACM selects image regions relevant to the given text and then correlates these regions with the corresponding semantic words for effective manipulation. Meanwhile, it encodes original image features to help reconstruct text-irrelevant contents. The DCM rectifies mismatched attributes and completes missing contents of the synthetic image. Finally, we suggest a new metric for evaluating image manipulation results, in terms of both the generation of new attributes and the reconstruction of text-irrelevant contents. Extensive experiments on the CUB and COCO datasets demonstrate the superior performance of the proposed method.
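
The abstract does not spell out how the ACM combines text and image information. As a rough illustration only, the sketch below assumes that "affine combination" means an element-wise scale-and-shift: text-conditioned hidden features are multiplied by a scale and offset by a shift, with both scale and shift assumed to be predicted from the original image features. The function name, the variable names, and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List


def affine_combine(hidden: List[float],
                   scale: List[float],
                   shift: List[float]) -> List[float]:
    """Element-wise scale-and-shift: out[i] = hidden[i] * scale[i] + shift[i].

    Assumption (not stated in the abstract): `hidden` holds text-conditioned
    features, while `scale` and `shift` are predicted from image features.
    """
    if not (len(hidden) == len(scale) == len(shift)):
        raise ValueError("hidden, scale and shift must have the same length")
    return [h * w + b for h, w, b in zip(hidden, scale, shift)]


# Hypothetical toy values, flattened to one dimension for readability.
hidden = [1.0, 2.0, 3.0]  # text-conditioned hidden features
scale = [1.0, 0.0, 0.5]   # per-position scale (assumed to come from the image)
shift = [0.0, 4.0, 1.0]   # per-position shift (assumed to come from the image)

combined = affine_combine(hidden, scale, shift)
print(combined)  # [1.0, 4.0, 2.5]
```

Under this assumption, a scale near zero with a non-zero shift (the second position above) lets the image-derived term dominate, which is one way text-irrelevant content could be carried through, while a scale near one with a small shift lets the text-conditioned features pass through largely unchanged.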

Topics

Deep Learning > Models > Generative Models
Computer Vision > Generation > Image Generation
Computer Vision > Processing > Image Editing
Deep Learning > Learning Types > Generative Models

Keywords

image reconstruction, image editing, generative adversarial network, semantic editing, text-image alignment, text-guided image manipulation, detail correction, mask pooling, text-guided manipulation, text-image affine combination
