Semantics Disentangling for Text-To-Image Generation

Guojun Yin; Bin Liu; Lu Sheng; Nenghai Yu; Xiaogang Wang; Jing Shao

CVPR 2019

Abstract

Synthesizing photo-realistic images from text descriptions is a challenging problem. Previous studies have shown remarkable progress in the visual quality of the generated images. In this paper, we exploit the semantics of the input text descriptions to help render photo-realistic images. However, diverse linguistic expressions make it difficult to extract consistent semantics, even when they depict the same thing. To this end, we propose a novel photo-realistic text-to-image generation model that implicitly disentangles semantics to achieve both high-level semantic consistency and low-level semantic diversity. Specifically, we design (1) a Siamese mechanism in the discriminator to learn consistent high-level semantics, and (2) a visual-semantic embedding strategy based on semantic-conditioned batch normalization to find diverse low-level semantics. Extensive experiments and ablation studies on the CUB and MS-COCO datasets demonstrate the superiority of the proposed method over state-of-the-art methods.
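
The two components named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes a classic margin-based contrastive loss for the Siamese branch and a conditional batch normalization whose scale and shift are adjusted by linear projections of a text embedding. All names here (`contrastive_loss`, `conditioned_batch_norm`, `w_gamma`, `w_beta`, `margin`) are hypothetical and chosen only for illustration.

```python
import math


def contrastive_loss(v1, v2, same_content, margin=1.0):
    """Assumed Siamese objective: pull features of two captions describing
    the same image together, push features of unrelated captions apart."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))
    if same_content:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def conditioned_batch_norm(batch, text_embs, w_gamma, w_beta,
                           gamma=1.0, beta=0.0, eps=1e-5):
    """Assumed form of text-conditioned batch norm for a single channel.

    batch     : one scalar activation per sample
    text_embs : one text embedding (list of floats) per sample
    w_gamma, w_beta : hypothetical linear weights mapping an embedding
                      to a per-sample change in scale and shift
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    out = []
    for x, emb in zip(batch, text_embs):
        x_hat = (x - mean) / math.sqrt(var + eps)  # standard normalization
        d_gamma = sum(w * e for w, e in zip(w_gamma, emb))  # text-driven scale
        d_beta = sum(w * e for w, e in zip(w_beta, emb))    # text-driven shift
        out.append((gamma + d_gamma) * x_hat + (beta + d_beta))
    return out


if __name__ == "__main__":
    # Two captions of the same image: any distance is penalized.
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_content=True))
    # Same activations, different text embeddings give different outputs.
    print(conditioned_batch_norm([1.0, 3.0], [[2.0], [0.0]], [0.0], [1.0]))
```

With zero projection weights the second function reduces to ordinary batch normalization, so in this sketch the text embedding only modulates the normalized visual features and does not replace them.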

Topics

Deep Learning > Models > Generative Models
Deep Learning > Techniques > Model Architecture
Computer Vision > Generation > Image Generation
Deep Learning > Learning Types > Generative Models
Deep Learning > Models > Vision-Language Models

Keywords

image synthesis, text-to-image generation, semantic consistency, visual-semantic embedding, generative model, generative adversarial network, semantic disentangling, Siamese network, semantics disentangling

Related papers

Fast Single Image Reflection Suppression via Convex Optimization 2019

Learning Video Representations From Correspondence Proposals 2019

ATOM: Accurate Tracking by Overlap Maximization 2019

Visual Tracking via Adaptive Spatially-Regularized Correlation Filters 2019

Edge-Labeling Graph Neural Network for Few-Shot Learning 2019