CLIPVG: Text-Guided Image Manipulation Using Differentiable Vector Graphics

Yiren Song; Xuning Shao; Kang Chen; Weidong Zhang; Zhongliang Jing; Minzhe Li

AAAI 2023

Abstract

Considerable progress has recently been made in leveraging CLIP (Contrastive Language-Image Pre-Training) models for text-guided image manipulation. However, all existing works rely on additional generative models to ensure the quality of results, because CLIP alone cannot provide enough guidance information for fine-scale pixel-level changes. In this paper, we introduce CLIPVG, a text-guided image manipulation framework using differentiable vector graphics, which is also the first CLIP-based general image manipulation framework that does not require any additional generative models. We demonstrate that CLIPVG not only achieves state-of-the-art performance in both semantic correctness and synthesis quality, but is also flexible enough to support various applications far beyond the capability of all existing methods.

Topics

Artificial Intelligence > Core AI > Multimodal Learning; Deep Learning > Models > Generative Models; Computer Vision > Generation > Image Generation; Deep Learning > Learning Types > Contrastive Learning; Computer Vision > Generation > Image Editing; Computer Vision > Core AI > Foundation Models

Keywords

contrastive learning; image generation; image synthesis; differentiable rendering; CLIP model; language guidance; semantic editing; image manipulation; vector graphics; text-guided image manipulation; differentiable vector graphics
