DiffUTE: Universal Text Editing Diffusion Model

Haoxing Chen; Zhuoer Xu; Zhangxuan Gu; Jun Lan; Xing Zheng; Yaohui Li; Changhua Meng; Huijia Zhu; Weiqiang Wang

NeurIPS 2023

Abstract

Diffusion-model-based language-guided image editing has achieved great success recently. However, existing state-of-the-art diffusion models struggle to render correct text and text style during generation. To tackle this problem, we propose a universal self-supervised text editing diffusion model (DiffUTE), which aims to replace or modify words in a source image with other words while maintaining a realistic appearance. Specifically, we build our model on a diffusion model and carefully modify the network structure so that the model can draw multilingual characters with the help of glyph and position information. Moreover, we design a self-supervised learning framework that leverages large amounts of web data to improve the representation ability of the model. Experimental results show that our method achieves impressive performance and enables controllable, high-fidelity editing of in-the-wild images. Our code will be available at https://github.com/chenhaoxing/DiffUTE.

Topics

Machine Learning > Learning Types > Self-Supervised Learning; Deep Learning > Models > Diffusion Models; Computer Vision > Processing > Image Editing

Keywords

self-supervised learning; text editing; image editing; diffusion model; multilingual character; glyph information
