Prompt Refinement with Image Pivot for Text-to-Image Generation

Jingtao Zhan; Qingyao Ai; Yiqun Liu; Yingwei Pan; Ting Yao; Jiaxin Mao; Shaoping Ma; Tao Mei

ACL 2024

Abstract

For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from “user languages” into “system languages”. However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement model. Inspired by zero-shot machine translation techniques, we introduce Prompt Refinement with Image Pivot (PRIP). PRIP innovatively uses the latent representation of a user-preferred image as an intermediary “pivot” between the user and system languages. It decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages and subsequently translating image representations into system languages. Thus, it can leverage abundant data for training. Extensive experiments show that PRIP substantially outperforms a wide range of baselines and effectively transfers to unseen systems in a zero-shot manner.
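The pivot decomposition in the abstract can be illustrated with a toy sketch. The paper learns both stages from data; this sketch instead swaps in plain nearest-neighbour lookups over a hypothetical two-entry corpus, and every name (`USER_TO_IMAGE`, `IMAGE_TO_SYSTEM`, `infer_image_latent`, `latent_to_system_prompt`, `refine`) and every data point is made up for illustration. What it is meant to show is only the structure: each stage is fit on its own corpus, and refinement is the composition of the two, so no direct (user prompt, system prompt) pairs are needed.

```python
import math

# Hypothetical toy corpora (not from the paper).
# Corpus A: user-language prompts paired with the latent of the image the user preferred.
USER_TO_IMAGE = [
    ("a cute cat sleeping on a sofa", (0.9, 0.1, 0.0)),
    ("a castle on a hill at sunset", (0.1, 0.9, 0.2)),
]
# Corpus B: image latents paired with keyword-rich system-language prompts.
IMAGE_TO_SYSTEM = [
    ((0.85, 0.15, 0.05), "cat, sleeping, sofa, cozy, soft lighting, highly detailed"),
    ((0.05, 0.95, 0.25), "castle, hilltop, sunset, dramatic sky, matte painting"),
]


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two prompts (a stand-in for a learned encoder)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def infer_image_latent(user_prompt: str) -> tuple[float, ...]:
    """Stage 1 (user language -> image pivot): return the latent of the most similar user prompt."""
    _, latent = max(USER_TO_IMAGE, key=lambda pair: jaccard(user_prompt, pair[0]))
    return latent


def latent_to_system_prompt(latent: tuple[float, ...]) -> str:
    """Stage 2 (image pivot -> system language): return the prompt of the closest latent."""
    _, prompt = min(IMAGE_TO_SYSTEM, key=lambda pair: math.dist(latent, pair[0]))
    return prompt


def refine(user_prompt: str) -> str:
    """Compose the two independently built stages through the image pivot."""
    return latent_to_system_prompt(infer_image_latent(user_prompt))


if __name__ == "__main__":
    print(refine("my cat sleeping on the sofa"))
```

With this toy data, `refine("my cat sleeping on the sofa")` routes through the cat latent and returns the cat-themed keyword prompt. In a real system each lookup would be replaced by a trained model, but the composition step stays the same.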

Topics

Machine Learning > Learning Types > Zero-Shot Learning
Computer Vision > Generation > Image Generation
Natural Language Processing > Generation > Text Generation

Keywords

zero-shot learning, machine translation, text-to-image generation, prompt refinement
