PromptSculptor: Multi-Agent Based Text-to-Image Prompt Optimization

Dawei Xiang; Wenyan Xu; Kexin Chu; Tianqi Ding; Zixu Shen; Yiming Zeng; Jianchang Su; Wei Zhang

EMNLP 2025


Abstract

The rapid advancement of generative AI has democratized access to powerful tools such as Text-to-Image (T2I) models. However, to generate high-quality images, users must still craft detailed prompts specifying scene, style, and context, often through multiple rounds of refinement. We propose PromptSculptor, a novel multi-agent framework that automates this iterative prompt optimization process. Our system decomposes the task into four specialized agents that work collaboratively to transform a short, vague user prompt into a comprehensive, refined prompt. By leveraging Chain-of-Thought (CoT) reasoning, our framework effectively infers hidden context and enriches scene and background details. To iteratively refine the prompt, a self-evaluation agent aligns the modified prompt with the original input, while a feedback-tuning agent incorporates user feedback for further refinement. Experimental results demonstrate that PromptSculptor significantly enhances output quality and reduces the number of iterations needed for user satisfaction. Moreover, its model-agnostic design allows seamless integration with various T2I models, paving the way for industrial applications.
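
The abstract describes an iterative loop: enrich a short prompt, check the result against the original input, render it with any T2I model, and fold user feedback back in. The sketch below shows one way such a loop could be wired; it is not the authors' implementation. All names (`sculpt`, `enrich`, `is_aligned`, `tune`, `generate`, `get_feedback`, `max_rounds`, `max_retries`) and the `toy_*` stubs are hypothetical. The abstract names only two of the four agents, so the CoT context-inference and enrichment roles are collapsed into a single `enrich` callable here. It is also an assumption that the self-evaluation agent acts as a pass/fail check that triggers re-sampling. LLM and T2I calls are replaced by toy string functions so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical agent interfaces (not from the paper). In a real system each
# callable would wrap an LLM call; here they are plain functions.
Enricher = Callable[[str], str]               # prompt -> context-enriched prompt
AlignmentCheck = Callable[[str, str], bool]   # (original, candidate) -> keeps intent?
FeedbackTuner = Callable[[str, str], str]     # (prompt, user feedback) -> revised prompt
T2IModel = Callable[[str], object]            # prompt -> image; any backend fits


@dataclass
class Trace:
    """Prompts sent to the T2I model, one per refinement round."""
    prompts: List[str] = field(default_factory=list)
    rounds: int = 0


def sculpt(
    user_prompt: str,
    enrich: Enricher,
    is_aligned: AlignmentCheck,
    tune: FeedbackTuner,
    generate: T2IModel,
    get_feedback: Callable[[object], Optional[str]],
    max_rounds: int = 3,
    max_retries: int = 2,
) -> Trace:
    trace = Trace()
    prompt = user_prompt
    for _ in range(max_rounds):
        # Enrichment step (stands in for the CoT context-inference agents).
        candidate = enrich(prompt)
        retries = 0
        # Assumed self-evaluation policy: re-sample while the candidate drifts
        # from the user's original intent, up to max_retries times.
        while not is_aligned(user_prompt, candidate) and retries < max_retries:
            candidate = enrich(prompt)
            retries += 1
        if is_aligned(user_prompt, candidate):
            prompt = candidate  # otherwise keep the current prompt unchanged
        trace.prompts.append(prompt)
        trace.rounds += 1
        image = generate(prompt)
        feedback = get_feedback(image)
        if not feedback:  # None or "" means the user is satisfied
            break
        # Feedback tuning: fold the user's comment into the next round's prompt.
        prompt = tune(prompt, feedback)
    return trace


# Toy stand-ins so the sketch runs without any model.
def toy_enrich(p: str) -> str:
    return p if "background" in p else p + ", detailed background, soft morning light"


def toy_is_aligned(original: str, candidate: str) -> bool:
    # Every word of the original prompt must survive in the candidate.
    return set(original.lower().split()) <= set(candidate.lower().replace(",", " ").split())


def toy_tune(p: str, feedback: str) -> str:
    return p + ", " + feedback


def toy_generate(p: str) -> str:
    return f"<image for: {p}>"


if __name__ == "__main__":
    replies = iter(["watercolor style", None])  # one round of feedback, then satisfied
    trace = sculpt("a cat on a bench", toy_enrich, toy_is_aligned, toy_tune,
                   toy_generate, lambda image: next(replies))
    for i, p in enumerate(trace.prompts, 1):
        print(i, p)
```

Passing the T2I model in as a plain callable mirrors the model-agnostic claim: swapping backends means swapping `generate`. The retry cap is a design assumption that keeps a misbehaving enricher from looping indefinitely; if no candidate passes the check, the round falls back to the current prompt.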



Topics

Artificial Intelligence > Core AI > Agent Systems; Artificial Intelligence > Core AI > Multimodal Learning; Deep Learning > Models > Generative Models

Keywords

chain-of-thought reasoning, text-to-image generation, generative model, prompt optimization, multi-agent system

