DialogDraw: Image Generation and Editing System Based on Multi-Turn Dialogue

Shichao Ma; Xinfeng Zhang; Zeng Zhao; Bai Liu; Changjie Fan; Zhipeng Hu

AAAI 2025

Abstract

In recent years, diffusion models have shown great potential for image generation and editing. Beyond single-model approaches, a variety of drawing workflows now exist to handle different drawing tasks. However, few solutions identify user intentions through dialogue and complete a drawing progressively over several turns. We introduce DialogDraw, a system for image generation and editing through continuous dialogue. DialogDraw lets users create and refine drawings in natural language and integrates with numerous open-source drawing workflows and models. The system recognizes user intentions, extracts user inputs as parameters, adapts them to the parameters of the various drawing functions, and provides an intuitive mode of interaction. It executes user instructions effectively, supports dozens of image generation and editing methods, and is designed to scale to new ones. We further use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to iteratively improve the Intention Recognition and Parameter Extraction Model (IRPEM). To evaluate DialogDraw, we propose DrawnConvos, a dataset of drawing functions and command dialogue data collected from the open-source community. Our evaluation shows that DialogDraw performs well at following commands and at identifying and adapting to users' drawing intentions, which supports the effectiveness of our method.
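The interaction pattern described in the abstract (recognize an intention, extract parameters, dispatch to a drawing function, carry the result into the next turn) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `Session`, `WORKFLOWS`, `recognize`, and `handle_turn` are hypothetical, the keyword matcher merely stands in for the learned IRPEM, and the "images" are placeholder strings rather than outputs of a diffusion model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Session:
    """Dialogue state carried across turns (hypothetical structure)."""
    image: Optional[str] = None                 # placeholder for the current canvas
    history: List[str] = field(default_factory=list)


# Hypothetical registry: intent name -> drawing function.
WORKFLOWS: Dict[str, Callable[[Session, dict], str]] = {}


def workflow(name: str):
    """Register a drawing function under an intent name."""
    def register(fn):
        WORKFLOWS[name] = fn
        return fn
    return register


@workflow("generate")
def generate(session: Session, params: dict) -> str:
    # A real system would call an image generation model here.
    return f"image({params['prompt']})"


@workflow("edit")
def edit(session: Session, params: dict) -> str:
    if session.image is None:
        raise ValueError("nothing to edit yet")
    # A real system would call an image editing model here.
    return f"{session.image}+edit({params['prompt']})"


def recognize(utterance: str) -> Tuple[str, dict]:
    """Keyword stand-in for a learned intent and parameter model."""
    text = utterance.strip()
    lowered = text.lower()
    for verb in ("draw ", "generate ", "create "):
        if lowered.startswith(verb):
            return "generate", {"prompt": text[len(verb):]}
    # Anything else is treated as an edit of the current image.
    return "edit", {"prompt": text}


def handle_turn(session: Session, utterance: str) -> str:
    """Process one dialogue turn and update the session state."""
    intent, params = recognize(utterance)
    session.image = WORKFLOWS[intent](session, params)
    session.history.append(utterance)
    return session.image


session = Session()
print(handle_turn(session, "Draw a red barn"))
print(handle_turn(session, "make the sky purple"))
```

Keeping the drawing functions in a registry keyed by intent means that a new workflow can be added by registering one more function, without touching the turn-handling loop. This is one plausible way to obtain the extensibility the abstract mentions, not a description of how DialogDraw implements it.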

Topics

Deep Learning > Models > Diffusion Models
Computer Vision > Generation > Image Generation
Computer Vision > Processing > Image Editing
Artificial Intelligence > Core AI > Multi-Modal Learning

Keywords

image generation; image editing; reinforcement learning from human feedback; natural language; diffusion model; multi-turn dialogue; intention recognition
