Graphic Design with Large Multimodal Model

Yutao Cheng; Zhao Zhang; Maoke Yang; Hui Nie; Chunyuan Li; Xinglong Wu; Jie Shao

AAAI 2025

Abstract

In graphic design, automating the integration of design elements into a cohesive multi-layered artwork not only boosts productivity but also helps democratize graphic design. One existing practice is Graphic Layout Generation (GLG), which aims to lay out design elements supplied in sequence. GLG is constrained by the need for a predefined, correct layer order, which limits creative potential and increases user workload. In this paper, we present Hierarchical Layout Generation (HLG) as a more flexible and pragmatic setup that creates a graphic composition from an unordered set of design elements. To tackle the HLG task, we introduce Graphist, the first layout generation model based on large multimodal models. Graphist efficiently reframes HLG as a sequence generation problem: it takes RGB-A images as input and outputs a JSON draft protocol that indicates the coordinates, size, and order of each element. We develop multiple evaluation metrics for HLG. Graphist outperforms prior methods and establishes a strong baseline for this field.
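The abstract does not give the schema of the JSON draft protocol, so the following is only a minimal sketch of how such an output might be consumed. The field names (canvas, layers, name, index, x, y, width, height), the example values, and the helper functions parse_draft and stacking_order are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import json

# Hypothetical draft for three elements supplied in arbitrary order.
# Field names and values are illustrative assumptions, not Graphist's schema.
draft_json = """
{
  "canvas": {"width": 800, "height": 600},
  "layers": [
    {"name": "title_text", "index": 2, "x": 100, "y": 40,  "width": 600, "height": 80},
    {"name": "background", "index": 0, "x": 0,   "y": 0,   "width": 800, "height": 600},
    {"name": "product",    "index": 1, "x": 250, "y": 180, "width": 300, "height": 300}
  ]
}
"""


def parse_draft(text):
    """Parse a draft and return (canvas, layers sorted bottom to top)."""
    draft = json.loads(text)
    canvas = draft["canvas"]
    # A lower index is assumed to mean a layer closer to the background.
    layers = sorted(draft["layers"], key=lambda layer: layer["index"])
    for layer in layers:
        if layer["width"] <= 0 or layer["height"] <= 0:
            raise ValueError(f"non-positive size for layer {layer['name']!r}")
    return canvas, layers


def stacking_order(text):
    """Return element names in the order they would be composited."""
    _, layers = parse_draft(text)
    return [layer["name"] for layer in layers]


if __name__ == "__main__":
    for name in stacking_order(draft_json):
        print(name)
```

Under these assumptions, the input elements may arrive in any order, and the predicted index alone determines how they are stacked, which is the property that distinguishes HLG from sequence-dependent GLG.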

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Deep Learning > Models > Generative Models
Deep Learning > Models > Large Language Models
Natural Language Processing > Applications > Text Generation
Computer Science > Applications > Computer Graphics

Keywords

sequence generation, multimodal learning, large multimodal model, layout generation, graphic design, hierarchical layout
