LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

Yiren Song; Danze Chen; Mike Zheng Shou

2025 ICCV ICCV 2025

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

Abstract

Generating cognitive-aligned layered SVGs remains challenging due to existing methods' tendencies toward either oversimplified single-layer outputs or optimization-induced shape redundancies. We propose LayerTracer, a DiT based framework that bridges this gap by learning designers' layered SVG creation processes from a novel dataset of sequential design operations. Our approach operates in two phases: First, a text-conditioned DiT generates multi-phase rasterized construction blueprints that simulate human design workflows. Second, layer-wise vectorization with path deduplication produces clean, editable SVGs. For image vectorization, we introduce a conditional diffusion mechanism that encodes reference images into latent tokens, guiding hierarchical reconstruction while preserving structural integrity. Extensive experiments show that LayerTracer surpasses optimization-based and neural baselines in generation quality and editability.

🌉 Interdisciplinary Bridge — Computer Science and Computer Vision and Deep Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yiren Song , Danze Chen , Mike Zheng Shou

Topics

Deep Learning > Architectures > Transformers Deep Learning > Models > Diffusion Models Computer Vision > Generation > Image Generation Computer Science > Systems > Computer Graphics Deep Learning > Learning Types > Self-Supervised Learning

Keywords

image generation diffusion transformer svg synthesis vector graphics image vectorization cognitive alignment layer decomposition hierarchical reconstruction

Download PDF

Related papers

MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval 2025

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality 2025

MonSTeR: a Unified Model for Motion, Scene, Text Retrieval 2025

ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching 2025

Robust Dataset Condensation using Supervised Contrastive Learning 2025