Symbolic Representation for Any-to-Any Generative Tasks

Jiaqi Chen; Xiaoye Zhu; Yue Wang; Tianyang Liu; Xinhui Chen; Ying Chen; Chak Tou Leong; Yifei Ke; Joseph Liu; Yiwen Yuan; Julian McAuley; Li-jia Li

CVPR 2025

Abstract

We propose a symbolic generative task description language and a corresponding inference engine that can represent arbitrary multimodal tasks as structured symbolic flows. Unlike conventional generative models, which rely on large-scale training and implicit neural representations to learn cross-modal mappings (often with high computational costs and limited flexibility), our framework introduces an explicit symbolic representation composed of three core primitives: functions, parameters, and topological logic. Using a pre-trained language model, our inference engine maps natural language instructions directly to symbolic workflows in a training-free manner. Our framework successfully performs over 12 diverse multimodal generative tasks, demonstrating strong performance and flexibility without requiring task-specific tuning. Experiments show that our method not only matches or outperforms existing state-of-the-art unified models in content quality but also offers greater efficiency, editability, and interruptibility. We believe symbolic task representations provide a cost-effective and extensible foundation for advancing the capabilities of generative AI.
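To make the three primitives concrete, here is a minimal sketch of how a symbolic flow might be represented and executed. It is an illustration under stated assumptions, not the paper's actual description language or inference engine: the `Node` class, the `run_workflow` helper, and the toy string functions in `REGISTRY` are all hypothetical names introduced here. Each node names a function, carries its parameters, and lists its upstream nodes, which together encode the topology.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """One step of a symbolic flow (hypothetical structure, not the paper's syntax)."""
    function: str                                     # primitive 1: which function to call
    params: Dict[str, Any] = field(default_factory=dict)  # primitive 2: its parameters
    inputs: List[str] = field(default_factory=list)   # primitive 3: upstream node ids


def run_workflow(nodes: Dict[str, Node],
                 registry: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Execute nodes in dependency order and return every intermediate result."""
    # TopologicalSorter expects a mapping from each node to its predecessors.
    graph = {node_id: set(node.inputs) for node_id, node in nodes.items()}
    results: Dict[str, Any] = {}
    for node_id in TopologicalSorter(graph).static_order():
        node = nodes[node_id]
        upstream = [results[i] for i in node.inputs]
        results[node_id] = registry[node.function](*upstream, **node.params)
    return results


# Toy stand-ins for real generative components (assumed for illustration only).
REGISTRY: Dict[str, Callable[..., Any]] = {
    "load_text": lambda text: text,
    "uppercase": lambda s: s.upper(),
    "repeat": lambda s, times: " ".join([s] * times),
}

WORKFLOW = {
    "prompt": Node("load_text", {"text": "a cat"}),
    "upper": Node("uppercase", inputs=["prompt"]),
    "out": Node("repeat", {"times": 2}, inputs=["upper"]),
}

RESULTS = run_workflow(WORKFLOW, REGISTRY)
print(RESULTS["out"])
```

Because the flow is explicit data rather than learned weights, a single parameter such as `WORKFLOW["out"].params["times"]` can be edited and the flow re-run. This is one plausible reading of the editability the abstract claims. Keeping every intermediate value in `RESULTS` similarly hints at how a run could be inspected or stopped partway through.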

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Deep Learning > Models > Generative Models

Keywords

language model; symbolic representation; training-free inference; multimodal generative task; structured symbolic flow
