Leveraging Panoptic Scene Graph for Evaluating Fine-Grained Text-to-Image Generation

Xueqing Deng; Linjie Yang; Qihang Yu; Chenglin Yang; Liang-Chieh Chen

ICCV 2025

Abstract

Text-to-image (T2I) models have advanced rapidly with diffusion-based breakthroughs, yet their evaluation remains challenging. Human assessments are costly, and existing automated metrics lack accurate compositional understanding. To address these limitations, we introduce PSG-Bench, a novel benchmark of 5K text prompts designed to evaluate the capabilities of advanced T2I models. We also propose PSGEval, a scene graph-based evaluation metric that converts generated images into structured representations and applies graph matching techniques for accurate and scalable assessment. PSGEval is a detection-based metric that does not rely on question-answer (QA) generation. Our experimental results demonstrate that PSGEval aligns well with human evaluations, mitigating biases present in existing automated metrics. We further provide a detailed ranking and analysis of recent T2I models, offering a robust framework for future research in T2I evaluation.
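The abstract does not spell out how PSGEval performs graph matching, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes a scene graph can be reduced to a set of (subject, predicate, object) triples and scores an image by the fraction of prompt triples found among the triples detected in the image. The names Triple, normalize, and triple_match_score, as well as the example graphs, are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: one scene-graph edge as (subject, predicate, object).
Triple = Tuple[str, str, str]


def normalize(triples: Iterable[Triple]) -> Set[Triple]:
    """Lowercase and strip every field so matching ignores case and stray spaces."""
    return {tuple(part.strip().lower() for part in t) for t in triples}


def triple_match_score(prompt_triples: Iterable[Triple],
                       image_triples: Iterable[Triple]) -> float:
    """Fraction of prompt triples that also appear in the image graph.

    This is exact-match recall over triples, an assumed stand-in for the
    graph matching step; the paper's actual matching may differ.
    """
    expected = normalize(prompt_triples)
    detected = normalize(image_triples)
    if not expected:
        # An empty prompt graph imposes no constraints, so treat it as satisfied.
        return 1.0
    return len(expected & detected) / len(expected)


# Hypothetical graphs: one parsed from a prompt, one detected in a generated image.
prompt_graph = [
    ("person", "riding", "horse"),
    ("horse", "standing on", "grass"),
]
image_graph = [
    ("Person", "riding", "horse"),
    ("horse", "standing on", "sand"),
    ("sky", "over", "sand"),
]

score = triple_match_score(prompt_graph, image_graph)
print(score)  # 0.5: only the "riding" relation is satisfied
```

Exact matching is the simplest possible choice here; a practical metric would likely need synonym handling and a detector that produces the image-side graph, both of which are outside this sketch.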

Topics

Artificial Intelligence > Core AI > Foundation Models; Computer Vision > Analysis > Scene Understanding; Computer Vision > Generation > Image Generation; Natural Language Processing > Generation > Text Generation

Keywords

graph matching; text-to-image generation; scene graph; diffusion model; panoptic segmentation; image evaluation metric; image evaluation
