HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

Eslam Mohamed Bakr; Pengzhan Sun; Xiaoqian Shen; Faizan Farooq Khan; Li Erran Li; Mohamed Elhoseiny

2023 ICCV ICCV 2023

HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

Abstract

Designing robust text-to-image (T2I) models have been extensively explored in recent years, especially with the emergence of diffusion models, which achieves state-of-the-art results on T2I synthesis tasks. Despite the significant effort and success in this direction, we observed that the existing metrics need to be more robust to measure real progress. Therefore, comparing the existing models are more complex and heavily subjective for human evaluations. In addition, we observe that the efforts in developing new architectures do not coincide with efforts in the evaluation direction. Driven by this observation, the importance of designing a concrete evaluation emerges to fill the gap between designing and evaluation efforts. Accordingly, we introduce our holistic, reliable, and scalable benchmark, termed \papernameAbbrev , for T2I models. Unlike the existing benchmarks, which focus on limited aspects, we measure 13 skills, which could be categorized into five critical skills; accuracy, robustness, generalization, fairness, and bias. In addition, \papernameAbbrev covers 50 applications, e.g., fashion, animals, transportation, food, and clothes. We evaluate nine recent large-scale T2I models using metrics that cover a wide range of skills. We study 13 skills, e.g., robustness, fairness, and bias. To probe the effectiveness of our \papernameAbbrev , a human evaluation is conducted, which is aligned with 95% with our evaluations on average across the 13 skills. We hope our findings, e.g., all the existing models can not generate visual text nor emotionally grounded images, help accelerate and direct future research. To this end, the code and data are available at https://eslambakr.github.io/hrsbench.github.io/.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Eslam Mohamed Bakr , Pengzhan Sun , Xiaoqian Shen , Faizan Farooq Khan , Li Erran Li , Mohamed Elhoseiny

Topics

Machine Learning > Application Areas > Fairness Deep Learning > Models > Diffusion Models Computer Vision > Generation > Image Generation Machine Learning > Optimization & Theory > Evaluation

Keywords

benchmark evaluation image generation model robustness text-to-image synthesis bias detection diffusion model evaluation metrics

Download PDF

Related papers

PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework 2023

Periodically Exchange Teacher-Student for Source-Free Object Detection 2023

Stable and Causal Inference for Discriminative Self-supervised Deep Visual Representations 2023

Minimal Solutions to Uncalibrated Two-view Geometry with Known Epipoles 2023

3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation 2023