WACV 2026

HumanBench: Two Heads, No Legs, But Mostly Human, the State of Generative Capabilities in T2I Models

Abstract

Despite rapid advances, text-to-image (T2I) models still falter in generating anatomically coherent and semantically grounded humans. We introduce HumanBench, a large-scale (35K-image), privacy-friendly benchmark that rigorously evaluates T2I models across four axes: template consistency, spatial reasoning, action understanding, and texture recognition. To quantify alignment, we propose two novel metrics, Agreement and Distinction, which capture both fidelity to prompts and semantic contrast with counterfactuals and negations. Evaluating six leading models, we uncover persistent failures, including disfigurements, species leakage, texture-object mismatches, and counting errors, especially under compound prompts. A complementary human study reveals that image realism and correctness degrade with prompt complexity, validating our automated assessments. HumanBench offers the first comprehensive audit of human-centric T2I generation, setting a new standard for benchmarking anatomical accuracy, compositional reasoning, and trustworthiness in generative models.
