The Visual Counter Turing Test (VCT²): A Benchmark for Evaluating AI-Generated Image Detection and the Visual AI Index (V_AI)

Nasrin Imanpour; Abhilekh Borah; Shashwat Bajpai; Subhankar Ghosh; Sainath Reddy Sankepally; Hasnat Md Abdullah; Nishoak Kosaraju; Shreyas Dixit; Ashhar Aziz; Shwetangshu Biswas; Vinija Jain; Aman Chadha; Song Wang; Amit Sheth; Amitava Das

2025 AACL AACL 2025

The Visual Counter Turing Test (VCT²): A Benchmark for Evaluating AI-Generated Image Detection and the Visual AI Index (V_AI)

Abstract

AbstractThe rapid progress and widespread availability of text-to-image (T2I) generation models have heightened concerns about the misuse of AI-generated visuals, particularly in the context of misinformation campaigns. Existing AI-generated image detection (AGID) methods often overfit to known generators and falter on outputs from newer or unseen models. To systematically address this generalization gap, we introduce the Visual Counter Turing Test (VCT^2), a comprehensive benchmark of 166,000 images, comprising both real and synthetic prompt-image pairs produced by six state-of-the-art (SoTA) T2I systems: Stable Diffusion 2.1, SDXL, SD3 Medium, SD3.5 Large, DALL·E 3, and Midjourney 6. We curate two distinct subsets: COCO_AI, featuring structured captions from MS COCO, and Twitter_AI, containing narrative-style tweets from The New York Times. Under a unified zero-shot evaluation, we benchmark 17 leading AGID models and observe alarmingly low detection accuracy, 58% on COCO_AI and 58.34% on Twitter_AI. To transcend binary classification, we propose the Visual AI Index (V_AI), an interpretable, prompt-agnostic realism metric based on twelve low-level visual features, enabling us to quantify and rank the perceptual quality of generated outputs with greater nuance. Correlation analysis reveals a moderate inverse relationship between V_AI and detection accuracy: Pearson rho of -0.532 on COCO_AI and rho of -0.503 on Twitter_AI; suggesting that more visually realistic images tend to be harder to detect, a trend observed consistently across generators. We release COCO_AI and Twitter_AI to catalyze future advances in robust AGID and perceptual realism assessment.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Nasrin Imanpour , Abhilekh Borah , Shashwat Bajpai , Subhankar Ghosh , Sainath Reddy Sankepally , Hasnat Md Abdullah , Nishoak Kosaraju , Shreyas Dixit , Ashhar Aziz , Shwetangshu Biswas , Vinija Jain , Aman Chadha , Song Wang , Amit Sheth , Amitava Das

Topics

Artificial Intelligence > Core AI > Multimodal Learning Machine Learning > Core Methods > Classification Deep Learning > Models > Generative Models

Keywords

zero-shot learning image generation binary classification generative model

Download PDF

Related papers

Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge 2025

Counterfactual Evaluation for Blind Attack Detection in LLM-based Evaluation Systems 2025

Enhancing Training Data Quality through Influence Scores for Generalizable Classification: A Case Study on Sexism Detection 2025

CtrlShift: Steering Language Models for Dense Quotation Retrieval with Dynamic Prompts 2025

A Diagnostic Framework for Auditing Reference-Free Vision-Language Metrics 2025