SynerDetect: Hierarchical Synergistic Learning for Generalizable AI-Generated Image Detection

Shuaibo Li; Yijun Yang; Zhaohu Xing; Hongqiu Wang; Pengfei Hao; Xingyu Li; Zekai Liu; Qing Zhang; LEI ZHU

AAAI 2026

Abstract

The rapid advancement of generative models, which produce increasingly realistic synthetic images, urgently demands robust and generalizable detection methods. Consequently, research has largely pivoted to leveraging large-scale Vision Foundation Models (VFMs) for enhanced generalization. However, existing VFM-based approaches primarily adhere to either perceptual or generative paradigms, each with limitations: perceptual models capture high-level semantics but often miss subtle artifacts, whereas generative models emphasize fine-grained flaws yet overlook semantic inconsistency. To resolve this inherent trade-off, we introduce SynerDetect, a novel hierarchical synergistic framework that fundamentally unifies the two paradigms. SynerDetect achieves deep integration of heterogeneous forensic representations through two levels of synergy: Cross-Model Interactive Distillation (CMID) distills generative forensic signals into perceptual encoders via prompt-guided reconstruction; and Optimal Transport-Guided Discriminative Contrastive Learning (OT-DCL) structurally aligns and integrates these heterogeneous representations, consolidating them into a robust, unified detection space. SynerDetect achieves superior performance on standard benchmarks (AIGCDetectBenchmark and GenImage) and attains a notable 5.20% accuracy gain on the challenging Chameleon benchmark, whose synthetic images consistently pass the Visual Turing Test. These results unequivocally validate the robust, real-world generalization of our unified cross-paradigm framework.

Topics

Artificial Intelligence > Core AI > Multimodal Learning; Machine Learning > Core Methods > Classification; Deep Learning > Architectures > Neural Networks

Keywords

contrastive learning; optimal transport; semantic alignment; vision foundation model; generative image detection; cross-model distillation
