Seeing Through Deepfakes: A Human-Inspired Framework for Multi-Face Detection

Juan Hu; Shaojing Fan; Terence Sim

ICCV 2025

Abstract

Multi-face deepfake videos are becoming increasingly prevalent, often appearing in natural social settings that challenge existing detection methods. Most current approaches excel at single-face detection but struggle in multi-face scenarios, due to a lack of awareness of crucial contextual cues. In this work, we develop a novel approach that leverages human cognition to analyze and defend against multi-face deepfake videos. Through a series of human studies, we systematically examine how people detect deepfake faces in social settings. Our quantitative analysis reveals four key cues humans rely on: scene-motion coherence, inter-face appearance compatibility, interpersonal gaze alignment, and face-body consistency. Guided by these insights, we introduce HICOM, a novel framework designed to detect every fake face in multi-face scenarios. Extensive experiments on benchmark datasets show that HICOM improves average accuracy by 3.3% in in-dataset detection and 2.8% under real-world perturbations. Moreover, it outperforms existing methods by 5.8% on unseen datasets, demonstrating the generalization of human-inspired cues. HICOM further enhances interpretability by incorporating an LLM to provide human-readable explanations, making detection results more transparent and convincing. Our work sheds light on involving human factors to enhance defense against deepfakes.

Topics

Artificial Intelligence > Core AI > Interpretability; Artificial Intelligence > Core AI > Multimodal Learning; Computer Vision > Analysis > Face Recognition; Computer Vision > Analysis > Object Detection; Artificial Intelligence > Core AI > Adversarial Learning

Keywords

scene understanding; face recognition; deepfake detection; human-computer interaction; adversarial defense; large language model; multi-face detection; human-inspired learning
