Where Does This Data Come From? Enhanced Source Inference Attacks in Federated Learning
Abstract
Federated learning (FL) enables collaborative model training without exposing raw data, offering a privacy-aware alternative to centralized learning. However, FL remains vulnerable to various privacy attacks that exploit shared model updates, including membership inference, property inference, and gradient inversion. Source inference attacks further threaten FL by identifying which client contributed a specific training sample, posing severe risks to user and institutional privacy. Existing source inference attacks mainly assume passive adversaries and overlook more realistic scenarios where the server actively manipulates the training process. In this paper, we present an enhanced source inference attack that demonstrates how a malicious server can amplify behavioral differences between clients to more accurately infer data origin. Our approach introduces active training manipulation and data augmentation to expose client-specific patterns. Experimental results across five representative FL algorithms and multiple datasets show that our method significantly outperforms prior passive attacks. These findings reveal a deeper level of privacy vulnerability in FL and call for stronger defense mechanisms under active threat models.