WACV 2026

DoTA: Latent Distribution Conditioned Data Attribution for Diffusion Models

Abstract

Diffusion models have emerged as the backbone of many modern generative AI systems for visual content generation. However, their opaque nature raises fundamental questions about which training samples are responsible for specific generations, especially in applications involving bias detection, model auditing, and dataset curation. Data attribution seeks to identify the training samples that most strongly influence the output of a generative model, a task that becomes especially challenging when attributing fine-scale attributes. Prior work has focused on broad concepts such as global features or entire images, often overlooking the nuances of fine-grained attributes and relying on group-based strategies that dilute individual influence. We propose DoTA, a novel latent-distribution-conditioned method for data attribution. DoTA prunes the attribution search space by matching the latent distributions of the generated and training data, enabling effective and controlled attribution. We demonstrate its attribution effectiveness through extensive quantitative and qualitative evaluations in challenging settings, including counterfactual evaluation and robustness to adversarial attacks.
