INTERSPEECH 2023

HumanDiffusion: diffusion model using perceptual gradients

Abstract

We propose HumanDiffusion, a diffusion model trained using humans' perceptual gradients to learn the range of data that humans find acceptable (i.e., the human-acceptable distribution). The conventional HumanGAN aims to model a human-acceptable distribution that is wider than the real-data distribution by training a neural-network-based generator with human-based discriminators. However, HumanGAN training tends to converge to a meaningless distribution because of vanishing gradients or mode collapse, and it requires careful heuristics. In contrast, HumanDiffusion learns the human-acceptable distribution through Langevin dynamics driven by gradients of human perceptual evaluations. Training iteratively diffuses real data so that it covers the wider human-acceptable distribution, and can thereby avoid the issues that arise in HumanGAN training. Evaluation results demonstrate that HumanDiffusion successfully represents the human-acceptable distribution without any training heuristics.
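The core idea, Langevin dynamics driven by gradients of human perceptual evaluations, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a hypothetical `simulated_human_score` function stands in for human ratings and is treated as an unnormalized log-density of the human-acceptable distribution, and the perceptual gradient is estimated from score queries alone with an antithetic random-perturbation (evolution-strategies-style) estimator, which is our own swapped-in choice. The names `ACCEPT_STD`, `estimate_perceptual_gradient`, `human_diffusion`, `sigma`, `n_pairs`, and `step_size` are likewise hypothetical.

```python
import numpy as np

# Hypothetical: width of the simulated "human-acceptable" region.
# It is deliberately wider than the real data used below.
ACCEPT_STD = 2.0


def simulated_human_score(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for human acceptability ratings.

    Assumption: the score is the log-density (up to a constant) of an
    isotropic Gaussian with std ACCEPT_STD. Input shape (..., D),
    output shape (...). Real ratings would come from listeners.
    """
    return -0.5 * np.sum(x ** 2, axis=-1) / ACCEPT_STD ** 2


def estimate_perceptual_gradient(x, score_fn, sigma=0.1, n_pairs=32, rng=None):
    """Estimate the gradient of score_fn at each row of x (shape (N, D)).

    Only score queries are used: for random directions eps, compare the
    score at x + sigma*eps and x - sigma*eps, then weight eps by the
    difference (an antithetic evolution-strategies-style estimator).
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((n_pairs,) + x.shape)                # (P, N, D)
    diff = score_fn(x + sigma * eps) - score_fn(x - sigma * eps)  # (P, N)
    return np.mean(diff[..., None] * eps, axis=0) / (2.0 * sigma)  # (N, D)


def human_diffusion(real_data, score_fn, step_size=0.1, n_steps=300, seed=0):
    """Unadjusted Langevin dynamics using the estimated perceptual gradient.

    Update: x <- x + (step_size / 2) * grad + sqrt(step_size) * z,
    with z ~ N(0, I), starting from the real data.
    """
    rng = np.random.default_rng(seed)
    x = np.array(real_data, dtype=float)
    for _ in range(n_steps):
        grad = estimate_perceptual_gradient(x, score_fn, rng=rng)
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step_size * grad + np.sqrt(step_size) * noise
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    real = rng.normal(0.0, 0.3, size=(500, 2))  # tightly clustered "real" data
    diffused = human_diffusion(real, simulated_human_score)
    print("real std:    ", real.std(axis=0))
    print("diffused std:", diffused.std(axis=0))
```

Starting from tightly clustered "real" samples, the chain spreads them out until they cover the wider region that the simulated evaluator accepts (a per-dimension spread close to `ACCEPT_STD`). Because this sketch trains no generator or discriminator, the vanishing-gradient and mode-collapse issues of adversarial training do not arise in it.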
