Maximum Likelihood Training of Score-Based Diffusion Models

Yang Song; Conor Durkan; Iain Murray; Stefano Ermon

NeurIPS 2021

Abstract

Score-based diffusion models synthesize samples by reversing a stochastic process that diffuses data to noise, and are trained by minimizing a weighted combination of score matching losses. The log-likelihood of score-based diffusion models can be tractably computed through a connection to continuous normalizing flows, but log-likelihood is not directly optimized by the weighted combination of score matching losses. We show that for a specific weighting scheme, the objective upper bounds the negative log-likelihood, thus enabling approximate maximum likelihood training of score-based diffusion models. We empirically observe that maximum likelihood training consistently improves the likelihood of score-based diffusion models across multiple datasets, stochastic processes, and model architectures. Our best models achieve negative log-likelihoods of 2.83 and 3.76 bits/dim on CIFAR-10 and ImageNet $32\times 32$ without any data augmentation, on a par with state-of-the-art autoregressive models on these tasks.
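
The bound described in the abstract can be sketched in symbols. The notation is an assumption, since the abstract defines none: a forward SDE $dx = f(x,t)\,dt + g(t)\,dw$ on $[0,T]$ with marginals $p_t$ (so $p_0 = p$ is the data distribution), a prior $\pi$, a score model $s_\theta(x,t)$, a weighting function $\lambda(t)$, and the model distribution $p_\theta^{\mathrm{SDE}}$ obtained by reversing the SDE. This is an illustrative summary of the "specific weighting scheme" $\lambda(t) = g(t)^2$, not a full statement of the result; the regularity conditions are in the full paper.

```latex
% Weighted score matching objective (notation assumed, see lead-in)
\begin{align}
\mathcal{J}_{\mathrm{SM}}(\theta; \lambda)
  &= \frac{1}{2} \int_0^T \mathbb{E}_{p_t(x)}
     \Big[ \lambda(t)\, \big\| \nabla_x \log p_t(x) - s_\theta(x, t) \big\|_2^2 \Big] \, dt \\
% With the weighting \lambda(t) = g(t)^2, the objective bounds the KL divergence
D_{\mathrm{KL}}\big(p \,\|\, p_\theta^{\mathrm{SDE}}\big)
  &\le \mathcal{J}_{\mathrm{SM}}\big(\theta; g(\cdot)^2\big)
     + D_{\mathrm{KL}}\big(p_T \,\|\, \pi\big) \\
% The KL divergence is the expected negative log-likelihood minus the data entropy
D_{\mathrm{KL}}\big(p \,\|\, p_\theta^{\mathrm{SDE}}\big)
  &= -\mathbb{E}_{p(x)}\big[ \log p_\theta^{\mathrm{SDE}}(x) \big] - \mathcal{H}(p)
\end{align}
```

Because the entropy $\mathcal{H}(p)$ and the term $D_{\mathrm{KL}}(p_T \,\|\, \pi)$ do not depend on $\theta$, minimizing $\mathcal{J}_{\mathrm{SM}}(\theta; g(\cdot)^2)$ minimizes an upper bound on the expected negative log-likelihood up to constants.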

Topics

Machine Learning > Optimization & Theory > Statistical Learning
Deep Learning > Models > Diffusion Models
Deep Learning > Models > Generative Models

Keywords

maximum likelihood; score matching; generative model; score-based diffusion; continuous normalizing flow
