Journey to the BAOAB-limit: finding effective MCMC samplers for score-based models

Ajay Jain*
UC Berkeley, Google Research
Ben Poole*
Google Research

Abstract

Diffusion and score-based generative models have achieved remarkable sample quality on difficult image synthesis tasks. Many works have proposed samplers for pretrained diffusion models, including ancestral samplers, SDE and ODE integrators, and annealed MCMC approaches. So far, the best sample quality has been achieved with samplers that use time-conditional score functions and move between several noise levels. However, estimating an accurate score function at many noise levels can be challenging and requires an architecture that is more expressive than would be needed for a single noise level. In this work, we explore MCMC sampling algorithms that operate at a single noise level, yet synthesize images with acceptable sample quality on the CIFAR-10 dataset. We show that while naïve application of Langevin dynamics and a related noise-denoise sampler produces poor samples, methods built on integrators of underdamped Langevin dynamics using splitting methods can perform well. Further, by combining MCMC methods with existing multiscale samplers, we begin to approach competitive sample quality without using scores at large noise levels.


Text-to-video with our sampler

Our sampler generates a large diversity of images in a single run. These videos can be generated on one consumer-grade GPU in around 5 minutes. Unlike past work, we use only a single noise level of the pretrained diffusion model, Stable Diffusion v2.

a DSLR photo of a large basket of rainbow macarons.
clocks
a DSLR photo of a gorgeous sunset
a cartoon rabbit dancing
a racecar speeding down the track
a 3D rendering of a forest temple
a massive waterfall
a post-apocalyptic city full of forest overgrowth
a DSLR photo of a metronome ticking
shooting stars
fruit salad
a shiba inu
an espresso machine
A bowl of Pho
a bunch of colorful candies falling into a tray
Tiny plant sprout coming out of the ground
coffee pouring into a cup

Why use the BAOAB-limit sampler?

We propose two samplers for diffusion models: the noise-denoise sampler and the infinite friction limit of the BAOAB sampler. We then show that a special case of the noise-denoise sampler makes the two equivalent. BAOAB-limit is an MCMC sampler that is sometimes used in statistical mechanics.
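As a rough illustration of the splitting scheme named above, the Python sketch below implements one BAOAB step for underdamped Langevin dynamics, together with the update obtained by letting the friction go to infinity. It is a toy under stated assumptions, not the implementation used in the paper: the score is the closed-form score of a 1D Gaussian smoothed at a single noise level (`gaussian_score`, with hypothetical parameters `data_std` and `sigma`), mass and temperature are set to 1, and all function names are illustrative.

```python
import math
import random


def gaussian_score(x, data_std=1.0, sigma=0.5):
    """Toy stand-in for a learned score (assumption): score of N(0, data_std^2)
    after smoothing with Gaussian noise of standard deviation sigma."""
    return -x / (data_std ** 2 + sigma ** 2)


def baoab_step(x, p, h, gamma, score, rng):
    """One BAOAB step with step size h and friction gamma (unit mass, unit temperature)."""
    p = p + 0.5 * h * score(x)      # B: half-step momentum kick from the score
    x = x + 0.5 * h * p             # A: half-step position drift
    c = math.exp(-gamma * h)        # O: exact Ornstein-Uhlenbeck momentum refresh
    p = c * p + math.sqrt(1.0 - c * c) * rng.gauss(0.0, 1.0)
    x = x + 0.5 * h * p             # A: second half-step drift
    p = p + 0.5 * h * score(x)      # B: second half-step kick
    return x, p


def baoab_chain(x0, h, gamma, n_steps, score, rng):
    """Run BAOAB for n_steps and return the visited positions."""
    x, p = x0, rng.gauss(0.0, 1.0)
    samples = []
    for _ in range(n_steps):
        x, p = baoab_step(x, p, h, gamma, score, rng)
        samples.append(x)
    return samples


def baoab_limit_chain(x0, h, n_steps, score, rng):
    """Infinite-friction limit of BAOAB, written on positions only:
    x_{k+1} = x_k + (h^2 / 2) * score(x_k) + (h / 2) * (R_k + R_{k+1}),
    where consecutive steps share one of their two Gaussian draws."""
    x, r_prev = x0, rng.gauss(0.0, 1.0)
    samples = []
    for _ in range(n_steps):
        r_next = rng.gauss(0.0, 1.0)
        x = x + 0.5 * h * h * score(x) + 0.5 * h * (r_prev + r_next)
        r_prev = r_next
        samples.append(x)
    return samples


if __name__ == "__main__":
    chain = baoab_limit_chain(0.0, 1.0, 5000, gaussian_score, random.Random(0))
    print("empirical variance:", sum(x * x for x in chain) / len(chain))
```

For this Gaussian toy, the position variance of both chains should be close to `data_std**2 + sigma**2`. With a very large `gamma`, the O step discards the old momentum, so `baoab_chain` behaves like `baoab_limit_chain`.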

Typically, diffusion models are sampled in a coarse-to-fine manner, first from a smoothed score at a high noise level and then gradually at lower noise levels to approach the desired distribution. However, the loss function used to train diffusion models, denoising score matching, was originally proposed to estimate scores only at a single noise level, which is theoretically enough to estimate the distribution. Our samplers are interesting for diffusion models because they allow good mixing between modes of the score function at a single noise level. This could simplify training.
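To make the single-noise-level idea concrete, here is a minimal sketch of denoising score matching on a 1D toy problem. It assumes a hypothetical one-parameter score model s(x) = a * x, for which the loss has a closed-form minimizer. The function name and the Gaussian toy data are illustrative assumptions, not part of the paper's training setup.

```python
import random


def fit_linear_score_dsm(data, sigma, rng):
    """Fit s(x) = a * x with denoising score matching at one noise level sigma.

    The loss is the average of (a * x_noisy + eps / sigma)^2 with
    x_noisy = x + sigma * eps. Setting its derivative in a to zero gives
    a = -sum(x_noisy * eps / sigma) / sum(x_noisy^2).
    """
    num = 0.0
    den = 0.0
    for x in data:
        eps = rng.gauss(0.0, 1.0)       # noise used to corrupt this sample
        x_noisy = x + sigma * eps
        num += x_noisy * eps / sigma    # correlation with the regression target
        den += x_noisy * x_noisy
    return -num / den


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(100000)]  # toy data with unit variance
    a = fit_linear_score_dsm(data, sigma=0.5, rng=rng)
    print("fitted slope:", a)
```

For unit-variance Gaussian data, the smoothed density is Gaussian with variance 1 + sigma^2, so the fitted slope should approach -1 / (1 + sigma^2). The score recovered this way is that of the noise-smoothed distribution at the chosen sigma.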

Our samplers also produce highly diverse images within a single chain. This is useful because a user can generate a large batch of images or a video in a single run, rather than running many sampling chains in parallel. However, image quality is often slightly worse than with alternative samplers.

Our samplers generate images with high diversity within each MCMC chain.


Citation

@article{jain2022journey,
  author = {Jain, Ajay and Poole, Ben},
  title = {Journey to the BAOAB-limit: finding effective MCMC samplers for score-based models},
  journal = {Workshop on Score-Based Methods at NeurIPS},
  year = {2022},
}