Image generation with Shortest Path Diffusion

Ayan Das
November 03, 2023

The field of image generation has made significant progress thanks to the introduction of Diffusion Models, which learn to progressively reverse a given image corruption. Recently, a few studies introduced alternative ways of corrupting images in Diffusion Models, with an emphasis on blurring. However, these studies are purely empirical, and it remains unclear what the optimal procedure for corrupting an image is. In this work, we hypothesize that the optimal procedure minimizes the length of the path taken when corrupting an image towards a given final state. We propose the Fisher metric for the path length, measured in the space of probability distributions. We compute the shortest path according to this metric, and we show that it corresponds to a combination of image sharpening, rather than blurring, and noise deblurring. While the corruption was chosen arbitrarily in previous work, our Shortest Path Diffusion (SPD) uniquely determines the entire spatiotemporal structure of the corruption. We show that SPD improves on strong baselines without any hyperparameter tuning, and outperforms all previous Diffusion Models based on image blurring. Furthermore, any small deviation from the shortest path leads to worse performance, suggesting that SPD provides the optimal procedure to corrupt images. Our work sheds new light on observations made in recent works, and provides a new approach to improve diffusion models on images and other types of data.

Transcript

  1. MediaTek Proprietary and Confidential. © 2022 MediaTek Inc. All rights reserved.

    Image generation with Shortest-path Diffusion
    Ayan Das*, Stathi Fotiadis*, Anil Batra, Farhang Nabiei, FengTing Liao, Sattar Vakili, Da-Shan Shiu, Alberto Bernacchia (* Equal contributions)
    MediaTek Research
    https://www.mtkresearch.com/
  2. Introduction to Diffusion Models

    • Increasingly popular class of generative models
    • Two primary components:
      – Reverse/generative process: goes from $\mathcal{N}(\mathbf{0}, \mathbf{I})$ to the data, in distribution space
      – Forward/noising process: specifies the exact “path” of travel
    • Forward specification
      – Predominantly hand-designed
      – Requires trial and error to find the optimal path
  3. Shortest path between distributions

    • Shortest path between two Gaussians: $\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_0) \to \mathcal{N}(\mathbf{0}, \mathbf{I})$
    • Fisher metric
      – Keeps maximum overlap between subsequent distributions

    $\boldsymbol{\Sigma}_t = \boldsymbol{\Sigma}_0^{1-t}, \quad t \in [0, 1]$
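The closed form on this slide can be checked numerically in the simplest setting. The sketch below assumes a diagonal covariance, so the matrix power acts on each eigenvalue separately, and uses the standard Fisher-Rao distance between zero-mean Gaussians, $d^2 = \tfrac{1}{2} \sum_i \log^2(b_i / a_i)$. The function names, the example variances, and the linear-interpolation baseline are illustrative assumptions, not taken from the slides.

```python
import math

def fisher_distance(var_a, var_b):
    """Fisher-Rao distance between N(0, diag(var_a)) and N(0, diag(var_b))."""
    return math.sqrt(0.5 * sum(math.log(b / a) ** 2 for a, b in zip(var_a, var_b)))

def shortest_path(var_0, t):
    """Sigma_t = Sigma_0^(1 - t), applied to each eigenvalue."""
    return [v ** (1.0 - t) for v in var_0]

def linear_path(var_0, t):
    """Baseline for comparison (an assumption): linear blend towards the identity."""
    return [(1.0 - t) * v + t for v in var_0]

def path_length(path, var_0, steps=1000):
    """Approximate path length by summing distances between consecutive points."""
    points = [path(var_0, k / steps) for k in range(steps + 1)]
    return sum(fisher_distance(a, b) for a, b in zip(points, points[1:]))

# Hypothetical eigenvalues of the data covariance Sigma_0.
var_0 = [4.0, 1.0, 0.25, 0.01]

direct = fisher_distance(var_0, [1.0] * len(var_0))  # endpoint-to-endpoint distance
geodesic_len = path_length(shortest_path, var_0)     # length along Sigma_0^(1 - t)
linear_len = path_length(linear_path, var_0)         # length along the linear blend

print(f"direct distance : {direct:.4f}")
print(f"shortest path   : {geodesic_len:.4f}")
print(f"linear path     : {linear_len:.4f}")
```

Under these assumptions the length along $\boldsymbol{\Sigma}_0^{1-t}$ matches the endpoint distance, while the linear blend is longer because its eigenvalues do not move at a common rate in log space.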
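  4. Modelling the covariance of natural images

    • Translation invariant: $\boldsymbol{\Sigma}_0 = \mathbf{F} \mathbf{D} \mathbf{F}^{\dagger}$
    • Power spectrum [1]: $\sim 1/f^2$
    • We model $\mathbf{D}$ as: $D_{ii} =$ [expression garbled in the transcript]
    • Implementation in Fourier space:
      $\mathbf{u}_t = \boldsymbol{\Psi}_t^{1/2} \mathbf{u}_0 + (\mathbf{I} - \boldsymbol{\Psi}_t)^{1/2} \boldsymbol{\xi}_t$
      $\boldsymbol{\Psi}_t = (\mathbf{I} - \mathbf{D}^{1 - t/T})(\mathbf{I} - \mathbf{D})^{-1}$

    [1] Hyvärinen, Hurri & Hoyer, Natural Image Statistics (2009)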
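A minimal sketch of the Fourier-space forward step follows, treating each frequency as an independent real scalar for simplicity (real Fourier coefficients are complex). It computes the per-frequency coefficient $\Psi_t$ and checks that the resulting marginal variance, $\Psi_t D + (1 - \Psi_t)$, equals $D^{1 - t/T}$, i.e. the shortest-path covariance. The spectrum values, the function names, and the reading of the time normalisation as $t/T$ are assumptions made for illustration.

```python
import math
import random

def psi(d, t, T):
    """Per-frequency coefficient Psi_t = (1 - d^(1 - t/T)) / (1 - d)."""
    if abs(d - 1.0) < 1e-12:
        return 1.0 - t / T  # limit of the expression as d -> 1
    return (1.0 - d ** (1.0 - t / T)) / (1.0 - d)

def marginal_variance(d, t, T):
    """Variance of u_t when u_0 has variance d and the noise has unit variance."""
    p = psi(d, t, T)
    return p * d + (1.0 - p)

def forward_sample(u0, spectrum, t, T, rng):
    """u_t = sqrt(Psi_t) * u_0 + sqrt(1 - Psi_t) * xi, one scalar per frequency."""
    out = []
    for u, d in zip(u0, spectrum):
        p = psi(d, t, T)
        noise_scale = math.sqrt(max(0.0, 1.0 - p))  # guard against rounding
        out.append(math.sqrt(p) * u + noise_scale * rng.gauss(0.0, 1.0))
    return out

# Hypothetical power spectrum values D_ii, decaying with frequency.
spectrum = [50.0, 12.0, 3.0, 0.8, 0.2]
T = 500

for t in (0, 125, 250, 500):
    target = [d ** (1.0 - t / T) for d in spectrum]
    actual = [marginal_variance(d, t, T) for d in spectrum]
    worst = max(abs(a - b) for a, b in zip(actual, target))
    print(f"t = {t:3d}: max |variance - D^(1 - t/T)| = {worst:.2e}")

# Draw one noisy sample halfway along the path from hypothetical coefficients u_0.
rng = random.Random(0)
u0 = [math.sqrt(d) * rng.gauss(0.0, 1.0) for d in spectrum]
u_half = forward_sample(u0, spectrum, T // 2, T, rng)
```

At $t = 0$ the coefficient is 1 (pure signal) and at $t = T$ it is 0 (pure white noise), matching the two endpoints of the path.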
  5. Experimental setup

    • Datasets: CIFAR10 (32x32) and ImageNet (64x64)
      – Representative of “natural images”
      – The translation-invariance assumption roughly holds
    • Setup (for fair comparison)
      – Same UNet architecture as iDDPM [1]
      – Same optimizer and learning rate as [1]
      – Analogous reverse-process variance for sampling
    • Evaluation
      – FID computed with 50K samples

    Only difference: our estimated non-uniform forward noising schedule $\boldsymbol{\Psi}_t$

    [1] Nichol, A. Q. and Dhariwal, P., “Improved denoising diffusion probabilistic models”, ICML 2021
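For reference, the FID used in the evaluation is, in its standard definition, the Fréchet distance between two Gaussians fitted to Inception features of real and generated images. The symbols $\mu_r, \Sigma_r$ (real) and $\mu_g, \Sigma_g$ (generated) are notation introduced here, not taken from the slides.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Lower values indicate that the generated feature statistics are closer to those of the real data.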
  6. CIFAR10 results

    • FID is lowest on the shortest path
      – Lowest point is at T = 500
      – Surpasses vanilla iDDPM
    • Our power spectrum model
      – Found m = 2 to be optimal
      – Corresponds to “sharpening” rather than the “blurring” suggested by [1] and [2]

    [1] Daras, G., Delbracio, M., Talebi, H., Dimakis, A. G., and Milanfar, P., “Soft diffusion: Score matching for general corruptions”, 2022.
    [2] Hoogeboom, E. and Salimans, T., “Blurring diffusion models”, ICLR 2023.
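One way to see why the shortest path acts as sharpening is to look at how much of the original signal survives at each frequency. The sketch below evaluates the signal gain $\sqrt{\Psi_t}$, with $\Psi_t = (1 - D^{1 - t/T}) / (1 - D)$ taken per frequency, for a toy spectrum that decays with frequency. The spectrum `toy_spectrum`, its constant `c`, and the chosen frequencies are hypothetical stand-ins, not the fitted model from the slides.

```python
import math

def psi(d, s):
    """Per-frequency coefficient (1 - d^(1 - s)) / (1 - d), with s = t/T in [0, 1]."""
    if abs(d - 1.0) < 1e-12:
        return 1.0 - s  # limit of the expression as d -> 1
    return (1.0 - d ** (1.0 - s)) / (1.0 - d)

def toy_spectrum(f, m=2, c=100.0):
    """Hypothetical power spectrum c / (1 + f^m), decaying roughly like 1/f^m."""
    return c / (1.0 + f ** m)

freqs = [0, 1, 2, 4, 8, 16]
s = 0.5  # halfway along the path

# Signal gain sqrt(Psi) applied to the clean image at each frequency.
gains = [math.sqrt(psi(toy_spectrum(f), s)) for f in freqs]

for f, g in zip(freqs, gains):
    print(f"frequency {f:2d}: spectrum = {toy_spectrum(f):8.3f}, signal gain = {g:.3f}")
```

With this toy spectrum the gain grows with frequency, so low frequencies are attenuated more than high ones. That is a high-pass (sharpening) effect on the signal, the opposite of a blur.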
  7. ImageNet64 results

    • Preliminary experiments are promising
      – Unconditional model trained (and sampled) with T = 1000
      – Better FID than iDDPM with a smaller T and fewer training iterations

    [Figure panels: Generated samples from SPD (Ours); Quantitative results]
  8. Thank you

    Read the paper, or check out our code → mtkresearch/shortest-path-diffusion