Image generation with Shortest Path Diffusion

Ayan Das
November 03, 2023

The field of image generation has made significant progress thanks to the introduction of Diffusion Models, which learn to progressively reverse a given image corruption. Recently, a few studies introduced alternative ways of corrupting images in Diffusion Models, with an emphasis on blurring. However, these studies are purely empirical, and it remains unclear what the optimal procedure for corrupting an image is. In this work, we hypothesize that the optimal procedure minimizes the length of the path taken when corrupting an image towards a given final state. We propose the Fisher metric for the path length, measured in the space of probability distributions. We compute the shortest path according to this metric, and we show that it corresponds to a combination of image sharpening, rather than blurring, and noise deblurring. While the corruption was chosen arbitrarily in previous work, our Shortest Path Diffusion (SPD) uniquely determines the entire spatiotemporal structure of the corruption. We show that SPD improves on strong baselines without any hyperparameter tuning, and outperforms all previous Diffusion Models based on image blurring. Furthermore, any small deviation from the shortest path leads to worse performance, suggesting that SPD provides the optimal procedure to corrupt images. Our work sheds new light on observations made in recent works, and provides a new approach to improve diffusion models on images and other types of data.

Transcript

  1. MediaTek Proprietary and Confidential. © 2022 MediaTek Inc. All rights reserved.
    Image generation with Shortest-path Diffusion
    Ayan Das*, Stathi Fotiadis*, Anil Batra, Farhang Nabiei, FengTing Liao, Sattar Vakili,
    Da-Shan Shiu, Alberto Bernacchia
    (* Equal Contributions)
    MediaTek Research
    https://www.mtkresearch.com/

  2. Introduction to Diffusion Models
    • Increasingly popular class of generative models
    • Two primary components:
    – Reverse/generative process
    Going from $\mathcal{N}(\mathbf{0}, \mathbf{I})$ to the data in distribution space
    – Forward/noising process
    Specifies the exact “path” of travel
    • Forward specification
    – Predominantly hand-designed
    – Requires trial and error to find the optimal path

  3. Shortest path between distributions
    • Shortest path between two Gaussians: $\mathcal{N}(\mathbf{0}, \mathbf{\Sigma}_0) \rightarrow \mathcal{N}(\mathbf{0}, \mathbf{I})$
    • Fisher metric
    – Keeps maximum overlap between subsequent distributions
    $\mathbf{\Sigma}_t = \mathbf{\Sigma}_0^{1-t}, \quad t \in [0, 1]$
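
The path $\mathbf{\Sigma}_t = \mathbf{\Sigma}_0^{1-t}$ on this slide can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the function name `shortest_path_covariance` and the 2x2 example covariance are assumptions made purely for illustration. It computes the fractional matrix power through an eigendecomposition, which is valid for any symmetric positive-definite $\mathbf{\Sigma}_0$.

```python
import numpy as np

def shortest_path_covariance(sigma0, t):
    """Return Sigma_t = Sigma_0^(1 - t) for a symmetric positive-definite Sigma_0."""
    # Fractional matrix power via the eigendecomposition Sigma_0 = V diag(w) V^T.
    w, v = np.linalg.eigh(sigma0)
    return (v * w ** (1.0 - t)) @ v.T

# Hypothetical 2x2 covariance, chosen only for illustration.
sigma0 = np.array([[2.0, 0.6],
                   [0.6, 1.0]])

for t in (0.0, 0.5, 1.0):
    print(t)
    print(np.round(shortest_path_covariance(sigma0, t), 3))
```

At t = 0 the sketch returns the data covariance itself, and at t = 1 it returns the identity, matching the two endpoints stated on the slide.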

  4. Modelling the covariance of natural images
    • Translation invariant: $\mathbf{\Sigma}_0 = \mathbf{F} \mathbf{D} \mathbf{F}^{\dagger}$
    • Power spectrum [1] $\sim 1/f^2$
    • We model D as $D_{ii} = \dfrac{c_1}{c_2 + f_i^m}$
    • Implementation in Fourier space:
    $\mathbf{u}_t = \mathbf{\Psi}_t^{1/2} \mathbf{u}_0 + (\mathbf{I} - \mathbf{\Psi}_t)^{1/2} \boldsymbol{\xi}_t$
    $\mathbf{\Psi}_t = (\mathbf{I} - \mathbf{D}^{1 - t/T}) (\mathbf{I} - \mathbf{D})^{-1}$
    [1] Hyvärinen, Hurri & Hoyer, Natural Image Statistics (2009)
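
The Fourier-space forward process on this slide, u_t = Ψ_t^(1/2) u_0 + (I − Ψ_t)^(1/2) ξ_t with Ψ_t = (I − D^(1 − t/T)) (I − D)^(−1), can be sketched per frequency because D is diagonal. The code below is an illustrative sketch under stated assumptions, not the released implementation: the function names, the 1-D signal, the use of real-valued coefficients, and the constants `c1`, `c2` and `m` in the assumed power-spectrum form D_ii = c1 / (c2 + |f_i|^m) are all hypothetical choices. The constants are picked so that no D_ii equals exactly 1, which would make the ratio undefined.

```python
import numpy as np

def power_spectrum(n, c1=10.0, c2=1.5, m=2.0):
    """Diagonal of D for a 1-D signal of length n, assuming D_ii = c1 / (c2 + |f_i|^m)."""
    f = np.abs(np.fft.fftfreq(n) * n)  # integer frequency magnitudes 0, 1, ..., n/2
    return c1 / (c2 + f ** m)

def psi(d, t, T):
    """Diagonal of Psi_t = (I - D^(1 - t/T)) (I - D)^(-1), evaluated element-wise."""
    p = (1.0 - d ** (1.0 - t / T)) / (1.0 - d)
    return np.clip(p, 0.0, 1.0)  # guard against round-off outside [0, 1]

def forward_sample(u0, d, t, T, rng):
    """Draw u_t = Psi_t^(1/2) u_0 + (I - Psi_t)^(1/2) xi_t, with xi_t ~ N(0, I)."""
    p = psi(d, t, T)
    xi = rng.standard_normal(u0.shape)  # real-valued noise: a simplification
    return np.sqrt(p) * u0 + np.sqrt(1.0 - p) * xi

rng = np.random.default_rng(0)
d = power_spectrum(64)
u0 = np.sqrt(d) * rng.standard_normal(64)  # synthetic coefficients with covariance D
u_mid = forward_sample(u0, d, t=250, T=500, rng=rng)
print(u_mid.shape)
```

With this Ψ_t, the per-frequency variance of u_t is Ψ_t D + (1 − Ψ_t) = D^(1 − t/T), so the corrupted signal follows the shortest-path covariance: Ψ_0 = I leaves the data untouched and Ψ_T = 0 yields pure white noise.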

  5. Experimental setup
    • Datasets:
    CIFAR10 (32x32) & ImageNet (64x64)
    – Representative of “natural images”
    – Roughly satisfy the translation-invariance assumption
    • Setup (for fair comparison)
    – Same UNet architecture as iDDPM [1]
    – Same optimizer and learning rate as [1]
    – Analogous reverse process variance for sampling
    – Only difference: our estimated non-uniform forward noising schedule $\mathbf{\Psi}_t$
    • Evaluation
    – FID computed with 50K samples
    [1] Nichol, A. Q. and Dhariwal, P., “Improved Denoising Diffusion Probabilistic Models”, ICML 2021
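
The evaluation above relies on FID, which is the Fréchet distance between two Gaussians fitted to image features. The sketch below shows only that distance computation and is not the evaluation code used for the slides: real FID uses Inception network activations of the 50K generated and reference images, whereas here the feature arrays are random placeholders and the function name `frechet_distance` is an assumption.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n_samples, dim) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) is the sum of square roots of the eigenvalues of
    # cov_a @ cov_b, which are real and non-negative up to round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.standard_normal((1000, 4))         # placeholder features, not Inception activations
fake = real + np.array([3.0, 0.0, 0.0, 0.0])  # same covariance, shifted mean
print(frechet_distance(real, fake))
```

Because the two placeholder sets share a covariance and differ only by a mean shift of norm 3, the distance reduces to the squared shift, approximately 9. Identical sets give a distance of approximately 0.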

  6. CIFAR10 results
    • FID is lowest on the shortest path
    – Lowest point is at T = 500
    – Surpasses vanilla iDDPM
    • Our power spectrum model
    – Found m = 2 to be optimal
    – Corresponds to “sharpening” rather than the “blurring” suggested by [1] & [2]
    [1] Daras, G., Delbracio, M., Talebi, H., Dimakis, A. G., and Milanfar, P., “Soft diffusion: Score matching for general corruptions”, 2022
    [2] Hoogeboom, E. and Salimans, T., “Blurring diffusion models”, ICLR 2023

  7. ImageNet64 results
    • Preliminary experiments are promising
    – Unconditional model trained (and sampled) with T = 1000
    – Better FID than iDDPM with fewer steps T and fewer training iterations
    Generated samples from SPD (Ours)
    Quantitative results

  8. Thank you
    Read the paper, or check out our code →
    mtkresearch/shortest-path-diffusion
