Image generation with Shortest Path Diffusion

Slide 1

Slide 1 text

MediaTek Proprietary and Confidential. © 2022 MediaTek Inc. All rights reserved.

Image generation with Shortest-path Diffusion

Ayan Das*, Stathi Fotiadis*, Anil Batra, Farhang Nabiei, FengTing Liao, Sattar Vakili, Da-Shan Shiu, Alberto Bernacchia (* equal contribution)

MediaTek Research
https://www.mtkresearch.com/

Slide 2

Slide 2 text

Introduction to Diffusion Models

• An increasingly popular class of generative models
• Two primary components:
  - Reverse (generative) process: goes from $\mathcal{N}(\mathbf{0}, \mathbf{I})$ to the data, in distribution space
  - Forward (noising) process: specifies the exact “path” of travel
• Forward process specification:
  - Predominantly hand-designed
  - Requires trial and error to find the optimal path

Slide 3

Slide 3 text

Shortest path between distributions

• Shortest path between two Gaussians, $\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_0) \to \mathcal{N}(\mathbf{0}, \mathbf{I})$:
  $\boldsymbol{\Sigma}_t = \boldsymbol{\Sigma}_0^{1-t}, \quad t \in [0, 1]$
• Fisher metric
  - Keeps maximum overlap between successive distributions
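
A minimal NumPy sketch of this path, for illustration only: it evaluates $\boldsymbol{\Sigma}_t = \boldsymbol{\Sigma}_0^{1-t}$ through an eigendecomposition and inspects the two endpoints and the midpoint. The function name `covariance_on_path` and the random 4x4 covariance are assumptions made for the example; they do not come from the slides or the released code.

```python
import numpy as np

def covariance_on_path(sigma0, t):
    """Return Sigma_t = Sigma_0^(1 - t) for a symmetric positive-definite Sigma_0."""
    eigvals, eigvecs = np.linalg.eigh(sigma0)
    # Scale each eigenvector column by its eigenvalue raised to (1 - t).
    return (eigvecs * eigvals ** (1.0 - t)) @ eigvecs.T

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
sigma0 = a @ a.T + 4.0 * np.eye(4)  # hypothetical data covariance

start = covariance_on_path(sigma0, 0.0)   # equals Sigma_0
middle = covariance_on_path(sigma0, 0.5)  # matrix square root of Sigma_0
end = covariance_on_path(sigma0, 1.0)     # equals the identity
```

The path starts at the data covariance and ends at the identity, so the endpoint distribution is $\mathcal{N}(\mathbf{0}, \mathbf{I})$.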

Slide 4

Slide 4 text

Modelling the covariance of natural images

• Translation invariant: $\boldsymbol{\Sigma}_0 = \mathbf{F} \mathbf{D} \mathbf{F}^{\dagger}$
• Power spectrum [1]: $\sim 1/f^2$
• We model $\mathbf{D}$ as $D_{ii} = \dfrac{c_1}{c_2 + f_i^{m}}$
• Implementation in Fourier space:
  $\mathbf{u}_t = \boldsymbol{\Psi}_t^{1/2} \mathbf{u}_0 + (\mathbf{I} - \boldsymbol{\Psi}_t)^{1/2} \boldsymbol{\xi}_t$
  $\boldsymbol{\Psi}_t = (\mathbf{I} - \mathbf{D}^{1 - t/T})(\mathbf{I} - \mathbf{D})^{-1}$

[1] Hyvärinen, Hurri & Hoyer, Natural Image Statistics (2009)
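
The Fourier-space noising step can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the released implementation: it assumes a power-spectrum model of the form c1 / (c2 + f^m) on a radial frequency grid, uses a normalized time s in [0, 1] (read here as s = t/T), and picks hypothetical constants `c1`, `c2`. The helper names `power_spectrum`, `psi`, and `forward_sample` are invented for the example. Per Fourier mode with spectrum value d, the mixing weight satisfies psi * d + (1 - psi) = d^(1 - s), which is the shortest-path variance at time s.

```python
import numpy as np

def power_spectrum(n, c1=0.1, c2=0.01, m=2.0):
    """Diagonal of D on an n x n frequency grid: c1 / (c2 + f^m) (assumed form)."""
    fx = np.fft.fftfreq(n)
    f = np.sqrt(fx[:, None] ** 2 + fx[None, :] ** 2)  # radial frequency
    return c1 / (c2 + f ** m)

def psi(d, s):
    """Psi_s = (1 - d^(1-s)) / (1 - d), elementwise, with the limit 1 - s at d = 1."""
    near_one = np.isclose(d, 1.0)
    safe_d = np.where(near_one, 2.0, d)  # placeholder value avoids 0/0
    value = (1.0 - safe_d ** (1.0 - s)) / (1.0 - safe_d)
    return np.clip(np.where(near_one, 1.0 - s, value), 0.0, 1.0)

def forward_sample(x0, d, s, rng):
    """Draw x_s from x0 by mixing signal and noise per Fourier mode."""
    u0 = np.fft.fft2(x0, norm="ortho")
    # FFT of real white noise: unit-variance modes with Hermitian symmetry,
    # so the inverse transform below is real up to rounding.
    xi = np.fft.fft2(rng.standard_normal(x0.shape), norm="ortho")
    p = psi(d, s)
    u_s = np.sqrt(p) * u0 + np.sqrt(1.0 - p) * xi
    return np.fft.ifft2(u_s, norm="ortho").real

n = 8
d = power_spectrum(n)
x0 = np.random.default_rng(0).standard_normal((n, n))  # stand-in for an image
x_half = forward_sample(x0, d, 0.5, np.random.default_rng(1))
```

At s = 0 the sample is the input itself and at s = 1 it is pure white noise. In between, modes with large d (low frequencies) lose signal faster than modes with small d.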

Slide 5

Slide 5 text

Experimental setup

• Datasets: CIFAR10 (32x32) and ImageNet (64x64)
  - Representative of “natural images”
  - The translation-invariance assumption holds approximately
• Setup (for fair comparison)
  - Same UNet architecture as iDDPM [1]
  - Same optimizer and learning rate as [1]
  - Analogous reverse-process variance for sampling
• Evaluation
  - FID computed with 50K samples

Only difference: our estimated non-uniform forward noising schedule $\boldsymbol{\Psi}_t$

[1] Nichol, A. Q. and Dhariwal, P., “Improved denoising diffusion probabilistic models”, ICML 2021
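
For reference, FID is the Fréchet distance between Gaussian fits to feature vectors of real and generated images. The sketch below covers only that distance, assuming the features have already been extracted elsewhere (the standard metric uses Inception network features, which are not reproduced here). The function name `frechet_distance` and the random feature arrays are hypothetical stand-ins.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussian fits of two (num_samples, dim) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) is the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative up to rounding.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real_feats = rng.standard_normal((2000, 8))  # stand-in for real-image features
fake_feats = real_feats + 3.0                # same covariance, shifted mean

same = frechet_distance(real_feats, real_feats)
shifted = frechet_distance(real_feats, fake_feats)
```

With identical covariances the distance reduces to the squared distance between the means, so `shifted` reflects only the mean offset while `same` is zero up to rounding.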

Slide 6

Slide 6 text

CIFAR10 results

• FID is lowest on the shortest path
  - The minimum is at T = 500
  - Surpasses vanilla iDDPM
• Our power spectrum model
  - Found m = 2 to be optimal
  - Corresponds to “sharpening”, rather than the “blurring” suggested by [1] and [2]

[1] Daras, G., Delbracio, M., Talebi, H., Dimakis, A. G., and Milanfar, P., “Soft diffusion: Score matching for general corruptions”, 2022
[2] Hoogeboom, E. and Salimans, T., “Blurring diffusion models”, ICLR 2023

Slide 7

Slide 7 text

ImageNet64 results

• Preliminary experiments are promising
  - Unconditional model trained and sampled with T = 1000
  - Better FID than iDDPM with a smaller T and fewer training iterations

[Figure: generated samples from SPD (ours)]
[Table: quantitative results]

Slide 8

Slide 8 text

Thank you

Read the paper, or check out our code → mtkresearch/shortest-path-diffusion