Slide 1

Slide 1 text

Image generation with Shortest-path Diffusion
Ayan Das*, Stathi Fotiadis*, Anil Batra, Farhang Nabiei, FengTing Liao, Sattar Vakili, Da-Shan Shiu, Alberto Bernacchia (* equal contribution)
MediaTek Research
https://www.mtkresearch.com/
MediaTek Proprietary and Confidential. © 2022 MediaTek Inc. All rights reserved.

Slide 2

Slide 2 text

Introduction to Diffusion Models
• An increasingly popular class of generative models
• Two primary components:
  – Reverse (generative) process: goes from $\mathcal{N}(\mathbf{0}, \mathbf{I})$ to the data, in distribution space
  – Forward (noising) process: specifies the exact "path" of travel
• Forward process specification
  – So far, predominantly hand-designed
  – Requires trial and error to find the optimal path

Slide 3

Slide 3 text

Shortest path between distributions
• Shortest path between two Gaussians: $\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_0) \to \mathcal{N}(\mathbf{0}, \mathbf{I})$
• Fisher metric
  – Keeps maximum overlap between subsequent distributions
$\boldsymbol{\Sigma}_t = \boldsymbol{\Sigma}_0^{1-t}, \quad t \in [0, 1]$
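
As a rough illustration of the path on this slide, the sketch below evaluates $\boldsymbol{\Sigma}_t = \boldsymbol{\Sigma}_0^{1-t}$ with NumPy through an eigendecomposition of a symmetric positive-definite matrix. The function name `covariance_on_path` and the random test covariance are assumptions made for the example; they are not taken from the talk or from the released code.

```python
import numpy as np

def covariance_on_path(sigma0, t):
    """Return Sigma_t = Sigma_0^(1 - t) for a symmetric positive-definite Sigma_0."""
    # eigh is appropriate because Sigma_0 is symmetric; eigenvalues are real and positive.
    eigvals, eigvecs = np.linalg.eigh(sigma0)
    # A fractional matrix power acts on the eigenvalues only.
    return (eigvecs * eigvals ** (1.0 - t)) @ eigvecs.T

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
sigma0 = a @ a.T + 4.0 * np.eye(4)  # hypothetical data covariance, for illustration only

# Five points along the path from Sigma_0 (t = 0) to the identity (t = 1).
path = [covariance_on_path(sigma0, t) for t in np.linspace(0.0, 1.0, 5)]
```

At `t = 0` the function returns the data covariance, and at `t = 1` it returns the identity, matching the two endpoints of the path.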

Slide 4

Slide 4 text

Modelling the covariance of natural images
• Translation invariant: $\boldsymbol{\Sigma}_0 = \mathbf{F} \mathbf{D} \mathbf{F}^{\dagger}$
• Power spectrum [1]: $\sim 1/f^2$
• We model $\mathbf{D}$ as $D_{ii} = \dfrac{c_1}{c_2 + f_i^{m}}$
• Implementation in Fourier space:
  $\mathbf{u}_t = \boldsymbol{\Psi}_t^{1/2} \mathbf{u}_0 + (\mathbf{I} - \boldsymbol{\Psi}_t)^{1/2} \boldsymbol{\xi}_t$
  $\boldsymbol{\Psi}_t = (\mathbf{I} - \mathbf{D}^{1 - t/T})(\mathbf{I} - \mathbf{D})^{-1}$
[1] Hyvärinen, Hurri & Hoyer, Natural Image Statistics (2009)
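
The Fourier-space forward process on this slide, $\mathbf{u}_t = \boldsymbol{\Psi}_t^{1/2} \mathbf{u}_0 + (\mathbf{I} - \boldsymbol{\Psi}_t)^{1/2} \boldsymbol{\xi}_t$ with $\boldsymbol{\Psi}_t = (\mathbf{I} - \mathbf{D}^{1 - t/T})(\mathbf{I} - \mathbf{D})^{-1}$, could be sketched roughly as follows. This is a minimal NumPy illustration, not the released implementation: the function names, the constants `c1 = 1.0` and `c2 = 0.01`, the use of a unitary 2-D FFT, and the single-channel random "image" are all assumptions made for the example. The spectrum model $D_{ii} = c_1 / (c_2 + f_i^m)$ is used as read from the slide.

```python
import numpy as np

def radial_frequency(height, width):
    """Magnitude of the spatial frequency at each 2-D FFT coefficient."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)

def spectrum_model(freq, c1, c2, m):
    """Diagonal of D under the assumed model D_ii = c1 / (c2 + f_i^m)."""
    return c1 / (c2 + freq ** m)

def psi_schedule(d, t, T):
    """Psi_t = (1 - D^(1 - t/T)) / (1 - D), elementwise on the diagonal of D."""
    s = 1.0 - t / T
    near_one = np.isclose(d, 1.0)
    safe_d = np.where(near_one, 2.0, d)       # placeholder value to avoid 0/0
    psi = (1.0 - safe_d ** s) / (1.0 - safe_d)
    psi = np.where(near_one, s, psi)          # limit of the ratio as D -> 1 is s
    return np.clip(psi, 0.0, 1.0)

def forward_sample(image, t, T, d, rng):
    """Draw u_t in Fourier space and return the corresponding real image."""
    u0 = np.fft.fft2(image, norm="ortho")
    # White noise stays white under a unitary FFT and keeps Hermitian symmetry.
    xi = np.fft.fft2(rng.standard_normal(image.shape), norm="ortho")
    psi = psi_schedule(d, t, T)
    ut = np.sqrt(psi) * u0 + np.sqrt(1.0 - psi) * xi
    return np.fft.ifft2(ut, norm="ortho").real

rng = np.random.default_rng(0)
d = spectrum_model(radial_frequency(32, 32), c1=1.0, c2=0.01, m=2)
image = rng.standard_normal((32, 32))  # stand-in for one image channel
noisy = forward_sample(image, t=250, T=500, d=d, rng=rng)
```

With this choice of $\boldsymbol{\Psi}_t$, the per-frequency variance $\Psi_t D + (1 - \Psi_t)$ equals $D^{1 - t/T}$, which is the shortest-path covariance expressed in the Fourier basis.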

Slide 5

Slide 5 text

Experimental setup
• Datasets: CIFAR10 (32x32) and ImageNet (64x64)
  – Representative of "natural images"
  – The translation-invariance assumption roughly holds
• Setup (for fair comparison)
  – Same UNet architecture as iDDPM [1]
  – Same optimizer and learning rate as [1]
  – Analogous reverse-process variance for sampling
• Evaluation
  – FID computed with 50K samples
Only difference: our estimated non-uniform forward noising schedule $\boldsymbol{\Psi}_t$
[1] Nichol, A. Q. and Dhariwal, P., "Improved denoising diffusion probabilistic models", ICML 2021
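
For readers unfamiliar with the metric, FID is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated images. The sketch below computes that distance with NumPy from feature arrays that are assumed to be precomputed; the function name `frechet_distance` and the random arrays standing in for Inception features are hypothetical, and this is not the evaluation code used for the results in this talk.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n_samples, n_features) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative up to rounding error.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real_feats = rng.standard_normal((2000, 8))        # stand-in for real-image features
fake_feats = rng.standard_normal((2000, 8)) + 0.5  # stand-in for generated-image features
score = frechet_distance(real_feats, fake_feats)
```

The distance is zero when both feature sets are identical and grows as the means or covariances drift apart.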

Slide 6

Slide 6 text

CIFAR10 results
• FID is lowest on the shortest path
  – Lowest point is at T = 500
  – Surpasses vanilla iDDPM
• Our power spectrum model
  – Found m = 2 to be optimal
  – Corresponds to "sharpening" rather than the "blurring" suggested by [1] and [2]
[1] Daras, G., Delbracio, M., Talebi, H., Dimakis, A. G., and Milanfar, P., "Soft diffusion: Score matching for general corruptions", 2022
[2] Hoogeboom, E. and Salimans, T., "Blurring diffusion models", ICLR 2023

Slide 7

Slide 7 text

ImageNet64 results
• Preliminary experiments are promising
  – Unconditional model trained (and sampled) with T = 1000
  – Better FID than iDDPM with a smaller T and fewer training iterations
Panels: generated samples from SPD (ours); quantitative results

Slide 8

Slide 8 text

Thank you
Read the paper, or check out our code → mtkresearch/shortest-path-diffusion