Slide 1

Diffusion Models: Advancements & Applications

Ayan Das
PhD Student, University of Surrey
DL Intern, MediaTek Research UK

[Title figure: the forward diffusion and reverse diffusion processes]

Slide 2

General Introduction

Slide 3

Generative Models
Definition, Motivation & Scope … (1)

• Generative Modelling is learning models of the form $p_\theta(X)$, given a dataset $\{X_i\}_{i=1}^{D} \sim q_{data}(X)$
• Motivation 1: Verification (log-likelihood)
• Motivation 2: Generation by sampling, i.e. $X_{new} \sim p_{\theta^*}(X)$
• Motivation 3: Conditional models of the form $p_\theta(X|Y)$

Slide 4

Generative Models
Definition, Motivation & Scope … (2)

• Discriminative Models, i.e. models like $p_\theta(Y|X)$
• $Y$ is significantly simpler
• More specialised — focused on $Y$, not $X$

Slide 5

Diversity vs Fidelity
The trade-off

[Figure: GANs vs VAEs]

Slide 6

Any model that can do both equally well?
… or maybe control the trade-off

Slide 7

Diffusion Models

Slide 8

Other Generative Models
Candidates: VAE, GAN, NF

Slide 9

Diffusion Models are different
What makes them hard to work with?

Non-deterministic mapping

Slide 10

Diffusion Models, simplified
Intuitive idea

• A Gaussian Diffusion Model generates data by gradual Gaussian de-noising
• The "reverse process" is the real generative process: $X_T \to \cdots \to X_t \to X_{t-1} \to \cdots \to X_0$
• The "forward process" is just a way of simulating noisy training data for all $t$: $X_0 \to \cdots \to X_{t-1} \to X_t \to \cdots \to X_T$
• Intuitive training objective: $\mathbb{E}_{X_0 \sim q_{data}} \left[ \frac{1}{T} \sum_{t=T}^{1} \| s_\theta(X_t) - X_{t-1} \|_2^2 \right]$
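As a concrete picture of this intuitive objective, here is a minimal sketch on toy 2-D data. The denoiser `s_theta`, the schedule `sigma`, and the length `T` are illustrative stand-ins, not details from the talk:

```python
import torch

T = 100                                   # diffusion length (illustrative)
sigma = torch.linspace(0.01, 1.0, T)      # assumed noise schedule

# Toy stand-in for the denoiser s_theta: predicts the less-noisy X_{t-1} from X_t.
s_theta = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))

def naive_loss(X0):
    """E_{X0 ~ q_data}[ (1/T) * sum_{t=T..1} || s_theta(X_t) - X_{t-1} ||^2 ]."""
    Xs = [X0]
    for t in range(T):                    # simulate the forward chain X_0 -> ... -> X_T
        Xs.append(Xs[-1] + sigma[t] * torch.randn_like(X0))
    loss = 0.0
    for t in range(T, 0, -1):             # regress every X_t back onto X_{t-1}
        loss = loss + ((s_theta(Xs[t]) - Xs[t - 1]) ** 2).sum(dim=-1).mean()
    return loss / T

X0 = torch.randn(32, 2)                   # stand-in for a batch of real data
print(naive_loss(X0))
```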

Slide 11

"The forward-reverse process is equivalent to a VAE-like Encoder-Decoder"

WRONG

Slide 12

Forward process is "parallelizable"

$X_t = X_0 + \sigma[t] \cdot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$

[Figure: the sequential chain $X_0 \to X_1 \to \cdots \to X_t \to \cdots \to X_T$ vs direct jumps from $X_0$ to any $X_t$]
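A short sketch of why this matters in practice: with the closed form above, any $X_t$ is reachable directly from $X_0$, so training never has to simulate the chain. The schedule `sigma` is assumed:

```python
import torch

T = 100
sigma = torch.linspace(0.01, 1.0, T)   # assumed monotone noise schedule

X0 = torch.randn(32, 2)                # stand-in for a batch of real data
t = torch.randint(0, T, (32,))         # a *different* timestep for every sample
eps = torch.randn_like(X0)

# X_t = X_0 + sigma[t] * eps: every noise level is reachable in one shot,
# so all timesteps of the forward process can be simulated in parallel.
Xt = X0 + sigma[t, None] * eps
```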

Slide 13

Diffusion Models, simplified
Visualising the data space

[Figure: a vector field over the data space that guides samples towards real data]

Slide 14

"Score" of a Distribution
… an important statistical quantity

$\nabla_X \log q_{data}(X) \approx s_\theta(X, \cdot)$

$X_1 \leftarrow X_0 + \nabla_X \log q_{data}(X) \big|_{X = X_0}$
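A tiny illustration of the score and the update above, using a density whose score we know in closed form (a standard 2-D Gaussian, whose score is exactly $-X$); the step size `0.1` is arbitrary:

```python
import torch

# For a density known in closed form, autograd gives the score directly.
# Here q_data = N(0, I) in 2-D, so the score is analytically -X.
def log_q(X):
    return -0.5 * (X ** 2).sum(dim=-1)   # log N(X; 0, I), up to a constant

X0 = torch.randn(5, 2, requires_grad=True)
score = torch.autograd.grad(log_q(X0).sum(), X0)[0]   # equals -X0 here

X1 = X0 + 0.1 * score   # the slide's update: step towards higher density
```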

Slide 15

Diffusion Models, in reality
Reality is slightly different

$\mathbb{E}_{X_0 \sim q_{data}} \left[ \frac{1}{T} \sum_{t=T}^{1} \| s_\theta(\underbrace{X_0 + \sigma[t] \cdot \epsilon}_{X_t},\ t) - (-\epsilon) \|_2^2 \right]$, where $\epsilon \sim \mathcal{N}(0, I)$

… which is equivalent to $\mathbb{E}_{X_0 \sim q_{data},\ \epsilon \sim \mathcal{N}(0, I),\ t \sim \mathbb{U}[1, T]} \left[ \| s_\theta(X_t, t) + \epsilon \|_2^2 \right]$

Sampling: $X_{t-1} = X_t + s_{\theta^*}(X_t, t) \cdot \delta t + \sqrt{\delta t} \cdot z$ (Langevin Dynamics!)
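A sketch of both pieces (the denoising score matching loss and the Langevin sampler) under the same toy setup as before; the time conditioning by concatenation and the constant step size `delta` are simplifying assumptions:

```python
import torch

T, delta = 100, 0.01
sigma = torch.linspace(0.01, 1.0, T)           # assumed noise schedule

s_theta = torch.nn.Sequential(                 # toy score network s_theta(X_t, t)
    torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))

def dsm_loss(X0):
    """E_{X0, eps, t}[ || s_theta(X_t, t) + eps ||^2 ] with X_t = X_0 + sigma[t]*eps."""
    t = torch.randint(0, T, (X0.shape[0],))
    eps = torch.randn_like(X0)
    Xt = X0 + sigma[t, None] * eps
    inp = torch.cat([Xt, t[:, None].float() / T], dim=-1)   # crude time conditioning
    return ((s_theta(inp) + eps) ** 2).sum(dim=-1).mean()

@torch.no_grad()
def langevin_sample(n):
    """X_{t-1} = X_t + s_theta(X_t, t) * delta + sqrt(delta) * z."""
    X = torch.randn(n, 2) * sigma[-1]          # start from the noisiest level
    for t in reversed(range(T)):
        inp = torch.cat([X, torch.full((n, 1), t / T)], dim=-1)
        X = X + s_theta(inp) * delta + delta ** 0.5 * torch.randn_like(X)
    return X
```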

Slide 16

Role of multiple noise scales
$s_\theta$ achieves different goals at different noise scales

$X_T \to \cdots \to X_t \to X_{t-1} \to \cdots \to X_0$

• Near $X_T$: uncertain prediction, high variance → Diversity
• Near $X_0$: certain prediction, low variance → Fidelity

Slide 17

Origin Story & Formalisms

Slide 18

Tracing Diffusion Models back into history
… where did it start?

[Figure: timeline of prior work; "SOTA on CIFAR10, FID 3.14"]

Slide 19

Three formalisms
SBM, DDPM & SDE

Score-Based Models (SBM):
$X_{t-1} = X_t + s_\theta(X_t, t) \cdot \delta t + \sqrt{\delta t} \cdot z$

De-noising Diffusion Probabilistic Models (DDPM):
$X_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( X_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \epsilon_\theta(X_t, t) \right) + \sigma_t \cdot z$
With $s_\theta(X_t, t) = -\frac{\epsilon_\theta(X_t, t)}{\sqrt{1 - \bar{\alpha}_t}}$, this is equivalently $X_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( X_t + s_\theta(X_t, t) \cdot \beta_t \right) + \sqrt{\beta_t} \cdot z$

Stochastic Differential Equations (SDE):
$dX = \left[ f(X, t) - g^2(t) \, s_\theta(X, t) \right] dt + g(t) \, dw$, where $dw \sim \mathcal{N}(0, dt)$
• SBM as an SDE: $f(X, t) = 0$, $g(t) = \sqrt{\tfrac{d}{dt} \sigma^2(t)}$
• DDPM as an SDE: $f(X, t) = -\tfrac{1}{2} \beta(t) X$, $g(t) = \sqrt{\beta(t)}$

Slide 20

SBM & DDPM: The important difference
… in the forward noising process

• SBM only adds noise: $X_t = X_0 + \sigma[t] \cdot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$
• DDPM also scales down the data: $X_t = \gamma[t] \cdot X_0 + \sigma[t] \cdot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$, with $\gamma[t] = \sqrt{\bar{\alpha}_t}$ and $\sigma[t] = \sqrt{1 - \bar{\alpha}_t}$

SBM reverse update: $X_{t-1} = X_t + s_\theta(X_t, t) \cdot \delta t + \sqrt{\delta t} \cdot z$
DDPM reverse update: $X_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( X_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \epsilon_\theta(X_t, t) \right) + \sigma_t \cdot z$
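The difference is a one-liner in code. A sketch with assumed schedules (a linear $\beta$ schedule for DDPM, a linear $\sigma$ for SBM):

```python
import torch

T = 1000
beta = torch.linspace(1e-4, 0.02, T)           # assumed DDPM beta schedule
alpha_bar = torch.cumprod(1.0 - beta, dim=0)   # \bar{alpha}_t
sigma_sbm = torch.linspace(0.01, 1.0, T)       # assumed SBM sigma schedule

def forward_sbm(X0, t):
    """SBM: only adds noise -- X_t = X_0 + sigma[t] * eps."""
    return X0 + sigma_sbm[t] * torch.randn_like(X0)

def forward_ddpm(X0, t):
    """DDPM: also scales the data -- X_t = sqrt(abar_t)*X_0 + sqrt(1-abar_t)*eps."""
    return alpha_bar[t].sqrt() * X0 + (1 - alpha_bar[t]).sqrt() * torch.randn_like(X0)
```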

Slide 21

DDPM Summary
Forward, Training and Reverse processes

• Sampling from the forward process: $X_t = \sqrt{\bar{\alpha}_t} \cdot X_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon$
• Training the model $\epsilon_\theta$: $\mathbb{E}_{X_0 \sim q_{data},\ \epsilon \sim \mathcal{N}(0, I),\ t \sim \mathbb{U}[1, T]} \left[ \| \epsilon_\theta(X_t, t) - \epsilon \|_2^2 \right]$
• Reverse process sampling with $\epsilon_{\theta^*}$: $X_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( X_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \epsilon_{\theta^*}(X_t, t) \right) + \sigma_t \cdot z$
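Putting the three pieces together, a minimal end-to-end DDPM sketch on toy 2-D data; the linear $\beta$ schedule, the tiny MLP, the time conditioning, and the choice $\sigma_t^2 = \beta_t$ are all assumptions for illustration:

```python
import torch
import torch.nn as nn

T = 1000
beta = torch.linspace(1e-4, 0.02, T)              # assumed linear schedule
alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)

eps_theta = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 2))

def ddpm_loss(X0):
    """Forward-sample X_t, then regress eps_theta(X_t, t) onto the true eps."""
    t = torch.randint(0, T, (X0.shape[0],))
    eps = torch.randn_like(X0)
    Xt = alpha_bar[t, None].sqrt() * X0 + (1 - alpha_bar[t, None]).sqrt() * eps
    pred = eps_theta(torch.cat([Xt, t[:, None].float() / T], dim=-1))
    return ((pred - eps) ** 2).sum(dim=-1).mean()

@torch.no_grad()
def ddpm_sample(n):
    """X_{t-1} = (X_t - beta_t/sqrt(1-abar_t) * eps_theta) / sqrt(alpha_t) + sigma_t*z."""
    X = torch.randn(n, 2)
    for t in reversed(range(T)):
        eps = eps_theta(torch.cat([X, torch.full((n, 1), t / T)], dim=-1))
        X = (X - beta[t] / (1 - alpha_bar[t]).sqrt() * eps) / alpha[t].sqrt()
        if t > 0:
            X = X + beta[t].sqrt() * torch.randn_like(X)   # sigma_t^2 = beta_t choice
    return X
```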

Slide 22

Recent Advancements
Faster Sampling

Slide 23

Diffusion Models suffer from slow sampling
Unlike any other generative model

$X_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( X_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \epsilon_{\theta^*}(X_t, t) \right) + \sigma_t \cdot z$

i.e. $X_{t-1} \sim \mathcal{N}(\mu_{\theta^*}(X_t, t),\ \sigma_t^2 \cdot I)$: every step is stochastic and needs one network evaluation, for all $T$ steps.

Slide 24

De-noising Diffusion Implicit Models (DDIM)
Faster and deterministic sampling

$X_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \left( \frac{X_t - \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon_{\theta^*}(X_t, t)}{\sqrt{\bar{\alpha}_t}} \right) + \sqrt{1 - \bar{\alpha}_{t-1}} \cdot \epsilon_{\theta^*}(X_t, t)$

$X_{t-1} \sim \mathcal{N}(\mu^{DDIM}_{\theta^*}(X_t, t),\ 0)$

Stochastic Differential Equation (SDE) → Ordinary Differential Equation (ODE)
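The DDIM update in code: predict $X_0$ from $X_t$, then deterministically re-noise it to level $t-1$, with no added noise (matching the zero-variance Gaussian above). A minimal sketch:

```python
import torch

def ddim_step(Xt, eps, abar_t, abar_prev):
    """Deterministic DDIM update X_t -> X_{t-1}, given eps = eps_theta(X_t, t)."""
    X0_pred = (Xt - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()   # predicted X_0
    return abar_prev.sqrt() * X0_pred + (1 - abar_prev).sqrt() * eps
```

Because no noise is injected, the same $X_T$ always maps to the same $X_0$.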

Slide 25

Skip steps in DDIM
Sampling with shorter diffusion length

Single step ($X_t \to X_{t-1}$): $X_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \left( \frac{X_t - \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon_{\theta^*}(X_t, t)}{\sqrt{\bar{\alpha}_t}} \right) + \sqrt{1 - \bar{\alpha}_{t-1}} \cdot \epsilon_{\theta^*}(X_t, t)$

Skipped step ($X_t \to X_{t-k}$): $X_{t-k} = \sqrt{\bar{\alpha}_{t-k}} \left( \frac{X_t - \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon_{\theta^*}(X_t, t)}{\sqrt{\bar{\alpha}_t}} \right) + \sqrt{1 - \bar{\alpha}_{t-k}} \cdot \epsilon_{\theta^*}(X_t, t)$
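Skipping steps then just means calling the same update on a strided subset of timesteps. A sketch reusing `eps_theta`, `alpha_bar`, and `T` from the DDPM sketch and `ddim_step` from above; the stride of 20 (50 steps instead of 1000) is arbitrary:

```python
import torch

taus = list(range(0, T, 20)) + [T - 1]    # strided timestep schedule (assumed)

@torch.no_grad()
def ddim_sample(n):
    X = torch.randn(n, 2)
    for t, t_prev in zip(reversed(taus[1:]), reversed(taus[:-1])):
        eps = eps_theta(torch.cat([X, torch.full((n, 1), t / T)], dim=-1))
        X = ddim_step(X, eps, alpha_bar[t], alpha_bar[t_prev])   # jump t -> t_prev
    return X
```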

Slide 26

DDIM as feature extractor
Deterministic mapping

Running the same deterministic update forward in time, $\forall t = 0 \to T$:
$X_t = \sqrt{\bar{\alpha}_t} \left( \frac{X_{t-1} - \sqrt{1 - \bar{\alpha}_{t-1}} \cdot \epsilon_{\theta^*}(X_{t-1}, t)}{\sqrt{\bar{\alpha}_{t-1}}} \right) + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon_{\theta^*}(X_{t-1}, t)$

Sampling is an Initial Value Problem (IVP); encoding is the corresponding Final Value Problem (FVP). The deterministic map $X_0 \to X_T$ acts as a feature extractor.
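Run the same deterministic update in the opposite direction and you get an (approximate) encoder. A sketch, again reusing names from the previous sketches; it relies on the usual approximation that $\epsilon_\theta$ changes little between adjacent steps:

```python
import torch

@torch.no_grad()
def ddim_encode(X0):
    """Deterministic map X_0 -> X_T (t = 0 -> T); usable as a feature extractor."""
    X = X0
    for t_prev, t in zip(taus[:-1], taus[1:]):
        eps = eps_theta(torch.cat([X, torch.full((X.shape[0], 1), t_prev / T)], dim=-1))
        # ddim_step re-noises the X_0 estimate to the *higher* level abar_t here
        X = ddim_step(X, eps, alpha_bar[t_prev], alpha_bar[t])
    return X
```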

Slide 27

Stable Diffusion: Diffusion on latent space

• Embed the dataset $X_0 \sim q(X_0)$ into latent space: $Z_0 = \mathcal{E}(X_0)$
• Just as before, create a diffusion model, now on $Z$: $Z_T \to Z_{T-1} \to \cdots \to Z_1 \to Z_0$
• Decode the samples: $X_0 = \mathcal{D}(Z_0)$
• ($\mathcal{E}$, $\mathcal{D}$) is an Auto-Encoder

"High-Resolution Image Synthesis with Latent Diffusion Models", Rombach et al., CVPR 2022
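A structural sketch of the latent-diffusion pipeline: `encoder`/`decoder` are toy stand-ins for a pretrained autoencoder ($\mathcal{E}$, $\mathcal{D}$), and the diffusion part is the same DDPM machinery as before, just applied to $Z$ instead of $X$:

```python
import torch
import torch.nn as nn

d_data, d_latent, T = 784, 32, 1000
beta = torch.linspace(1e-4, 0.02, T)
abar = torch.cumprod(1 - beta, dim=0)

encoder = nn.Linear(d_data, d_latent)   # stand-in for a pretrained E
decoder = nn.Linear(d_latent, d_data)   # stand-in for a pretrained D
eps_net = nn.Sequential(nn.Linear(d_latent + 1, 128), nn.ReLU(),
                        nn.Linear(128, d_latent))

def latent_loss(X0):
    """Ordinary DDPM loss, applied to Z_0 = E(X_0) instead of X_0."""
    Z0 = encoder(X0).detach()            # the autoencoder stays frozen
    t = torch.randint(0, T, (Z0.shape[0],))
    eps = torch.randn_like(Z0)
    Zt = abar[t, None].sqrt() * Z0 + (1 - abar[t, None]).sqrt() * eps
    pred = eps_net(torch.cat([Zt, t[:, None].float() / T], dim=-1))
    return ((pred - eps) ** 2).sum(dim=-1).mean()

@torch.no_grad()
def generate(n):
    """Reverse-diffuse Z_T -> ... -> Z_0 in latent space, then decode X_0 = D(Z_0)."""
    Z = torch.randn(n, d_latent)
    for t in reversed(range(T)):
        eps = eps_net(torch.cat([Z, torch.full((n, 1), t / T)], dim=-1))
        Z = (Z - beta[t] / (1 - abar[t]).sqrt() * eps) / (1 - beta[t]).sqrt()
        if t > 0:
            Z = Z + beta[t].sqrt() * torch.randn_like(Z)
    return decoder(Z)
```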

Slide 28

Recent Advancements
Guidance

Slide 29

Guidance played an important role
… for increasing generation quality

• Conditional models are different — they model conditions explicitly
• $X \sim p_{\theta^*}(X | Y = \text{CAT})$ generates cat images
• $X \sim p_{\theta^*}(X | Y = \text{DOG})$ generates dog images
• … so on
• Guidance is "influencing the reverse process with condition info"
• Using an external classifier → "Classifier Guidance"
• Using CLIP → "CLIP Guidance"
• Using a conditional model → "Classifier-free Guidance"

Slide 30

Classifier Guidance
Guiding the reverse process with an external classifier

• Requires labels (or some conditioning info)
• Train an external classifier $p_\phi(Y|X)$ — completely unrelated to the diffusion model
• Modify the unconditional noise-estimator with the classifier to yield a conditional noise-estimator (sketched below):

$\hat{\epsilon}_{\theta^*, \phi^*}(X_t, t, Y) = \epsilon_{\theta^*}(X_t, t) - \lambda \cdot \sigma[t] \, \nabla_{X_t} \log p_{\phi^*}(Y | X_t)$
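The guidance formula in code: differentiate the classifier's log-probability of the desired label with respect to $X_t$ and nudge the noise estimate. The toy linear `classifier` and the guidance weight `lam` are stand-ins:

```python
import torch
import torch.nn as nn

classifier = nn.Linear(2, 10)          # toy stand-in for external p_phi(Y | X)

def guided_eps(eps_out, Xt, y, sigma_t, lam=1.0):
    """eps_hat = eps_theta - lam * sigma_t * grad_{X_t} log p_phi(y | X_t)."""
    Xt = Xt.detach().requires_grad_(True)
    log_p = classifier(Xt).log_softmax(dim=-1)[torch.arange(Xt.shape[0]), y]
    grad = torch.autograd.grad(log_p.sum(), Xt)[0]
    return eps_out - lam * sigma_t * grad
```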

Slide 31

CLIP Guidance
Introduced in the "GLIDE: …" paper from OpenAI

• Guide the reverse process with a text condition $C$
• Instead of the classifier gradient, maximise the dot product of CLIP embeddings

Classifier guidance: $\hat{\epsilon}_{\theta^*, \phi^*}(X_t, t, Y) = \epsilon_{\theta^*}(X_t, t) - \lambda \cdot \sigma[t] \, \nabla_{X_t} \log p_{\phi^*}(Y | X_t)$
CLIP guidance: $\hat{\epsilon}_{\theta^*, \phi^*}(X_t, t, C) = \epsilon_{\theta^*}(X_t, t) - \lambda \cdot \sigma[t] \, \nabla_{X_t} \left( \mathcal{E}_I(X_t) \cdot \mathcal{E}_T(C) \right)$
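Structurally identical to classifier guidance, with the classifier's log-probability swapped for the CLIP image-text dot product. `img_enc` and `txt_emb` are toy stand-ins for $\mathcal{E}_I$ and a precomputed $\mathcal{E}_T(C)$:

```python
import torch
import torch.nn as nn

img_enc = nn.Linear(2, 16)     # toy stand-in for the CLIP image encoder E_I
txt_emb = torch.randn(16)      # toy stand-in for a fixed text embedding E_T(C)

def clip_guided_eps(eps_out, Xt, sigma_t, lam=1.0):
    """eps_hat = eps_theta - lam * sigma_t * grad_{X_t} ( E_I(X_t) . E_T(C) )."""
    Xt = Xt.detach().requires_grad_(True)
    score = (img_enc(Xt) * txt_emb).sum(dim=-1)     # dot product of embeddings
    grad = torch.autograd.grad(score.sum(), Xt)[0]
    return eps_out - lam * sigma_t * grad
```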

Slide 32

Recent Application
Conditional Models

Slide 33

Conditioning is straightforward
Just like other generative models

• Expose $Y$ to the model, i.e. $s_\theta(X, t, Y)$ or $\epsilon_\theta(X, t, Y)$
• Or encode $Y$ into a latent code first: $s_\theta(X, t, z = \mathcal{E}(Y))$ or $\epsilon_\theta(X, t, z = \mathcal{E}(Y))$
• Other clever ways too … (next slides)
• PS: the "forward diffusion" does not change; the "reverse diffusion" gets a conditional noise-estimator (a sketch follows below):

$X_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( X_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \epsilon_\theta(X_t, t, Y) \right) + \sigma_t \cdot z$
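A sketch of the first two bullets: the condition enters the noise estimator simply as an extra input, here through a learned embedding $z = \mathcal{E}(Y)$ for class labels (the sizes and architecture are illustrative):

```python
import torch
import torch.nn as nn

n_classes, d, T = 10, 2, 1000

class CondEps(nn.Module):
    """Conditional noise estimator eps_theta(X_t, t, Y)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(n_classes, 16)    # z = E(Y), a learned encoding of Y
        self.net = nn.Sequential(nn.Linear(d + 1 + 16, 128), nn.ReLU(),
                                 nn.Linear(128, d))

    def forward(self, Xt, t, y):
        h = torch.cat([Xt, t[:, None].float() / T, self.emb(y)], dim=-1)
        return self.net(h)

eps_theta_cond = CondEps()
eps = eps_theta_cond(torch.randn(4, d), torch.randint(0, T, (4,)),
                     torch.randint(0, n_classes, (4,)))
```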

Slide 34

Text-Conditioning
The impressive DALL-E 2, Imagen & more

Slide 35

"Super-Resolution" with Conditional Diffusion
"Image Super-Resolution via Iterative Refinement", Saharia et al.

• Condition on the low-resolution image $Y$: $X \sim p_\theta(X | Y)$
• Every reverse step sees the condition: $X_{t-1} \sim p_\theta(X_{t-1} | X_t, Y)$

Slide 36

"Tweaking" the sampling process (1)
"Iterative Latent Variable Refinement (ILVR)", Jooyoung Choi et al.

• Forward-diffuse the reference: $Y_0 \to Y_1 \to \cdots \to Y_{t-1} \to Y_t \to \cdots \to Y_T$ (increasingly de-correlated samples)
• At every reverse step, replace the low-frequency content of the proposal $X'_{t-1}$ with that of the reference:
$X_{t-1} = X'_{t-1} - \mathrm{LPF}_N(X'_{t-1}) + \mathrm{LPF}_N(Y_{t-1})$
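The key operation is the low-pass filter; in ILVR it is a down/up-sampling pair. A sketch on image tensors, assuming `(B, C, H, W)` shapes with `H`, `W` divisible by the factor `N`:

```python
import torch
import torch.nn.functional as F

def lpf(x, N=4):
    """Low-pass filter: downsample by N, then upsample back (one common choice)."""
    return F.interpolate(F.avg_pool2d(x, N), scale_factor=N, mode="nearest")

def ilvr_step(X_prop, Y_ref, N=4):
    """X_{t-1} = X'_{t-1} - LPF(X'_{t-1}) + LPF(Y_{t-1}): keep the model's high
    frequencies, inherit the reference's low frequencies."""
    return X_prop - lpf(X_prop, N) + lpf(Y_ref, N)
```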

Slide 37

"Tweaking" the sampling process (2)
"SDEdit: Guided image synthesis …", Chenlin Meng et al.

• Forward-diffuse the condition: $Y_0 \to Y_1 \to \cdots \to Y_t$
• Set $X_t := Y_t$, then run the reverse process from there: $X_{t-1} \sim p_\theta(X_{t-1} | X_t)$
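A sketch of SDEdit in the toy DDPM setup from earlier (reusing `eps_theta`, `alpha_bar`, `beta`, `T`); the starting level `t0` trades faithfulness to the guide against realism, and is a user choice:

```python
import torch

@torch.no_grad()
def sdedit(Y0, t0):
    """Forward-noise the guide Y_0 to level t0 (X_{t0} := Y_{t0}), then reverse-diffuse."""
    n = Y0.shape[0]
    X = alpha_bar[t0].sqrt() * Y0 + (1 - alpha_bar[t0]).sqrt() * torch.randn_like(Y0)
    for t in reversed(range(t0)):
        eps = eps_theta(torch.cat([X, torch.full((n, 1), t / T)], dim=-1))
        X = (X - beta[t] / (1 - alpha_bar[t]).sqrt() * eps) / (1 - beta[t]).sqrt()
        if t > 0:
            X = X + beta[t].sqrt() * torch.randn_like(X)
    return X
```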

Slide 38

"Tweaking" the sampling process (3)
"RePaint: Inpainting using DDPM", Lugmayr et al., CVPR 22

Slide 39

Diffusion Models
… for other data modalities

Slide 40

Forward-Reverse process is quite generic
It does not assume the structure of the data and/or the model

$X_t = \sqrt{\bar{\alpha}_t} \cdot X_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon$

$\hat{\epsilon} \leftarrow \epsilon_\theta(X_t, t)$

Slide 41

Molecules: Diffusion on Graphs
"Equivariant Diffusion for Molecule …", Hoogeboom et al., ICML 2022

$[V_t, E_t] = \sqrt{\bar{\alpha}_t} \cdot [V_0, E_0] + \sqrt{1 - \bar{\alpha}_t} \cdot [\epsilon_V, \epsilon_E]$

$[\hat{\epsilon}_V, \hat{\epsilon}_E] \leftarrow \mathrm{EGNN}([V_t, E_t], t)$

Extra requirement for E(3)-equivariant graphs: $\sum_{n=1}^{N} V_t^{(n)} = 0$
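The zero-centre-of-mass constraint is easy to enforce on the noise itself; a minimal sketch for node coordinates (the edge/feature part is untouched):

```python
import torch

def com_free_noise(V):
    """Gaussian noise for node coordinates V (num_nodes, 3) with sum_n eps^(n) = 0.
    Subtracting the mean keeps the noised graph translation-invariant."""
    eps = torch.randn_like(V)
    return eps - eps.mean(dim=0, keepdim=True)
```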

Slide 42

Diffusion on continuous sequences (1)
My latest work

$X_0 := [x^{(0)}, x^{(1)}, \cdots, x^{(\tau)}, \cdots]$

$X_t = \sqrt{\bar{\alpha}_t} \cdot X_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon$

$[\epsilon_t^{(0)}, \epsilon_t^{(1)}, \cdots, \epsilon_t^{(\tau)}, \cdots] \leftarrow \mathrm{BiRNN}([X_t^{(0)}, X_t^{(1)}, \cdots, X_t^{(\tau)}, \cdots],\ t)$
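A sketch of a per-position noise estimator built on a bidirectional RNN, as the slide suggests; the GRU, sizes, and time conditioning are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class SeqEps(nn.Module):
    """eps_theta over a continuous sequence: one noise estimate per position tau."""
    def __init__(self, d=2, h=64, T=1000):
        super().__init__()
        self.T = T
        self.rnn = nn.GRU(d + 1, h, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * h, d)

    def forward(self, Xt, t):                     # Xt: (B, L, d), t: (B,)
        tt = (t[:, None, None].float() / self.T).expand(-1, Xt.shape[1], 1)
        h, _ = self.rnn(torch.cat([Xt, tt], dim=-1))
        return self.out(h)                        # (B, L, d): eps at every position

eps = SeqEps()(torch.randn(4, 50, 2), torch.randint(0, 1000, (4,)))
```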

Slide 43

Diffusion on continuous sequences (2)
Qualitative results of unconditional generation

Slide 44

Questions?

@dasayan05
https://ayandas.me/
a.das@surrey.ac.uk