Slide 1

Diffusion Models: Advancements & Applications

Ayan Das
PhD Student, University of Surrey
DL Intern, MediaTek Research UK

[Title figure: the forward diffusion and reverse diffusion processes]

Slide 2

General Introduction

Slide 3

Generative Models
Definition, Motivation & Scope … (1)

• Generative Modelling is learning models of the form $p_\theta(X)$, given a dataset $\{X_i\}_{i=1}^{D} \sim q_{data}(X)$
• Motivation 1: Verification (log-likelihood)
• Motivation 2: Generation by sampling, i.e. $X_{new} \sim p_{\theta^*}(X)$
• Motivation 3: Conditional models of the form $p_\theta(X|Y)$

Slide 4

Generative Models
Definition, Motivation & Scope … (2)

• Discriminative Models, i.e. models like $p_\theta(Y|X)$
• $Y$ is significantly simpler
• More specialised — focused on $Y$, not $X$

Slide 5

Diversity vs Fidelity
The trade-off

[Figure: GANs vs VAEs]

Slide 6

Any model that can do both equally well?
… or maybe control the trade-off

Slide 7

Diffusion Models

Slide 8

Other Generative Models
Candidates: VAE, GAN, NF

Slide 9

Diffusion Models are different
What makes them hard to work with?

Non-deterministic mapping

Slide 10

Diffusion Models, simplified
Intuitive idea

• A Gaussian Diffusion Model generates data by gradual Gaussian de-noising
• The "reverse process" is the real generative process: $X_T \to \cdots \to X_t \to X_{t-1} \to \cdots \to X_0$
• The "forward process" is just a way of simulating noisy training data for all $t$: $X_0 \to \cdots \to X_{t-1} \to X_t \to \cdots \to X_T$
• Intuitive training objective: $\mathbb{E}_{X_0 \sim q_{data}} \left[ \frac{1}{T} \sum_{t=T}^{1} \| s_\theta(X_t) - X_{t-1} \|_2^2 \right]$
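As a concrete picture of this intuitive objective, here is a minimal sketch on toy 2-D data. The denoiser `s_theta`, the schedule `sigma`, and the length `T` are illustrative stand-ins, not details from the talk:

```python
import torch

T = 100                                   # diffusion length (illustrative)
sigma = torch.linspace(0.01, 1.0, T)      # assumed noise schedule

# Toy stand-in for the denoiser s_theta: predicts the less-noisy X_{t-1} from X_t.
s_theta = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))

def naive_loss(X0):
    """E_{X0 ~ q_data}[ (1/T) * sum_{t=T..1} || s_theta(X_t) - X_{t-1} ||^2 ]."""
    Xs = [X0]
    for t in range(T):                    # simulate the forward chain X_0 -> ... -> X_T
        Xs.append(Xs[-1] + sigma[t] * torch.randn_like(X0))
    loss = 0.0
    for t in range(T, 0, -1):             # regress every X_t back onto X_{t-1}
        loss = loss + ((s_theta(Xs[t]) - Xs[t - 1]) ** 2).sum(dim=-1).mean()
    return loss / T

X0 = torch.randn(32, 2)                   # stand-in for a batch of real data
print(naive_loss(X0))
```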

Slide 11

"The forward-reverse process is equivalent to a VAE-like Encoder-Decoder"

WRONG

Slide 12

Forward process is "parallelizable"

$X_t = X_0 + \sigma[t] \cdot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$

[Figure: the sequential chain $X_0 \to X_1 \to \cdots \to X_t \to \cdots \to X_T$ vs direct jumps from $X_0$ to any $X_t$]
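A short sketch of why this matters in practice: with the closed form above, any $X_t$ is reachable directly from $X_0$, so training never has to simulate the chain. The schedule `sigma` is assumed:

```python
import torch

T = 100
sigma = torch.linspace(0.01, 1.0, T)   # assumed monotone noise schedule

X0 = torch.randn(32, 2)                # stand-in for a batch of real data
t = torch.randint(0, T, (32,))         # a *different* timestep for every sample
eps = torch.randn_like(X0)

# X_t = X_0 + sigma[t] * eps: every noise level is reachable in one shot,
# so all timesteps of the forward process can be simulated in parallel.
Xt = X0 + sigma[t, None] * eps
```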

Slide 13

Diffusion Models, simplified
Visualising the data space

[Figure: a vector field over the data space that guides samples towards real data]

Slide 14

"Score" of a Distribution
… an important statistical quantity

$\nabla_X \log q_{data}(X) \approx s_\theta(X, \cdot)$

$X_1 \leftarrow X_0 + \nabla_X \log q_{data}(X) \big|_{X = X_0}$
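A tiny illustration of the score and the update above, using a density whose score we know in closed form (a standard 2-D Gaussian, whose score is exactly $-X$); the step size `0.1` is arbitrary:

```python
import torch

# For a density known in closed form, autograd gives the score directly.
# Here q_data = N(0, I) in 2-D, so the score is analytically -X.
def log_q(X):
    return -0.5 * (X ** 2).sum(dim=-1)   # log N(X; 0, I), up to a constant

X0 = torch.randn(5, 2, requires_grad=True)
score = torch.autograd.grad(log_q(X0).sum(), X0)[0]   # equals -X0 here

X1 = X0 + 0.1 * score   # the slide's update: step towards higher density
```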

Slide 15

Diffusion Models, in reality
Reality is slightly different

$\mathbb{E}_{X_0 \sim q_{data}} \left[ \frac{1}{T} \sum_{t=T}^{1} \| s_\theta(\underbrace{X_0 + \sigma[t] \cdot \epsilon}_{X_t},\ t) - (-\epsilon) \|_2^2 \right]$, where $\epsilon \sim \mathcal{N}(0, I)$

… which is equivalent to $\mathbb{E}_{X_0 \sim q_{data},\ \epsilon \sim \mathcal{N}(0, I),\ t \sim \mathbb{U}[1, T]} \left[ \| s_\theta(X_t, t) + \epsilon \|_2^2 \right]$

Sampling: $X_{t-1} = X_t + s_{\theta^*}(X_t, t) \cdot \delta t + \sqrt{\delta t} \cdot z$ (Langevin Dynamics!)
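A sketch of both pieces (the denoising score matching loss and the Langevin sampler) under the same toy setup as before; the time conditioning by concatenation and the constant step size `delta` are simplifying assumptions:

```python
import torch

T, delta = 100, 0.01
sigma = torch.linspace(0.01, 1.0, T)           # assumed noise schedule

s_theta = torch.nn.Sequential(                 # toy score network s_theta(X_t, t)
    torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))

def dsm_loss(X0):
    """E_{X0, eps, t}[ || s_theta(X_t, t) + eps ||^2 ] with X_t = X_0 + sigma[t]*eps."""
    t = torch.randint(0, T, (X0.shape[0],))
    eps = torch.randn_like(X0)
    Xt = X0 + sigma[t, None] * eps
    inp = torch.cat([Xt, t[:, None].float() / T], dim=-1)   # crude time conditioning
    return ((s_theta(inp) + eps) ** 2).sum(dim=-1).mean()

@torch.no_grad()
def langevin_sample(n):
    """X_{t-1} = X_t + s_theta(X_t, t) * delta + sqrt(delta) * z."""
    X = torch.randn(n, 2) * sigma[-1]          # start from the noisiest level
    for t in reversed(range(T)):
        inp = torch.cat([X, torch.full((n, 1), t / T)], dim=-1)
        X = X + s_theta(inp) * delta + delta ** 0.5 * torch.randn_like(X)
    return X
```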

Slide 16

Role of multiple noise scales
$s_\theta$ achieves different goals at different noise scales

$X_T \to \cdots \to X_t \to X_{t-1} \to \cdots \to X_0$

• Near $X_T$: uncertain prediction, high variance → Diversity
• Near $X_0$: certain prediction, low variance → Fidelity

Slide 17

Origin Story & Formalisms

Slide 18

Tracing Diffusion Models back into history
… where did it start?

[Figure: timeline of prior work; "SOTA on CIFAR10, FID 3.14"]

Slide 19

Three formalisms
SBM, DDPM & SDE

Score-Based Models (SBM):
$X_{t-1} = X_t + s_\theta(X_t, t) \cdot \delta t + \sqrt{\delta t} \cdot z$

De-noising Diffusion Probabilistic Models (DDPM):
$X_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( X_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \epsilon_\theta(X_t, t) \right) + \sigma_t \cdot z$
With $s_\theta(X_t, t) = -\frac{\epsilon_\theta(X_t, t)}{\sqrt{1 - \bar{\alpha}_t}}$, this is equivalently $X_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( X_t + s_\theta(X_t, t) \cdot \beta_t \right) + \sqrt{\beta_t} \cdot z$

Stochastic Differential Equations (SDE):
$dX = \left[ f(X, t) - g^2(t) \, s_\theta(X, t) \right] dt + g(t) \, dw$, where $dw \sim \mathcal{N}(0, dt)$
• SBM as an SDE: $f(X, t) = 0$, $g(t) = \sqrt{\tfrac{d}{dt} \sigma^2(t)}$
• DDPM as an SDE: $f(X, t) = -\tfrac{1}{2} \beta(t) X$, $g(t) = \sqrt{\beta(t)}$

Slide 20

SBM & DDPM: The important difference
… in the forward noising process

• SBM only adds noise: $X_t = X_0 + \sigma[t] \cdot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$
• DDPM also scales down the data: $X_t = \gamma[t] \cdot X_0 + \sigma[t] \cdot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$, with $\gamma[t] = \sqrt{\bar{\alpha}_t}$ and $\sigma[t] = \sqrt{1 - \bar{\alpha}_t}$

SBM reverse update: $X_{t-1} = X_t + s_\theta(X_t, t) \cdot \delta t + \sqrt{\delta t} \cdot z$
DDPM reverse update: $X_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( X_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \epsilon_\theta(X_t, t) \right) + \sigma_t \cdot z$
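The difference is a one-liner in code. A sketch with assumed schedules (a linear $\beta$ schedule for DDPM, a linear $\sigma$ for SBM):

```python
import torch

T = 1000
beta = torch.linspace(1e-4, 0.02, T)           # assumed DDPM beta schedule
alpha_bar = torch.cumprod(1.0 - beta, dim=0)   # \bar{alpha}_t
sigma_sbm = torch.linspace(0.01, 1.0, T)       # assumed SBM sigma schedule

def forward_sbm(X0, t):
    """SBM: only adds noise -- X_t = X_0 + sigma[t] * eps."""
    return X0 + sigma_sbm[t] * torch.randn_like(X0)

def forward_ddpm(X0, t):
    """DDPM: also scales the data -- X_t = sqrt(abar_t)*X_0 + sqrt(1-abar_t)*eps."""
    return alpha_bar[t].sqrt() * X0 + (1 - alpha_bar[t]).sqrt() * torch.randn_like(X0)
```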

Slide 21

DDPM Summary
Forward, Training and Reverse processes

• Sampling from the forward process: $X_t = \sqrt{\bar{\alpha}_t} \cdot X_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon$
• Training the model $\epsilon_\theta$: $\mathbb{E}_{X_0 \sim q_{data},\ \epsilon \sim \mathcal{N}(0, I),\ t \sim \mathbb{U}[1, T]} \left[ \| \epsilon_\theta(X_t, t) - \epsilon \|_2^2 \right]$
• Reverse process sampling with $\epsilon_{\theta^*}$: $X_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( X_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \epsilon_{\theta^*}(X_t, t) \right) + \sigma_t \cdot z$
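Putting the three pieces together, a minimal end-to-end DDPM sketch on toy 2-D data; the linear $\beta$ schedule, the tiny MLP, the time conditioning, and the choice $\sigma_t^2 = \beta_t$ are all assumptions for illustration:

```python
import torch
import torch.nn as nn

T = 1000
beta = torch.linspace(1e-4, 0.02, T)              # assumed linear schedule
alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)

eps_theta = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 2))

def ddpm_loss(X0):
    """Forward-sample X_t, then regress eps_theta(X_t, t) onto the true eps."""
    t = torch.randint(0, T, (X0.shape[0],))
    eps = torch.randn_like(X0)
    Xt = alpha_bar[t, None].sqrt() * X0 + (1 - alpha_bar[t, None]).sqrt() * eps
    pred = eps_theta(torch.cat([Xt, t[:, None].float() / T], dim=-1))
    return ((pred - eps) ** 2).sum(dim=-1).mean()

@torch.no_grad()
def ddpm_sample(n):
    """X_{t-1} = (X_t - beta_t/sqrt(1-abar_t) * eps_theta) / sqrt(alpha_t) + sigma_t*z."""
    X = torch.randn(n, 2)
    for t in reversed(range(T)):
        eps = eps_theta(torch.cat([X, torch.full((n, 1), t / T)], dim=-1))
        X = (X - beta[t] / (1 - alpha_bar[t]).sqrt() * eps) / alpha[t].sqrt()
        if t > 0:
            X = X + beta[t].sqrt() * torch.randn_like(X)   # sigma_t^2 = beta_t choice
    return X
```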

Slide 22

Recent Advancements
Faster Sampling

Slide 23

Diffusion Models suffer from slow sampling
Unlike any other generative model

$X_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( X_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \epsilon_{\theta^*}(X_t, t) \right) + \sigma_t \cdot z$

i.e. $X_{t-1} \sim \mathcal{N}(\mu_{\theta^*}(X_t, t),\ \sigma_t^2 \cdot I)$: every step is stochastic and needs one network evaluation, for all $T$ steps.

Slide 24

De-noising Diffusion Implicit Models (DDIM)
Faster and deterministic sampling

$X_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \left( \frac{X_t - \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon_{\theta^*}(X_t, t)}{\sqrt{\bar{\alpha}_t}} \right) + \sqrt{1 - \bar{\alpha}_{t-1}} \cdot \epsilon_{\theta^*}(X_t, t)$

$X_{t-1} \sim \mathcal{N}(\mu^{DDIM}_{\theta^*}(X_t, t),\ 0)$

Stochastic Differential Equation (SDE) → Ordinary Differential Equation (ODE)
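The DDIM update in code: predict $X_0$ from $X_t$, then deterministically re-noise it to level $t-1$, with no added noise (matching the zero-variance Gaussian above). A minimal sketch:

```python
import torch

def ddim_step(Xt, eps, abar_t, abar_prev):
    """Deterministic DDIM update X_t -> X_{t-1}, given eps = eps_theta(X_t, t)."""
    X0_pred = (Xt - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()   # predicted X_0
    return abar_prev.sqrt() * X0_pred + (1 - abar_prev).sqrt() * eps
```

Because no noise is injected, the same $X_T$ always maps to the same $X_0$.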

Slide 25

Skip steps in DDIM
Sampling with shorter diffusion length

Single step ($X_t \to X_{t-1}$): $X_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \left( \frac{X_t - \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon_{\theta^*}(X_t, t)}{\sqrt{\bar{\alpha}_t}} \right) + \sqrt{1 - \bar{\alpha}_{t-1}} \cdot \epsilon_{\theta^*}(X_t, t)$

Skipped step ($X_t \to X_{t-k}$): $X_{t-k} = \sqrt{\bar{\alpha}_{t-k}} \left( \frac{X_t - \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon_{\theta^*}(X_t, t)}{\sqrt{\bar{\alpha}_t}} \right) + \sqrt{1 - \bar{\alpha}_{t-k}} \cdot \epsilon_{\theta^*}(X_t, t)$
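Skipping steps then just means calling the same update on a strided subset of timesteps. A sketch reusing `eps_theta`, `alpha_bar`, and `T` from the DDPM sketch and `ddim_step` from above; the stride of 20 (50 steps instead of 1000) is arbitrary:

```python
import torch

taus = list(range(0, T, 20)) + [T - 1]    # strided timestep schedule (assumed)

@torch.no_grad()
def ddim_sample(n):
    X = torch.randn(n, 2)
    for t, t_prev in zip(reversed(taus[1:]), reversed(taus[:-1])):
        eps = eps_theta(torch.cat([X, torch.full((n, 1), t / T)], dim=-1))
        X = ddim_step(X, eps, alpha_bar[t], alpha_bar[t_prev])   # jump t -> t_prev
    return X
```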

Slide 26

DDIM as feature extractor
Deterministic mapping

Running the same deterministic update forward in time, $\forall t = 0 \to T$:
$X_t = \sqrt{\bar{\alpha}_t} \left( \frac{X_{t-1} - \sqrt{1 - \bar{\alpha}_{t-1}} \cdot \epsilon_{\theta^*}(X_{t-1}, t)}{\sqrt{\bar{\alpha}_{t-1}}} \right) + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon_{\theta^*}(X_{t-1}, t)$

Sampling is an Initial Value Problem (IVP); encoding is the corresponding Final Value Problem (FVP). The deterministic map $X_0 \to X_T$ acts as a feature extractor.
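Run the same deterministic update in the opposite direction and you get an (approximate) encoder. A sketch, again reusing names from the previous sketches; it relies on the usual approximation that $\epsilon_\theta$ changes little between adjacent steps:

```python
import torch

@torch.no_grad()
def ddim_encode(X0):
    """Deterministic map X_0 -> X_T (t = 0 -> T); usable as a feature extractor."""
    X = X0
    for t_prev, t in zip(taus[:-1], taus[1:]):
        eps = eps_theta(torch.cat([X, torch.full((X.shape[0], 1), t_prev / T)], dim=-1))
        # ddim_step re-noises the X_0 estimate to the *higher* level abar_t here
        X = ddim_step(X, eps, alpha_bar[t_prev], alpha_bar[t])
    return X
```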

Slide 27

Stable Diffusion: Diffusion on latent space

• Embed the dataset $X_0 \sim q(X_0)$ into latent space: $Z_0 = \mathcal{E}(X_0)$
• Just as before, create a diffusion model, now on $Z$: $Z_T \to Z_{T-1} \to \cdots \to Z_1 \to Z_0$
• Decode the samples: $X_0 = \mathcal{D}(Z_0)$
• ($\mathcal{E}$, $\mathcal{D}$) is an Auto-Encoder

"High-Resolution Image Synthesis with Latent Diffusion Models", Rombach et al., CVPR 2022
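A structural sketch of the latent-diffusion pipeline: `encoder`/`decoder` are toy stand-ins for a pretrained autoencoder ($\mathcal{E}$, $\mathcal{D}$), and the diffusion part is the same DDPM machinery as before, just applied to $Z$ instead of $X$:

```python
import torch
import torch.nn as nn

d_data, d_latent, T = 784, 32, 1000
beta = torch.linspace(1e-4, 0.02, T)
abar = torch.cumprod(1 - beta, dim=0)

encoder = nn.Linear(d_data, d_latent)   # stand-in for a pretrained E
decoder = nn.Linear(d_latent, d_data)   # stand-in for a pretrained D
eps_net = nn.Sequential(nn.Linear(d_latent + 1, 128), nn.ReLU(),
                        nn.Linear(128, d_latent))

def latent_loss(X0):
    """Ordinary DDPM loss, applied to Z_0 = E(X_0) instead of X_0."""
    Z0 = encoder(X0).detach()            # the autoencoder stays frozen
    t = torch.randint(0, T, (Z0.shape[0],))
    eps = torch.randn_like(Z0)
    Zt = abar[t, None].sqrt() * Z0 + (1 - abar[t, None]).sqrt() * eps
    pred = eps_net(torch.cat([Zt, t[:, None].float() / T], dim=-1))
    return ((pred - eps) ** 2).sum(dim=-1).mean()

@torch.no_grad()
def generate(n):
    """Reverse-diffuse Z_T -> ... -> Z_0 in latent space, then decode X_0 = D(Z_0)."""
    Z = torch.randn(n, d_latent)
    for t in reversed(range(T)):
        eps = eps_net(torch.cat([Z, torch.full((n, 1), t / T)], dim=-1))
        Z = (Z - beta[t] / (1 - abar[t]).sqrt() * eps) / (1 - beta[t]).sqrt()
        if t > 0:
            Z = Z + beta[t].sqrt() * torch.randn_like(Z)
    return decoder(Z)
```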

Slide 28

Recent Advancements
Guidance

Slide 29

Guidance played an important role
… for increasing generation quality

• Conditional models are different — they model conditions explicitly
• $X \sim p_{\theta^*}(X | Y = \text{CAT})$ generates cat images
• $X \sim p_{\theta^*}(X | Y = \text{DOG})$ generates dog images
• … so on
• Guidance is "influencing the reverse process with condition info"
• Using an external classifier → "Classifier Guidance"
• Using CLIP → "CLIP Guidance"
• Using a conditional model → "Classifier-free Guidance"

Slide 30

Classifier Guidance
Guiding the reverse process with an external classifier

• Requires labels (or some conditioning info)
• Train an external classifier $p_\phi(Y|X)$ — completely unrelated to the diffusion model
• Modify the unconditional noise-estimator with the classifier to yield a conditional noise-estimator (sketched below):

$\hat{\epsilon}_{\theta^*, \phi^*}(X_t, t, Y) = \epsilon_{\theta^*}(X_t, t) - \lambda \cdot \sigma[t] \, \nabla_{X_t} \log p_{\phi^*}(Y | X_t)$
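The guidance formula in code: differentiate the classifier's log-probability of the desired label with respect to $X_t$ and nudge the noise estimate. The toy linear `classifier` and the guidance weight `lam` are stand-ins:

```python
import torch
import torch.nn as nn

classifier = nn.Linear(2, 10)          # toy stand-in for external p_phi(Y | X)

def guided_eps(eps_out, Xt, y, sigma_t, lam=1.0):
    """eps_hat = eps_theta - lam * sigma_t * grad_{X_t} log p_phi(y | X_t)."""
    Xt = Xt.detach().requires_grad_(True)
    log_p = classifier(Xt).log_softmax(dim=-1)[torch.arange(Xt.shape[0]), y]
    grad = torch.autograd.grad(log_p.sum(), Xt)[0]
    return eps_out - lam * sigma_t * grad
```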

Slide 31

CLIP Guidance
Introduced in the "GLIDE: …" paper from OpenAI

• Guide the reverse process with a text condition $C$
• Instead of the classifier gradient, maximise the dot product of CLIP embeddings

Classifier guidance: $\hat{\epsilon}_{\theta^*, \phi^*}(X_t, t, Y) = \epsilon_{\theta^*}(X_t, t) - \lambda \cdot \sigma[t] \, \nabla_{X_t} \log p_{\phi^*}(Y | X_t)$
CLIP guidance: $\hat{\epsilon}_{\theta^*, \phi^*}(X_t, t, C) = \epsilon_{\theta^*}(X_t, t) - \lambda \cdot \sigma[t] \, \nabla_{X_t} \left( \mathcal{E}_I(X_t) \cdot \mathcal{E}_T(C) \right)$
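Structurally identical to classifier guidance, with the classifier's log-probability swapped for the CLIP image-text dot product. `img_enc` and `txt_emb` are toy stand-ins for $\mathcal{E}_I$ and a precomputed $\mathcal{E}_T(C)$:

```python
import torch
import torch.nn as nn

img_enc = nn.Linear(2, 16)     # toy stand-in for the CLIP image encoder E_I
txt_emb = torch.randn(16)      # toy stand-in for a fixed text embedding E_T(C)

def clip_guided_eps(eps_out, Xt, sigma_t, lam=1.0):
    """eps_hat = eps_theta - lam * sigma_t * grad_{X_t} ( E_I(X_t) . E_T(C) )."""
    Xt = Xt.detach().requires_grad_(True)
    score = (img_enc(Xt) * txt_emb).sum(dim=-1)     # dot product of embeddings
    grad = torch.autograd.grad(score.sum(), Xt)[0]
    return eps_out - lam * sigma_t * grad
```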

Slide 32

Recent Application
Conditional Models

Slide 33

Conditioning is straightforward
Just like other generative models

• Expose $Y$ to the model, i.e. $s_\theta(X, t, Y)$ or $\epsilon_\theta(X, t, Y)$
• Or encode $Y$ into a latent code first: $s_\theta(X, t, z = \mathcal{E}(Y))$ or $\epsilon_\theta(X, t, z = \mathcal{E}(Y))$
• Other clever ways too … (next slides)
• PS: the "forward diffusion" does not change; the "reverse diffusion" gets a conditional noise-estimator (a sketch follows below):

$X_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( X_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \cdot \epsilon_\theta(X_t, t, Y) \right) + \sigma_t \cdot z$
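A sketch of the first two bullets: the condition enters the noise estimator simply as an extra input, here through a learned embedding $z = \mathcal{E}(Y)$ for class labels (the sizes and architecture are illustrative):

```python
import torch
import torch.nn as nn

n_classes, d, T = 10, 2, 1000

class CondEps(nn.Module):
    """Conditional noise estimator eps_theta(X_t, t, Y)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(n_classes, 16)    # z = E(Y), a learned encoding of Y
        self.net = nn.Sequential(nn.Linear(d + 1 + 16, 128), nn.ReLU(),
                                 nn.Linear(128, d))

    def forward(self, Xt, t, y):
        h = torch.cat([Xt, t[:, None].float() / T, self.emb(y)], dim=-1)
        return self.net(h)

eps_theta_cond = CondEps()
eps = eps_theta_cond(torch.randn(4, d), torch.randint(0, T, (4,)),
                     torch.randint(0, n_classes, (4,)))
```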

Slide 34

Text-Conditioning
The impressive DALL-E 2, Imagen & more

Slide 35

"Super-Resolution" with Conditional Diffusion
"Image Super-Resolution via Iterative Refinement", Saharia et al.

• Condition on the low-resolution image $Y$: $X \sim p_\theta(X | Y)$
• Every reverse step sees the condition: $X_{t-1} \sim p_\theta(X_{t-1} | X_t, Y)$

Slide 36

"Tweaking" the sampling process (1)
"Iterative Latent Variable Refinement (ILVR)", Jooyoung Choi et al.

• Forward-diffuse the reference: $Y_0 \to Y_1 \to \cdots \to Y_{t-1} \to Y_t \to \cdots \to Y_T$ (increasingly de-correlated samples)
• At every reverse step, replace the low-frequency content of the proposal $X'_{t-1}$ with that of the reference:
$X_{t-1} = X'_{t-1} - \mathrm{LPF}_N(X'_{t-1}) + \mathrm{LPF}_N(Y_{t-1})$
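The key operation is the low-pass filter; in ILVR it is a down/up-sampling pair. A sketch on image tensors, assuming `(B, C, H, W)` shapes with `H`, `W` divisible by the factor `N`:

```python
import torch
import torch.nn.functional as F

def lpf(x, N=4):
    """Low-pass filter: downsample by N, then upsample back (one common choice)."""
    return F.interpolate(F.avg_pool2d(x, N), scale_factor=N, mode="nearest")

def ilvr_step(X_prop, Y_ref, N=4):
    """X_{t-1} = X'_{t-1} - LPF(X'_{t-1}) + LPF(Y_{t-1}): keep the model's high
    frequencies, inherit the reference's low frequencies."""
    return X_prop - lpf(X_prop, N) + lpf(Y_ref, N)
```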

Slide 37

"Tweaking" the sampling process (2)
"SDEdit: Guided image synthesis …", Chenlin Meng et al.

• Forward-diffuse the condition: $Y_0 \to Y_1 \to \cdots \to Y_t$
• Set $X_t := Y_t$, then run the reverse process from there: $X_{t-1} \sim p_\theta(X_{t-1} | X_t)$
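A sketch of SDEdit in the toy DDPM setup from earlier (reusing `eps_theta`, `alpha_bar`, `beta`, `T`); the starting level `t0` trades faithfulness to the guide against realism, and is a user choice:

```python
import torch

@torch.no_grad()
def sdedit(Y0, t0):
    """Forward-noise the guide Y_0 to level t0 (X_{t0} := Y_{t0}), then reverse-diffuse."""
    n = Y0.shape[0]
    X = alpha_bar[t0].sqrt() * Y0 + (1 - alpha_bar[t0]).sqrt() * torch.randn_like(Y0)
    for t in reversed(range(t0)):
        eps = eps_theta(torch.cat([X, torch.full((n, 1), t / T)], dim=-1))
        X = (X - beta[t] / (1 - alpha_bar[t]).sqrt() * eps) / (1 - beta[t]).sqrt()
        if t > 0:
            X = X + beta[t].sqrt() * torch.randn_like(X)
    return X
```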

Slide 38

"Tweaking" the sampling process (3)
"RePaint: Inpainting using DDPM", Lugmayr et al., CVPR 22

Slide 39

Diffusion Models
… for other data modalities

Slide 40

Forward-Reverse process is quite generic
It does not assume the structure of the data and/or the model

$X_t = \sqrt{\bar{\alpha}_t} \cdot X_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon$

$\hat{\epsilon} \leftarrow \epsilon_\theta(X_t, t)$

Slide 41

Molecules: Diffusion on Graphs
"Equivariant Diffusion for Molecule …", Hoogeboom et al., ICML 2022

$[V_t, E_t] = \sqrt{\bar{\alpha}_t} \cdot [V_0, E_0] + \sqrt{1 - \bar{\alpha}_t} \cdot [\epsilon_V, \epsilon_E]$

$[\hat{\epsilon}_V, \hat{\epsilon}_E] \leftarrow \mathrm{EGNN}([V_t, E_t], t)$

Extra requirement for E(3)-equivariant graphs: $\sum_{n=1}^{N} V_t^{(n)} = 0$
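The zero-centre-of-mass constraint is easy to enforce on the noise itself; a minimal sketch for node coordinates (the edge/feature part is untouched):

```python
import torch

def com_free_noise(V):
    """Gaussian noise for node coordinates V (num_nodes, 3) with sum_n eps^(n) = 0.
    Subtracting the mean keeps the noised graph translation-invariant."""
    eps = torch.randn_like(V)
    return eps - eps.mean(dim=0, keepdim=True)
```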

Slide 42

Diffusion on continuous sequences (1)
My latest work

$X_0 := [x^{(0)}, x^{(1)}, \cdots, x^{(\tau)}, \cdots]$

$X_t = \sqrt{\bar{\alpha}_t} \cdot X_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon$

$[\epsilon_t^{(0)}, \epsilon_t^{(1)}, \cdots, \epsilon_t^{(\tau)}, \cdots] \leftarrow \mathrm{BiRNN}([X_t^{(0)}, X_t^{(1)}, \cdots, X_t^{(\tau)}, \cdots],\ t)$
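A sketch of a per-position noise estimator built on a bidirectional RNN, as the slide suggests; the GRU, sizes, and time conditioning are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class SeqEps(nn.Module):
    """eps_theta over a continuous sequence: one noise estimate per position tau."""
    def __init__(self, d=2, h=64, T=1000):
        super().__init__()
        self.T = T
        self.rnn = nn.GRU(d + 1, h, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * h, d)

    def forward(self, Xt, t):                     # Xt: (B, L, d), t: (B,)
        tt = (t[:, None, None].float() / self.T).expand(-1, Xt.shape[1], 1)
        h, _ = self.rnn(torch.cat([Xt, tt], dim=-1))
        return self.out(h)                        # (B, L, d): eps at every position

eps = SeqEps()(torch.randn(4, 50, 2), torch.randint(0, 1000, (4,)))
```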

Slide 43

Diffusion on continuous sequences (2)
Qualitative results of unconditional generation

Slide 44

Questions?

@dasayan05
https://ayandas.me/
a.das@surrey.ac.uk