Ayan Das
PhD Student, University of Surrey
DL Intern, MediaTek Research UK
Diffusion Models
Advancements & Applications
Forward Diffusion
Reverse Diffusion
Slide 2
General Introduction
Slide 3
Generative Models
Definition, Motivation & Scope … (1)
• Generative Modelling is learning models of the form $p_\theta(X)$, given a dataset $\{X_i\}_{i=1}^{D} \sim q_{data}(X)$
• Motivation 1: Verification (log-likelihood)
• Motivation 2: Generation by sampling, i.e. $X_{new} \sim p_{\theta^*}(X)$
• Motivation 3: Conditional models of the form $p_\theta(X|Y)$
Slide 4
Slide 4 text
Copyright (c) 2022 by Ayan
Das (@
dasayan05)
Generative Models
Definition, Motivation & Scope … (2)
• Discriminative Models, i.e. models like $p_\theta(Y|X)$
• $Y$ is significantly simpler
• More specialised — focused on $Y$, not $X$
Slide 5
Diversity vs Fidelity
The trade-off
[Figure: GANs and VAEs placed along the diversity-fidelity trade-off]
Slide 6
Any model that can do both equally well?
.. or maybe control the trade-off
Slide 7
Diffusion Models
Slide 8
Other Generative Models
Candidates: VAE, GAN, NF
Slide 9
Diffusion Models are different
What makes them hard to work with?
Non-deterministic mapping
Slide 10
Diffusion Models, simplified
Intuitive Idea
• Gaussian Diffusion Models generate data by gradual Gaussian de-noising
• The “reverse process” is the real generative process
• The “forward process” is just a way of simulating noisy training data for all $t$

Reverse process: $X_T \to \cdots \to X_t \to X_{t-1} \to \cdots \to X_0$
Forward process: $X_0 \to \cdots \to X_{t-1} \to X_t \to \cdots \to X_T$

$$\mathbb{E}_{X_0 \sim q_{data}}\left[ \frac{1}{T} \sum_{t=T}^{1} \big\| s_\theta(X_t) - X_{t-1} \big\|_2^2 \right]$$
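A minimal PyTorch sketch of this intuitive objective (the denoiser `s_theta`, the per-step noise schedule `sigma`, and the flat `(B, D)` data shape are placeholder assumptions, not the talk's exact setup):

```python
import torch

def intuitive_loss(s_theta, x0, sigma, T):
    """Naive objective: regress each noisy X_t onto its predecessor X_{t-1},
    simulating the forward chain step by step."""
    loss, x_prev = 0.0, x0
    for t in range(1, T + 1):
        x_t = x_prev + sigma[t] * torch.randn_like(x_prev)   # one forward step
        loss = loss + ((s_theta(x_t) - x_prev) ** 2).sum(dim=-1).mean()
        x_prev = x_t
    return loss / T
```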
Slide 11
“Forward-Reverse process is equivalent to VAE-like Encoder-Decoder”
WRONG
Slide 12
Forward process is “parallelizable”
[Figure: simulating the chain $X_0 \to \cdots \to X_{t-1} \to X_t \to \cdots \to X_T$ step by step vs jumping straight from $X_0$ to any $X_t$]

$$X_t = X_0 + \sigma[t] \cdot \epsilon, \text{ where } \epsilon \sim \mathcal{N}(0, I)$$
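Because $X_t$ depends only on $X_0$ and $\sigma[t]$, noisy training data for every $t$ can be sampled in one shot; a sketch (assuming a 1-D tensor `sigma` of per-step noise levels, my naming, not the slide's):

```python
import torch

def forward_all_t(x0, sigma):
    """Sample X_t for every t at once -- no need to simulate the chain."""
    T = len(sigma) - 1
    eps = torch.randn(T, *x0.shape)                  # independent noise per t
    scale = sigma[1:].view(T, *([1] * x0.dim()))     # broadcast sigma[t] over x0
    return x0.unsqueeze(0) + scale * eps             # stack of X_1 ... X_T
```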
Slide 13
Diffusion Models, simplified
Visualising the data space
Vector Field that guides towards real data
Slide 14
“Score” of a Distribution
.. an important statistical quantity

$$\nabla_X \log q_{data}(X) \approx s_\theta(X, \cdot)$$
$$X_1 \leftarrow X_0 + \nabla_X \log q_{data}(X)\big|_{X=X_0}$$
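For intuition, a toy example where the score is known in closed form: for $q = \mathcal{N}(\mu, I)$ the score is $\nabla_X \log q(X) = \mu - X$, so repeated score-ascent steps (the step size here is my choice) drift a point toward the high-density region:

```python
import torch

mu = torch.tensor([2.0, -1.0])        # toy data density: q = N(mu, I)
score = lambda x: mu - x              # closed-form score of N(mu, I)

x = torch.randn(2)                    # random starting point
for _ in range(100):
    x = x + 0.1 * score(x)            # X <- X + eta * grad_X log q(X)
print(x)                              # ends up close to mu
```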
Slide 15
Diffusion Models, in reality
Reality is slightly different

$$\mathbb{E}_{X_0 \sim q_{data}}\left[ \frac{1}{T} \sum_{t=T}^{1} \big\| s_\theta(\underbrace{X_0 + \sigma[t] \cdot \epsilon}_{X_t},\ t) - (-\epsilon) \big\|_2^2 \right], \text{ where } \epsilon \sim \mathcal{N}(0, I)$$

$$\mathbb{E}_{X_0 \sim q_{data},\ \epsilon \sim \mathcal{N}(0,I),\ t \sim \mathbb{U}[1,T]}\left[ \big\| s_\theta(X_t, t) + \epsilon \big\|_2^2 \right]$$

Sampling with the learned score is Langevin Dynamics !!
$$X_{t-1} = X_t + s_{\theta^*}(X_t, t) \cdot \delta t + \sqrt{\delta t} \cdot z$$
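A compact sketch of both pieces, the denoising score-matching loss and the Langevin-style sampler (flat `(B, D)` data; the schedule `sigma` and step size `dt` are assumed placeholders, and the integer `t` passed to the sampler is schematic):

```python
import torch

def score_matching_loss(s_theta, x0, sigma, T):
    """E[ || s_theta(X_0 + sigma[t]*eps, t) + eps ||^2 ] with t ~ U[1, T]."""
    t = torch.randint(1, T + 1, (x0.shape[0],))    # t ~ U[1, T]
    eps = torch.randn_like(x0)                     # eps ~ N(0, I)
    x_t = x0 + sigma[t].view(-1, 1) * eps          # one-shot forward diffusion
    return ((s_theta(x_t, t) + eps) ** 2).sum(dim=-1).mean()

@torch.no_grad()
def langevin_sample(s_theta, shape, T, dt):
    """X_{t-1} = X_t + s(X_t, t)*dt + sqrt(dt)*z, run from t = T down to 1."""
    x = torch.randn(shape)
    for t in range(T, 0, -1):
        x = x + s_theta(x, t) * dt + (dt ** 0.5) * torch.randn_like(x)
    return x
```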
Slide 16
Role of multiple noise scales
$s_\theta$ achieves different goals at different noise scales
[Figure: reverse chain $X_T \to \cdots \to X_t \to X_{t-1} \to \cdots \to X_0$; at high noise: uncertain prediction, high variance (diversity); at low noise: certain prediction, low variance (fidelity)]
Slide 17
Origin Story & Formalisms
Slide 18
Tracing Diffusion Models back into history
.. where did it start?
SOTA on CIFAR10 (FID 3.14)
DDIM as feature extractor
Deterministic Mapping

$$X_t = \sqrt{\bar\alpha_t}\left(\frac{X_{t-1} - \sqrt{1-\bar\alpha_{t-1}} \cdot \epsilon_{\theta^*}(X_{t-1}, t)}{\sqrt{\bar\alpha_{t-1}}}\right) + \sqrt{1-\bar\alpha_t} \cdot \epsilon_{\theta^*}(X_{t-1}, t), \quad \forall t = 0 \to T$$

Feature Extractor
Initial Value Problem (IVP) vs Final Value Problem (FVP)
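A sketch of this deterministic forward map used as a feature extractor (assuming a precomputed 1-D tensor `alpha_bar` of $\bar\alpha_t$ values; the names are mine):

```python
import torch

@torch.no_grad()
def ddim_invert(eps_model, x0, alpha_bar):
    """Run the deterministic DDIM map t = 0 -> T; the final x_T is a feature of x0."""
    x, T = x0, len(alpha_bar) - 1
    for t in range(1, T + 1):
        eps = eps_model(x, t)
        # predict x0 from the current point, then re-noise deterministically to level t
        x0_pred = (x - (1 - alpha_bar[t - 1]).sqrt() * eps) / alpha_bar[t - 1].sqrt()
        x = alpha_bar[t].sqrt() * x0_pred + (1 - alpha_bar[t]).sqrt() * eps
    return x
```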
Slide 27
Stable Diffusion: Diffusion on latent space
• Embed dataset into latent space: $X_0 \sim q(X_0)$, $Z_0 = \mathcal{E}(X_0)$
• Just as before, create diffusion model: $Z_T \to Z_{T-1} \to \cdots \to Z_1 \to Z_0$
• Decode them as $X_0 = \mathcal{D}(Z_0)$
• ($\mathcal{E}$, $\mathcal{D}$) are an Auto-Encoder
“High-Resolution Image Synthesis with Latent Diffusion Models”, Rombach et al., CVPR 2022
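The sampling side, sketched (the `reverse_step` callable stands in for one learned reverse-diffusion update in latent space; this is a schematic, not Stable Diffusion's actual API):

```python
import torch

@torch.no_grad()
def latent_diffusion_sample(reverse_step, decoder, latent_shape, T):
    """Diffusion runs entirely in latent space; only the end result is decoded."""
    z = torch.randn(latent_shape)      # z_T ~ N(0, I)
    for t in range(T, 0, -1):
        z = reverse_step(z, t)         # z_t -> z_{t-1}
    return decoder(z)                  # X_0 = D(Z_0); training used Z_0 = E(X_0)
```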
Slide 28
Recent Advancements
Guidance
Slide 29
Guidance played an important role
.. for increasing generation quality
• Conditional models are different — they model conditions explicitly
• $X \sim p_{\theta^*}(X|Y = \mathrm{CAT})$ generates cat images
• $X \sim p_{\theta^*}(X|Y = \mathrm{DOG})$ generates dog images
• … so on
• Guidance is “influencing the reverse process with condition info”
• Using an external classifier —> “Classifier Guidance”
• Using CLIP —> “CLIP guidance”
• Using a conditional model —> “Classifier-free Guidance”
Slide 30
Classifier Guidance
Guiding the reverse process with an external classifier
• Requires labels (or some conditioning info)
• Train an external classifier $p_\phi(Y|X)$ — completely unrelated to the diffusion model
• Modify the unconditional noise-estimator with the classifier to yield a conditional noise-estimator

$$\hat\epsilon_{\theta^*,\phi^*}(X_t, t, Y) = \epsilon_{\theta^*}(X_t, t) - \lambda \cdot \sigma[t] \cdot \nabla_{X_t} \log p_{\phi^*}(Y|X_t)$$
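A sketch of the guided noise-estimate, computing the classifier gradient with autograd (a time-dependent classifier `classifier(x, t)` and the guidance weight `lam` are assumptions on my part):

```python
import torch

def classifier_guided_eps(eps_model, classifier, x_t, t, y, sigma_t, lam=1.0):
    """eps_hat = eps(X_t, t) - lam * sigma[t] * grad_{X_t} log p_phi(Y | X_t)."""
    x = x_t.detach().requires_grad_(True)
    log_p = torch.log_softmax(classifier(x, t), dim=-1)
    selected = log_p[torch.arange(len(y)), y].sum()   # log p(y_i | x_i) per sample
    grad = torch.autograd.grad(selected, x)[0]        # gradient w.r.t. the noisy input
    return eps_model(x_t, t) - lam * sigma_t * grad
```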
Slide 31
CLIP Guidance
Introduced in the “GLIDE: …” paper from OpenAI
• Guide the reverse process with a text condition $C$
• Instead of the classifier gradient, maximise the dot product of CLIP embeddings

$$\hat\epsilon_{\theta^*,\phi^*}(X_t, t, Y) = \epsilon_{\theta^*}(X_t, t) - \lambda \cdot \sigma[t] \cdot \nabla_{X_t} \log p_{\phi^*}(Y|X_t)$$
$$\hat\epsilon_{\theta^*,\phi^*}(X_t, t, C) = \epsilon_{\theta^*}(X_t, t) - \lambda \cdot \sigma[t] \cdot \nabla_{X_t} \big( \mathcal{E}_I(X_t) \cdot \mathcal{E}_T(C) \big)$$
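Same pattern, with the classifier log-probability swapped for the CLIP image-text dot product (`image_enc` is a noise-aware CLIP image encoder and `text_emb` a precomputed $\mathcal{E}_T(C)$; both names are assumptions):

```python
import torch

def clip_guided_eps(eps_model, image_enc, text_emb, x_t, t, sigma_t, lam=1.0):
    """eps_hat = eps(X_t, t) - lam * sigma[t] * grad_{X_t} (E_I(X_t) . E_T(C))."""
    x = x_t.detach().requires_grad_(True)
    sim = (image_enc(x) * text_emb).sum()         # dot product of CLIP embeddings
    grad = torch.autograd.grad(sim, x)[0]
    return eps_model(x_t, t) - lam * sigma_t * grad
```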
Slide 32
Recent Applications
Conditional Models
Slide 33
Conditioning is straightforward
Just like other generative models
• Expose $Y$ to the model, i.e. $s_\theta(X, t, Y)$ or $\epsilon_\theta(X, t, Y)$
• Encode $Y$ into a latent code; then $s_\theta(X, t, z = \mathcal{E}(Y))$ or $\epsilon_\theta(X, t, z = \mathcal{E}(Y))$
• Other clever ways too .. (next slide)
• PS: The “forward diffusion” does not change; “reverse diffusion” has a conditional noise-estimator (see the sketch below)

$$X_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(X_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}} \cdot \epsilon_\theta(X_t, t, Y)\right) + \sigma_t \cdot z$$
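One reverse step with a conditional noise-estimator, sketched (the schedule tensors `alpha`, `alpha_bar`, `beta`, `sigma` are assumed precomputed):

```python
import torch

@torch.no_grad()
def conditional_ddpm_step(eps_model, x_t, t, y, alpha, alpha_bar, beta, sigma):
    """X_t -> X_{t-1}; the condition Y enters only through the noise-estimator."""
    eps = eps_model(x_t, t, y)
    mean = (x_t - beta[t] / (1 - alpha_bar[t]).sqrt() * eps) / alpha[t].sqrt()
    z = torch.randn_like(x_t) if t > 1 else torch.zeros_like(x_t)
    return mean + sigma[t] * z
```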
Slide 34
Text-Conditioning
The impressive DALLE-2, Imagen & more
Slide 35
“Super-Resolution” with Conditional Diffusion
“Image Super-Resolution via Iterative Refinement”, Saharia et al.

$$X \sim p_\theta(X|Y), \qquad X_{t-1} \sim p_\theta(X_{t-1}|X_t, Y)$$
[Figure: low-resolution conditioning image $Y$ alongside super-resolved samples]
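One common way to expose the low-resolution image $Y$ (used, to my understanding, in SR3): upsample it and concatenate with $X_t$ along the channel axis before the noise-estimator; a sketch assuming `(B, C, H, W)` tensors:

```python
import torch
import torch.nn.functional as F

def sr_conditioned_eps(eps_model, x_t, y_lowres, t):
    """Channel-concatenate the upsampled low-res image with X_t."""
    y_up = F.interpolate(y_lowres, size=x_t.shape[-2:], mode="bicubic")
    return eps_model(torch.cat([x_t, y_up], dim=1), t)
```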
Slide 36
“Tweaking” the sampling process (1)
“Iterative Latent Variable Refinement (ILVR)”, Jooyoung Choi et al.
[Figure: reference $Y$ forward-diffused into $Y_0, Y_1, \cdots, Y_{t-1}, \cdots, Y_T$ next to the reverse chain $X_T \to \cdots \to X_t$; increasingly de-correlated samples —>]

$$X_{t-1} = X'_{t-1} - \mathrm{LPF}_N(X'_{t-1}) + \mathrm{LPF}_N(Y_{t-1})$$

where $X'_{t-1}$ is the unconditional reverse-step proposal and $\mathrm{LPF}_N$ is a low-pass filter.
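A sketch of the ILVR correction, with the low-pass filter implemented as down-then-up-sampling by a factor $N$ (one common choice, not necessarily the paper's exact filter; inputs assumed `(B, C, H, W)`):

```python
import torch.nn.functional as F

def low_pass(x, N):
    """LPF_N: down-sample by N, then up-sample back to the original size."""
    down = F.interpolate(x, scale_factor=1.0 / N, mode="bilinear")
    return F.interpolate(down, size=x.shape[-2:], mode="bilinear")

def ilvr_step(x_prime, y_tm1, N):
    """Keep the proposal's high frequencies, take low frequencies from Y_{t-1}."""
    return x_prime - low_pass(x_prime, N) + low_pass(y_tm1, N)
```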
Slide 37
“Tweaking” the sampling process (2)
“SDEdit: Guided image synthesis …”, Chenlin Meng et al.
Forward-diffuse the condition: $Y_0 \to Y_1 \to \cdots \to Y_t$; set $X_t := Y_t$, then run the reverse process $X_{t-1} \sim p_\theta(X_{t-1}|X_t)$.
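SDEdit in a few lines: forward-diffuse the guide $Y_0$ up to some intermediate step, then denoise from there (`t_start` and the `reverse_step` callable are my placeholders):

```python
import torch

@torch.no_grad()
def sdedit(reverse_step, y0, alpha_bar, t_start):
    """X_{t_start} := forward-diffused Y_0, then run the learned reverse process."""
    eps = torch.randn_like(y0)
    x = alpha_bar[t_start].sqrt() * y0 + (1 - alpha_bar[t_start]).sqrt() * eps
    for t in range(t_start, 0, -1):
        x = reverse_step(x, t)       # X_{t-1} ~ p_theta(X_{t-1} | X_t)
    return x
```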
Slide 38
“Tweaking” the sampling process (3)
“RePaint: Inpainting using DDPM”, Lugmayr et al., CVPR 22
Slide 39
Diffusion Models
.. for other data modalities
Slide 40
Forward-Reverse process is quite generic
Do not assume the structure of the data and/or model

$$X_t = \sqrt{\bar\alpha_t} \cdot X_0 + \sqrt{1 - \bar\alpha_t} \cdot \epsilon, \qquad \hat\epsilon \leftarrow \epsilon_\theta(X_t, t)$$
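The genericity is visible in code: the same forward step works for any tensor-valued data (a sketch; `alpha_bar` is an assumed schedule tensor):

```python
import torch

def forward_diffuse(x0, t, alpha_bar):
    """Identical forward step for images, graph features, sequences, ..."""
    eps = torch.randn_like(x0)
    x_t = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
    return x_t, eps                  # eps is the regression target
```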
Slide 41
Molecules: Diffusion on Graphs
“Equivariant Diffusion for Molecule …”, Hoogeboom et al., ICML 2022

$$[V_t, E_t] = \sqrt{\bar\alpha_t} \cdot [V_0, E_0] + \sqrt{1 - \bar\alpha_t} \cdot [\epsilon_V, \epsilon_E]$$
$$[\hat\epsilon_V, \hat\epsilon_E] \leftarrow \mathrm{EGNN}([V_t, E_t], t)$$
$$\sum_{n=1}^{N} V_t^{(n)} = 0 \quad \text{(extra requirement for E(3) equivariant graphs)}$$
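The zero center-of-mass constraint can be enforced by projection, applied to both the node coordinates and the sampled noise (a sketch for an `(N, 3)` coordinate tensor):

```python
def zero_center_of_mass(v):
    """Project coordinates so that sum_n v^(n) = 0 (stay on the CoM-free subspace)."""
    return v - v.mean(dim=0, keepdim=True)
```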
Slide 42
Diffusion on continuous sequences (1)
My latest work

$$X_0 := [x^{(0)}, x^{(1)}, \cdots, x^{(\tau)}, \cdots]$$
$$X_t = \sqrt{\bar\alpha_t} \cdot X_0 + \sqrt{1 - \bar\alpha_t} \cdot \epsilon$$
$$[\epsilon_t^{(0)}, \epsilon_t^{(1)}, \cdots, \epsilon_t^{(\tau)}, \cdots] \leftarrow \mathrm{BiRNN}([X_t^{(0)}, X_t^{(1)}, \cdots, X_t^{(\tau)}, \cdots], t)$$
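A sketch of a per-element noise estimator for sequences, with a bidirectional GRU standing in for the BiRNN and a learned embedding for the diffusion step $t$ (hidden size and embedding scheme are my assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class BiRNNEps(nn.Module):
    """Predicts one epsilon per sequence element from the whole noisy sequence."""
    def __init__(self, dim, hidden=128, T=1000):
        super().__init__()
        self.t_embed = nn.Embedding(T + 1, dim)     # diffusion-step embedding
        self.rnn = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, dim)

    def forward(self, x_t, t):                      # x_t: (B, L, dim), t: (B,)
        h, _ = self.rnn(x_t + self.t_embed(t)[:, None, :])
        return self.out(h)                          # (B, L, dim): eps per element
```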
Slide 43
Diffusion on continuous sequences (2)
Qualitative results of unconditional generation
Slide 44
Questions?
@dasayan05
https://ayandas.me/
a.das@surrey.ac.uk