Copyright (c) 2022 by Ayan Das (@dasayan05)

Generative Models: Definition, Motivation & Scope
• Generative Modelling is learning models of the form $p_\theta(X)$, given a dataset $\{X_i\}_{i=1}^{D} \sim q_{data}(X)$   (1)
• Motivation 1: Verification (log-likelihood)
• Motivation 2: Generation by sampling, i.e. $X_{new} \sim p_{\theta^*}(X)$
• Motivation 3: Conditional models of the form $p_\theta(X \mid Y)$
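A toy sketch (not from the deck) of the three motivations, using a simple Gaussian as $p_\theta(X)$; the dataset and all names here are made-up placeholders, purely to show where log-likelihood, sampling, and conditioning enter.

```python
# Minimal sketch: a Gaussian p_theta(X) fitted to toy data, illustrating the three motivations.
import torch
import torch.distributions as D

# Hypothetical toy dataset {X_i} ~ q_data(X)
data = torch.randn(1000, 2) * 2.0 + 3.0

# "Learn" p_theta(X): here theta is just the empirical mean/std (maximum likelihood for a Gaussian)
mu, sigma = data.mean(0), data.std(0)
p_theta = D.Independent(D.Normal(mu, sigma), 1)

log_lik = p_theta.log_prob(data).mean()   # Motivation 1: verification via log-likelihood
x_new = p_theta.sample((5,))              # Motivation 2: generation by sampling X_new ~ p_theta(X)
# Motivation 3 (conditional p_theta(X|Y)) would make the parameters a function of Y, e.g. mu = f(Y).
print(log_lik.item(), x_new.shape)
```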
Diffusion Models, simplified: Intuitive Idea
• Gaussian Diffusion Model generates data by gradual Gaussian de-noising
• "Reverse process" is the real generative process
• "Forward process" is just a way of simulating noisy training data for all $t$ (see the sketch after this list)
[Figure: forward/reverse Markov chain $X_0 \cdots X_{t-1}, X_t \cdots X_T$]
Training objective: $\mathbb{E}_{X_0 \sim q_{data}} \left[ \frac{1}{T} \sum_{t=T}^{1} \| s_\theta(X_t) - X_{t-1} \|_2^2 \right]$
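A minimal sketch of this intuitive objective: simulate the noisy chain with the forward process, then train $s_\theta$ to predict the previous (less noisy) state. The network, noise schedule, and toy data below are assumptions, not the author's code, and $t$ is omitted from the network input for brevity.

```python
# Sketch of E_{X0 ~ q_data}[ 1/T * sum_t || s_theta(X_t) - X_{t-1} ||^2 ] on toy 2-D data.
import torch
import torch.nn as nn

T = 100
betas = torch.linspace(1e-4, 0.02, T)               # assumed linear noise schedule
s_theta = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(s_theta.parameters(), lr=1e-3)

x0 = torch.randn(128, 2)                            # stand-in for a batch X_0 ~ q_data
# Forward process: simulate the noisy chain X_1, ..., X_T from X_0
xs, x = [x0], x0
for t in range(T):
    x = torch.sqrt(1.0 - betas[t]) * x + torch.sqrt(betas[t]) * torch.randn_like(x)
    xs.append(x)

# Reverse-process training: predict the previous (less noisy) state from the current one
loss = sum(((s_theta(xs[t]) - xs[t - 1]) ** 2).sum(-1).mean() for t in range(1, T + 1)) / T
opt.zero_grad()
loss.backward()
opt.step()
```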
Diffusion Models, simplified: Visualising the data space
[Figure: vector field that guides towards real data]
"Score" of a Distribution: an important statistical quantity
• Score: $\nabla_X \log q_{data}(X) \approx s_\theta(X, \cdot)$
• Following the score: $X_1 \leftarrow X_0 + \nabla_X \log q_{data}(X) \big|_{X = X_0}$
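A minimal sketch of the score-following update above. Here $q_{data}$ is an assumed toy Gaussian so the score can be computed exactly with autograd; in a diffusion model this gradient is replaced by the learned approximation $s_\theta(X, \cdot)$.

```python
# One step of X_1 <- X_0 + grad_X log q_data(X)|_{X=X_0} on a known toy density.
import torch

mu = torch.tensor([3.0, -1.0])

def log_q_data(x):
    # Log-density of the assumed toy q_data: unit-variance Gaussian centred at mu
    return -0.5 * ((x - mu) ** 2).sum()

x0 = torch.zeros(2, requires_grad=True)
score = torch.autograd.grad(log_q_data(x0), x0)[0]   # grad_X log q_data(X) at X = X_0
x1 = x0.detach() + score                             # one step "uphill" towards high data density
print(x1)                                            # moves from the origin towards mu
```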
Role of multiple noise scales: $s_\theta$ achieves different goals at different noise scales
[Figure: chain $X_T \cdots X_t, X_{t-1} \cdots X_0$; near $X_T$: uncertain prediction, high variance (Diversity); near $X_0$: certain prediction, low variance (Fidelity)]
Stable Diffusion: Diffusion on latent space
• Embed dataset into latent space: $X_0 \sim q(X_0)$, $Z_0 = \mathcal{E}(X_0)$
• Just as before, create a diffusion model on the latents: $Z_T \rightarrow Z_{T-1} \rightarrow \cdots \rightarrow Z_1 \rightarrow Z_0$
• Decode them as $X_0 = \mathcal{D}(Z_0)$
• $(\mathcal{E}, \mathcal{D})$ are an Auto-Encoder
"High-Resolution Image Synthesis with Latent Diffusion Models", Rombach et al., CVPR 2022
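A minimal sketch of that pipeline, assuming placeholder encoder/decoder/denoiser modules rather than the actual Stable Diffusion networks: encode with $\mathcal{E}$, run the reverse chain in latent space, decode with $\mathcal{D}$.

```python
# Latent diffusion sketch: E(X0) for training, Z_T -> ... -> Z_0 for sampling, then D(Z_0).
import torch
import torch.nn as nn

enc = nn.Linear(784, 32)                 # E: data space -> latent space (assumed sizes)
dec = nn.Linear(32, 784)                 # D: latent space -> data space
eps_theta = nn.Sequential(nn.Linear(32 + 1, 64), nn.ReLU(), nn.Linear(64, 32))

T = 50
alphas_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)

x0 = torch.randn(4, 784)                 # a batch of "images"
z0 = enc(x0)                             # Z_0 = E(X_0); used during training, shown for completeness

# Sampling: start from noise in latent space and denoise step by step (simplified, DDIM-like)
z = torch.randn(4, 32)                   # Z_T ~ N(0, I)
for t in reversed(range(T)):
    t_in = torch.full((4, 1), t / T)
    eps = eps_theta(torch.cat([z, t_in], dim=-1))
    z = (z - torch.sqrt(1 - alphas_bar[t]) * eps) / torch.sqrt(alphas_bar[t])  # predicted Z_0
    if t > 0:                            # re-noise to the previous level (deterministic step)
        z = torch.sqrt(alphas_bar[t - 1]) * z + torch.sqrt(1 - alphas_bar[t - 1]) * eps

x_gen = dec(z)                           # X_0 = D(Z_0)
print(x_gen.shape)
```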
Classifier Guidance: Guiding the reverse process with an external classifier
• Requires labels (or some conditioning info)
• Train an external classifier $p_\phi(Y \mid X)$, completely unrelated to the diffusion model
• Modify the unconditional noise-estimator with the classifier to yield a conditional noise-estimator:
$\hat{\epsilon}_{\theta^*, \phi^*}(X_t, t, Y) = \epsilon_{\theta^*}(X_t, t) - \lambda \cdot \sigma[t] \, \nabla_{X_t} \log p_{\phi^*}(Y \mid X_t)$
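A minimal sketch of that equation: shift the unconditional noise estimate by the classifier gradient computed with autograd. The noise estimator, classifier, $\lambda$ and $\sigma[t]$ below are stand-in assumptions.

```python
# Classifier guidance: eps_hat = eps_theta(X_t, t) - lambda * sigma[t] * grad_{X_t} log p_phi(Y|X_t)
import torch
import torch.nn as nn
import torch.nn.functional as F

eps_theta = nn.Linear(2, 2)              # stand-in unconditional noise estimator eps_theta(X_t, t)
classifier = nn.Linear(2, 10)            # stand-in external classifier p_phi(Y|X_t)
lam, sigma_t = 1.0, 0.5

def guided_eps(x_t, y):
    x_t = x_t.detach().requires_grad_(True)
    log_p_y = F.log_softmax(classifier(x_t), dim=-1)[torch.arange(len(y)), y].sum()
    grad = torch.autograd.grad(log_p_y, x_t)[0]          # grad_{X_t} log p_phi(Y|X_t)
    return eps_theta(x_t).detach() - lam * sigma_t * grad

x_t = torch.randn(8, 2)
y = torch.randint(0, 10, (8,))
print(guided_eps(x_t, y).shape)
```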
CLIP Guidance: Introduced in the "GLIDE: …" paper from OpenAI
• Guide the reverse process with a text condition $C$
• Instead of the classifier gradient, maximise the dot product of CLIP embeddings:
Classifier guidance: $\hat{\epsilon}_{\theta^*, \phi^*}(X_t, t, Y) = \epsilon_{\theta^*}(X_t, t) - \lambda \cdot \sigma[t] \, \nabla_{X_t} \log p_{\phi^*}(Y \mid X_t)$
CLIP guidance: $\hat{\epsilon}_{\theta^*, \phi^*}(X_t, t, C) = \epsilon_{\theta^*}(X_t, t) - \lambda \cdot \sigma[t] \, \nabla_{X_t} \left( \mathcal{E}_I(X_t) \cdot \mathcal{E}_T(C) \right)$
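A minimal sketch of the CLIP-guided variant: the guidance term becomes the gradient of the image-text embedding dot product. The "CLIP" encoders here are random linear/embedding stand-ins, purely to show the mechanics, not the real CLIP model.

```python
# CLIP guidance: eps_hat = eps_theta(X_t, t) - lambda * sigma[t] * grad_{X_t}( E_I(X_t) . E_T(C) )
import torch
import torch.nn as nn

eps_theta = nn.Linear(2, 2)              # stand-in unconditional noise estimator
clip_image = nn.Linear(2, 16)            # stand-in image encoder E_I
clip_text = nn.Embedding(1000, 16)       # stand-in text encoder E_T (token id -> embedding)
lam, sigma_t = 1.0, 0.5

def clip_guided_eps(x_t, caption_ids):
    x_t = x_t.detach().requires_grad_(True)
    sim = (clip_image(x_t) * clip_text(caption_ids).mean(dim=1)).sum()   # E_I(X_t) . E_T(C)
    grad = torch.autograd.grad(sim, x_t)[0]                              # gradient of the similarity
    return eps_theta(x_t).detach() - lam * sigma_t * grad

x_t = torch.randn(8, 2)
caption_ids = torch.randint(0, 1000, (8, 5))                             # hypothetical tokenised caption C
print(clip_guided_eps(x_t, caption_ids).shape)
```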
"Super-Resolution" with Conditional Diffusion
• Conditional model $p_\theta(X \mid Y)$ with the low-resolution image $Y$ as the condition
• Each reverse step is conditioned on $Y$: $X_{t-1} \sim p_\theta(X_{t-1} \mid X_t, Y)$
"Image Super-Resolution via Iterative Refinement", Saharia et al.
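A minimal sketch of one way such conditioning can be wired (an assumption, not necessarily the paper's exact architecture): upsample the low-resolution $Y$ and concatenate it with $X_t$ at the input of the noise estimator.

```python
# Conditional reverse step: the denoiser sees both the noisy sample X_t and the condition Y.
import torch
import torch.nn as nn
import torch.nn.functional as F

eps_theta = nn.Conv2d(3 + 3, 3, kernel_size=3, padding=1)   # stand-in conditional noise estimator

y_lowres = torch.randn(1, 3, 16, 16)                         # condition Y (low-resolution image)
x_t = torch.randn(1, 3, 64, 64)                              # current noisy high-resolution sample X_t

y_up = F.interpolate(y_lowres, size=(64, 64), mode="bilinear", align_corners=False)
eps = eps_theta(torch.cat([x_t, y_up], dim=1))               # eps_theta(X_t, t, Y)
print(eps.shape)                                             # used to form the mean of p_theta(X_{t-1}|X_t, Y)
```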
"Tweaking" the sampling process (1): Iterative Latent Variable Refinement (ILVR)
[Figure: reference chain $Y_0, Y_1, \cdots, Y_{t-1}, Y_t, \cdots, Y_T$ alongside the sample chain $X_T, \cdots, X_t$; "increasingly de-correlated samples →"]
$X_{t-1} = X'_{t-1} - \mathrm{LPF}_N(X'_{t-1}) + \mathrm{LPF}_N(Y_{t-1})$
"Iterative Latent Variable Refinement (ILVR)", Jooyoung Choi et al.
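A minimal sketch of that correction step, assuming the low-pass filter $\mathrm{LPF}_N$ is implemented as down/up-sampling by a factor $N$ (a common choice, stated here as an assumption).

```python
# ILVR-style mixing: replace the low-frequency content of X'_{t-1} with that of the noised reference Y_{t-1}.
import torch
import torch.nn.functional as F

def lpf(x, n=4):
    # Low-pass filter via bicubic down-sampling and up-sampling by factor n
    h, w = x.shape[-2:]
    x_small = F.interpolate(x, size=(h // n, w // n), mode="bicubic", align_corners=False)
    return F.interpolate(x_small, size=(h, w), mode="bicubic", align_corners=False)

x_prime = torch.randn(1, 3, 64, 64)      # X'_{t-1}: unconditional proposal from the reverse step
y_noised = torch.randn(1, 3, 64, 64)     # Y_{t-1}: reference image forward-diffused to level t-1
x_prev = x_prime - lpf(x_prime) + lpf(y_noised)
print(x_prev.shape)
```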
"Tweaking" the sampling process (2): SDEdit
• Forward diffuse the condition: $Y_0 \rightarrow Y_1 \rightarrow \cdots \rightarrow Y_t$
• Then set $X_t := Y_t$ and run the usual reverse process: $X_{t-1} \sim p_\theta(X_{t-1} \mid X_t)$
"SDEdit: Guided image synthesis …", Chenlin Meng et al.
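A minimal sketch of that idea: forward-diffuse the guide $Y_0$ up to an intermediate level $t_0$, start the reverse process from $X_{t_0} := Y_{t_0}$, and denoise back to $t = 0$. The noise schedule, the choice of $t_0$, and the deterministic reverse step are simplified assumptions.

```python
# SDEdit-style sampling: partial forward diffusion of the guide, then the ordinary reverse process.
import torch
import torch.nn as nn

T, t0 = 100, 60
alphas_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)
eps_theta = nn.Linear(2, 2)                               # stand-in noise estimator

y0 = torch.randn(8, 2)                                    # guide Y_0 (e.g. a rough user-provided sketch)
# Forward-diffuse the condition: Y_{t0} = sqrt(abar_{t0}) * Y_0 + sqrt(1 - abar_{t0}) * eps
x = torch.sqrt(alphas_bar[t0]) * y0 + torch.sqrt(1 - alphas_bar[t0]) * torch.randn_like(y0)

# Reverse process from t0 down to 0 (deterministic steps for brevity)
for t in reversed(range(1, t0 + 1)):
    eps = eps_theta(x)                                    # eps_theta(X_t, t); t input omitted for brevity
    x0_pred = (x - torch.sqrt(1 - alphas_bar[t]) * eps) / torch.sqrt(alphas_bar[t])
    x = torch.sqrt(alphas_bar[t - 1]) * x0_pred + torch.sqrt(1 - alphas_bar[t - 1]) * eps
print(x.shape)
```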
The Forward-Reverse process is quite generic: it does not assume the structure of the data and/or model
$X_t = \sqrt{\bar{\alpha}_t} \cdot X_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon, \qquad \hat{\epsilon} \leftarrow \epsilon_\theta(X_t, t)$
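A minimal sketch of exactly this genericity: the closed-form forward step and the noise prediction apply to any tensor-shaped data, not just images. The schedule and the tiny noise estimator are assumptions.

```python
# Generic forward step X_t = sqrt(abar_t) * X_0 + sqrt(1 - abar_t) * eps, then eps_hat <- eps_theta(X_t, t).
import torch
import torch.nn as nn

T = 100
alphas_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)
eps_theta = nn.Linear(5, 5)                        # stand-in noise estimator for 5-dim (non-image) data

x0 = torch.randn(32, 5)                            # X_0: a batch of arbitrary data
t = torch.randint(0, T, (32,))
eps = torch.randn_like(x0)
abar = alphas_bar[t].unsqueeze(-1)
x_t = torch.sqrt(abar) * x0 + torch.sqrt(1 - abar) * eps
eps_hat = eps_theta(x_t)                           # trained with || eps_hat - eps ||^2
print(eps_hat.shape)
```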
Diffusion on continuous sequences (1): my latest work; paper under review
$X_0 := [x^{(0)}, x^{(1)}, \cdots, x^{(\tau)}, \cdots]$
$X_t = \sqrt{\bar{\alpha}_t} \cdot X_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon$
$[\epsilon_t^{(0)}, \epsilon_t^{(1)}, \cdots, \epsilon_t^{(\tau)}, \cdots] \leftarrow \mathrm{BiRNN}([X_t^{(0)}, X_t^{(1)}, \cdots, X_t^{(\tau)}, \cdots], t)$
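A minimal sketch of this sequence setting, assuming a GRU-based bidirectional RNN as the noise estimator (the architecture sizes and time-conditioning scheme are my assumptions, not necessarily those of the paper under review): the same closed-form forward process is applied element-wise, and the BiRNN predicts one noise vector per sequence step.

```python
# Sequence diffusion sketch: X_0 is a continuous sequence, noised element-wise; a BiRNN predicts the noise.
import torch
import torch.nn as nn

T = 100
alphas_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)

class BiRNNEps(nn.Module):
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(dim + 1, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, dim)

    def forward(self, x_t, t):
        # Append the normalised diffusion time t to every sequence element
        t_feat = (t.float() / T).view(-1, 1, 1).expand(-1, x_t.shape[1], 1)
        h, _ = self.rnn(torch.cat([x_t, t_feat], dim=-1))
        return self.out(h)                                # one noise estimate per step tau

model = BiRNNEps()
x0 = torch.randn(8, 50, 2)                                # batch of sequences: 50 points of dimension 2
t = torch.randint(0, T, (8,))
eps = torch.randn_like(x0)
abar = alphas_bar[t].view(-1, 1, 1)
x_t = torch.sqrt(abar) * x0 + torch.sqrt(1 - abar) * eps  # same forward process, applied element-wise
eps_hat = model(x_t, t)                                   # [eps_t(0), eps_t(1), ...] <- BiRNN([X_t(0), ...], t)
loss = ((eps_hat - eps) ** 2).mean()
print(loss.item())
```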