Slide 1

Slide 1 text

Learning of Wasserstein generative models and patch-based texture synthesis
Antoine Houdard, Arthur Leclaire, Nicolas Papadakis, Julien Rabin
Online lecture series "Mathematics of Deep Learning", June 8th, 2021
N. Papadakis, Wasserstein Generative Models for Texture Synthesis

Slide 2

Slide 2 text

Outline

Slide 4

Slide 4 text

Generative models
Popular usages: generating digits, clothing, bedrooms, faces...

Slide 5

Slide 5 text

Generative models
• Data {y1, . . . , yN} sampled from Y ∼ ν
• Synthetic distribution µθ = gθ♯ζ
Goal: find the best θ such that µθ is close, in some sense, to ν

Slide 6

Slide 6 text

Generative models
Variational Auto-Encoder [Kingma et al. '13]
• Decoder as generative model gθ
GAN [Goodfellow et al. '14]
• Discriminator dη between fake samples gθ(Z) and true samples Y:
  min_θ max_η E_ν[log(dη(Y))] + E_ζ[log(1 − dη(gθ(Z)))]
WGAN [Arjovsky et al. '17]
• Compare the fake gθ(Z) ∼ µθ = gθ♯ζ and true Y ∼ ν sample distributions:
  min_θ D(µθ, ν)
• Duality of the Wasserstein distance D = W1 yields
  min_θ max_{ψ ∈ Lip1} E_ν[ψ(Y)] − E_ζ[ψ(gθ(Z))]
• Parameterization of the dual variable ψ with dη
Questions: other Wasserstein costs? Training strategies?
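To make the distance D(µθ, ν) concrete, here is a minimal numpy sketch of the Wasserstein-1 distance between two 1-D empirical sample sets, using the closed form given by the quantile (sorting) coupling. The function name and the Gaussian stand-ins for gθ(Z) and ν are illustrative, not part of the lecture.

```python
import numpy as np

def w1_empirical_1d(x, y):
    """W1 between two 1-D empirical measures with equally many samples.
    In 1-D the optimal plan matches sorted samples (quantile coupling)."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(np.abs(xs - ys))

rng = np.random.default_rng(0)
fake = rng.normal(0.0, 1.0, 1000)   # stand-in for g_theta(Z) samples
real = rng.normal(2.0, 1.0, 1000)   # stand-in for samples of nu
print(w1_empirical_1d(fake, real))  # close to 2.0, the shift between means
```

Minimizing this quantity over the generator parameters is exactly the WGAN objective min_θ D(µθ, ν).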

Slide 7

Slide 7 text

Patch-based Texture Synthesis
• Copy-paste [Efros and Leung '99]
• Iterative refinement with nearest neighbors [Kwatra '05]
• Impose the patch distribution at different scales [Gutierrez et al. '17, Leclaire and Rabin '19]
Image composed of patches processed independently
Apply the algorithm for each new synthesis

Slide 8

Slide 8 text

Texture synthesis with Neural Networks
• Gram matrices of VGG features at different scales [Gatys et al. '15]
  min_u ||G_u − G_v||²
  → prescribe features of an example image v
  Process the whole image u, and not its patches independently
• Train a feed-forward generative network gθ [Ulyanov et al. '16]
  min_θ E_ζ ||G_{gθ(Z)} − G_v||²
  Real-time synthesis
Questions: Wasserstein metric between feature distributions? Dealing with patches?

Slide 9

Slide 9 text

Outline

Slide 10

Slide 10 text

Optimal Transport (OT)
• OT defines a family of distances between probability densities
• Transport a mass µ(x) onto ν(y)
• Define a cost c(x, y) for transporting mass between locations x and y
• OT: the map with minimal global cost that transfers µ onto ν
• If c(x, y) = ||x − y||^p: L^p Wasserstein distance
• Interpolation with the transport map T
(Figure: Euclidean interpolation between µ and ν)
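For two uniform discrete measures with the same number of atoms, the optimal plan is a permutation, so the OT cost can be computed exactly with an assignment solver. A small sketch, assuming scipy is available; `ot_cost` is an illustrative name, not a library function.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_cost(X, Y, p=2):
    """Exact OT between two uniform discrete measures of equal size.
    With equal weights the optimal plan is a permutation (assignment)."""
    C = (np.abs(X[:, None, :] - Y[None, :, :]) ** p).sum(axis=2)  # c(x_i, y_j)
    i, j = linear_sum_assignment(C)                               # optimal matching
    return C[i, j].mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = rng.normal(size=(50, 2)) + 3.0
print(ot_cost(X, Y))  # transport cost, dominated by the shift between clouds
```

With c(x, y) = ||x − y||^p this is the p-th power of the L^p Wasserstein distance between the two point clouds.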

Slide 12

Slide 12 text

Optimal Transport (OT)
(Figure: the same interpolation computed with the Wasserstein transport map T)

Slide 14

Slide 14 text

Formulations
• Continuous [Benamou and Brenier '00]
• Discrete [Cuturi '13]
• Semi-discrete [Mérigot '11, et al. '17]
Example: interpolation µt between images

Slide 16

Slide 16 text

Formulations
Example: transfer of colors between images

Slide 17

Slide 17 text

Formulations
Example: Wasserstein GAN

Slide 20

Slide 20 text

Formulations
What's next
• More on the semi-discrete formulation of the optimal transport cost OT(µ, ν)
• Differentiability and regularization of the cost min_θ OT(µθ, ν)
• Application to patch-based texture synthesis

Slide 21

Slide 21 text

Optimal Transport cost
• Continuous cost function c : R^d × R^d → R
• µ, ν probability measures supported on compact sets X, Y ⊂ R^d; let
  OT_c(µ, ν) = min_{π ∈ Π(µ,ν)} ∫ c(x, y) dπ(x, y)
  where Π(µ, ν) is the set of probability measures on X × Y with marginals µ and ν.
Theorem [Villani '03, Santambrogio '15]
Strong duality holds, i.e.
  OT_c(µ, ν) = max_{ϕ,ψ} ∫_X ϕ dµ + ∫_Y ψ dν
where the max is taken over all ϕ ∈ L¹(µ), ψ ∈ L¹(ν) such that
  ϕ(x) + ψ(y) ≤ c(x, y)   dµ(x)-a.e., dν(y)-a.e.

Slide 22

Slide 22 text

c-transforms and semi-dual formulation
c-transforms
  ϕ^c(y) = min_{x∈X} [c(x, y) − ϕ(x)]
  ψ^c(x) = min_{y∈Y} [c(x, y) − ψ(y)]
Semi-dual
  OT_c(µ, ν) = max_{ϕ,ψ} ∫_X ϕ dµ + ∫_Y ψ dν
             = max_{ϕ∈C(X)} ∫_X ϕ(x) dµ(x) + ∫_Y ϕ^c(y) dν(y)
             = max_{ψ∈C(Y)} ∫_X ψ^c(x) dµ(x) + ∫_Y ψ(y) dν(y)

Slide 23

Slide 23 text

c-transforms and semi-dual formulation
• c-transforms inherit regularity from c

Slide 24

Slide 24 text

c-transforms and semi-dual formulation
• If c(x, y) = ||x − y||, then ψ^c = −ψ with ψ 1-Lipschitz [Kantorovich and Rubinstein '58]:
  OT_c(µ, ν) = max_{ψ ∈ Lip1} ∫_Y ψ(y) dν(y) − ∫_X ψ(x) dµ(x)

Slide 25

Slide 25 text

c-transforms and semi-dual formulation
• For a discrete ν = Σ_{j=1}^J νj δ_{yj} and ψj = ψ(yj):
  ψ^c(x) = min_{j ∈ {1,...,J}} c(x, yj) − ψj
  OT_c(µ, ν) = max_{(ψj)_{j=1}^J} ∫_X ψ^c(x) dµ(x) + Σ_{j=1}^J νj ψj

Slide 26

Slide 26 text

Semi-discrete semi-dual formulation
• Semi-discrete cost
  OT_c(µ, ν) = max_{(ψj)_{j=1}^J} ∫_X ψ^c(x) dµ(x) + Σ_{j=1}^J νj ψj
• When the argmin over j is unique, it defines a map
  Tψ(x) = argmin_{yj} c(x, yj) − ψj
  → "biased" nearest-neighbor matching
• Preimages of Tψ are called Laguerre cells:
  Lj(ψ) = {x | ∀k ≠ j, c(x, yj) − ψj < c(x, yk) − ψk}.
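The biased matching Tψ is a one-line modification of nearest-neighbor search. A minimal numpy sketch (the name `laguerre_map` and the toy data are illustrative):

```python
import numpy as np

def laguerre_map(x, Y, psi, p=2):
    """Biased nearest-neighbor map T_psi for a discrete target measure.
    x: (n, d) query points; Y: (J, d) atoms of nu; psi: (J,) dual weights.
    Returns, for each x, the index j minimizing c(x, y_j) - psi_j,
    i.e. the Laguerre cell containing x."""
    C = np.sum(np.abs(x[:, None, :] - Y[None, :, :]) ** p, axis=2)
    return np.argmin(C - psi[None, :], axis=1)

Y = np.array([[0.0], [1.0]])
x = np.array([[0.4], [0.6]])
# psi = 0: plain nearest neighbor, the cells split at 0.5
print(laguerre_map(x, Y, np.zeros(2)))          # [0 1]
# biasing psi enlarges the Laguerre cell of y_1
print(laguerre_map(x, Y, np.array([0.0, 0.3]))) # [1 1]
```

Increasing ψj grows the cell Lj(ψ), which is how the dual variables rebalance the matching towards the target weights νj.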

Slide 27

Slide 27 text

Some questions
• Design algorithms to solve
  inf_θ OT_c(µθ, ν) = inf_θ max_ψ ∫ ψ^c dµθ + ∫ ψ dν
• Is the loss function OT_c(µθ, ν) regular?
• If not, what kind of problems arise?
• Do these problems appear in the discrete/semi-discrete cases?
• Does this scale up to image synthesis problems?

Slide 28

Slide 28 text

Outline

Slide 29

Slide 29 text

Related GAN works
  inf_θ OT_c(µθ, ν) = inf_θ max_ψ ∫ ψ^c dµθ + ∫ ψ dν
[Goodfellow et al. '14] GAN (Jensen-Shannon divergence)
[Arjovsky et al. '17] Wasserstein GAN (Wasserstein distance with L1 cost)
[Gulrajani et al. '17] WGAN-GP: Wasserstein GAN with gradient penalty
[Genevay et al. '18] Generative models with Sinkhorn divergences
[Salimans et al. '18] Improving GANs using optimal transport
[Liu et al. '18] WGAN-TS (for Two Steps)
[Chen et al. '19] Semi-discrete Wasserstein generative network training
Differential properties of OT
[Burger et al. '12] Wasserstein distance and regularized density
[Cuturi and Peyré '15] Gradient of the regularized Wasserstein distance
[Cazelles et al. '19] Proof of differentiability in both previous settings
[Degournay et al. '19] Differentiation w.r.t. the discrete target measure

Slide 30

Slide 30 text

What convex optimization tells us
Proposition [Santambrogio's textbook '15]
Assume c continuous, X, Y ⊂ R^d, and fix ν.
• µ ↦ OT_c(µ, ν) is convex
• For every subgradient ϕ ∈ ∂µ OT_c(µ, ν),
  OT_c(µ, ν) = ∫ ϕ dµ + ∫ ϕ^c dν
  Hence OT_c(µ + χ, ν) ≥ OT_c(µ, ν) + ∫ ϕ dχ
• If ϕ is unique up to additive constants, then one can show Gateaux differentiability at (µ, ν)
NB: extension to entropy-regularized optimal transport [Feydy et al. '18]
Sufficient condition [Santambrogio '15]
c is C¹ and Supp(µ) (or Supp(ν)) is the closure of a bounded connected open set
Does not include c(x, y) = ||x − y||

Slide 34

Slide 34 text

WGAN problem
Given a generator µθ = gθ♯ζ, solve
  inf_θ OT_c(µθ, ν) = inf_θ max_ψ ∫ ψ^c dµθ + ∫ ψ dν
For F(ψ, θ) = ∫_X ψ^c(x) dµθ(x) + ∫_Y ψ(y) dν(y), we have
  W(θ) := OT_c(µθ, ν) = max_ψ F(ψ, θ)
The potential ψ acts as a discriminator between µθ and ν

Slide 35

Slide 35 text

WGAN problem
Theorem [Arjovsky et al. '17]
Let θ0 and ψ0* satisfy W(θ0) = F(ψ0*, θ0). If W and θ ↦ F(ψ0*, θ) are both differentiable at θ0, then
  ∇W(θ0) = ∇θ F(ψ0*, θ0)   (Grad-OT)
There are cases where no such couple (ψ0*, θ0) exists

Slide 37

Slide 37 text

A telling counter-example
Proposition
Let µθ = δθ with θ ∈ R^d, and let ν = ½ δ_{y1} + ½ δ_{y2} with y1, y2 ∈ R^d distinct. Let c(x, y) = ||x − y||_p^p, p > 1. Then
• θ ↦ W(θ) is differentiable everywhere.
• For θ0 = (y1 + y2)/2 and any ψ0* ∈ argmax_ψ F(ψ, θ0), θ ↦ F(ψ0*, θ) is not differentiable at θ0.
Hence the (Grad-OT) relation does not hold at θ0 = (y1 + y2)/2.
Proof
• W(θ) = ½ (c(θ, y1) + c(θ, y2)) = ½ (||θ − y1||_p^p + ||θ − y2||_p^p)
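The counter-example can be checked numerically in one dimension with p = 2: W has a well-defined derivative at the midpoint, while F(ψ0*, ·) has a kink there. A small finite-difference sketch (variable names are ours, not the lecture's):

```python
import numpy as np

# Toy check of the counter-example (1-D, p = 2):
# mu_theta = delta_theta, nu = (delta_{y1} + delta_{y2})/2, c(x, y) = (x - y)^2.
y1, y2 = 0.0, 1.0
W = lambda t: 0.5 * ((t - y1) ** 2 + (t - y2) ** 2)   # smooth everywhere

theta0 = 0.5                                          # the midpoint (y1 + y2)/2
# An optimal dual at theta0 satisfies psi1 - psi2 = c(theta0,y1) - c(theta0,y2)
psi = np.array([(theta0 - y1) ** 2 - (theta0 - y2) ** 2, 0.0])
F = lambda t: min((t - y1) ** 2 - psi[0], (t - y2) ** 2 - psi[1]) + psi.mean()

h = 1e-6
dW = (W(theta0 + h) - W(theta0 - h)) / (2 * h)        # derivative of W exists
left = (F(theta0) - F(theta0 - h)) / h                # one-sided slopes of F
right = (F(theta0 + h) - F(theta0)) / h
print(dW, left, right)   # ~0.0, ~1.0, ~-1.0: F has a kink at theta0
```

The disagreeing one-sided slopes of F(ψ0*, ·) at θ0 are exactly the unstable gradient estimates discussed on the next slide.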

Slide 39

Slide 39 text

A telling counter-example
Proof (continued)
• F(ψ, θ) = ψ^c(θ) + Σ_{j=1}^2 ½ ψj = min_{j=1,2} [c(θ, yj) − ψj] + (ψ1 + ψ2)/2
Fix θ0 and ψ0*; then (ψ0*)1 − (ψ0*)2 = c(θ0, y1) − c(θ0, y2), and
  F(ψ0*, θ) = c(θ, y1) + ½ (c(θ0, y2) − c(θ0, y1))  if θ ∈ L1(ψ0*)
  F(ψ0*, θ) = c(θ, y2) + ½ (c(θ0, y1) − c(θ0, y2))  if θ ∈ L2(ψ0*)
F(ψ0*, ·) is not differentiable at the boundary between L1(ψ0*) and L2(ψ0*)

Slide 40

Slide 40 text

Consequence: instabilities in training
• Iterative algorithms for solving inf_θ W(θ) need an estimate of the gradient.
• For the L² cost, the gradient ∇W(θ) = (θ − y1) + (θ − y2) is estimated by
  ∇θ F(ψ, θ) = θ − y1 if θ ∈ L1(ψ),  θ − y2 if θ ∈ L2(ψ)
Solutions
1. Regularization of the optimal transport
2. Assumption on the generator

Slide 42

Slide 42 text

Regularized Optimal Transport
Definition [Genevay et al. '19]
For λ > 0, the regularized OT cost is defined by
  OT_c^λ(µ, ν) = inf_{π∈Π(µ,ν)} ∫ c(x, y) dπ(x, y) + λ KL(π | µ ⊗ ν)
where KL is the Kullback–Leibler divergence:
  KL(π | µ ⊗ ν) = ∫ log( dπ(x, y) / dµ(x)dν(y) ) dπ(x, y) if dπ/d(µ⊗ν) exists, +∞ otherwise.

Slide 43

Slide 43 text

Semi-dual Regularized Problem
Proposition
Assume c ∈ L∞(X × Y); then
  OT_c^λ(µ, ν) = max_{ψ∈L∞(Y)} ∫_X ψ^{c,λ}(x) dµ(x) + ∫_Y ψ(y) dν(y)
where
  ψ^{c,λ}(x) = Softmin_{j∈{1,...,J}} [c(x, yj) − ψj] = −λ log ∫_Y exp( (ψ(y) − c(x, y)) / λ ) dν(y)
Theorem [Genevay '19, Chizat et al. '19]
For c ∈ L∞(X × Y), the semi-dual problem admits a solution ψ* ∈ L∞(ν), which is unique ν-a.e. up to an additive constant
NB: solutions are characterized by the fixed-point equation (ψ^{c,λ})^{c,λ} = ψ
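For a discrete target measure the smoothed c-transform ψ^{c,λ} is a weighted log-sum-exp, which should be evaluated with a stable `logsumexp` rather than naive exponentials. A sketch assuming scipy; `psi_c_lambda` is an illustrative name:

```python
import numpy as np
from scipy.special import logsumexp

def psi_c_lambda(x, Y, psi, nu, lam, p=2):
    """Entropic smoothed c-transform for discrete nu = sum_j nu_j delta_{y_j}:
    psi^{c,lam}(x) = -lam * log sum_j nu_j exp((psi_j - c(x, y_j)) / lam)."""
    C = np.sum(np.abs(x[:, None, :] - Y[None, :, :]) ** p, axis=2)
    return -lam * logsumexp((psi[None, :] - C) / lam, b=nu[None, :], axis=1)

Y = np.array([[0.0], [1.0]])
nu = np.array([0.5, 0.5])
x = np.array([[0.25]])
psi = np.zeros(2)
# As lam -> 0 the softmin approaches the hard c-transform min_j c(x, y_j) - psi_j
for lam in (1.0, 0.1, 0.001):
    print(lam, psi_c_lambda(x, Y, psi, nu, lam))
```

The λ → 0 limit recovers the unregularized ψ^c, which is how the regularized problem interpolates between smooth and sharp assignments.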

Slide 45

Slide 45 text

Regularity of the full problem
• µθ: distribution of g(θ, Z)
• Z: random variable in Z ⊂ R^p with distribution ζ
  min_θ OT_c^λ(µθ, ν) = min_θ max_{ψ∈L∞(Y)} E[ψ^{c,λ}(g(θ, Z))] + ∫_Y ψ dν =: Fλ(ψ, θ)
Hypothesis (H)
There exists L : Θ × Z → R+ such that, for any θ ∈ Θ, there is a neighborhood Vθ of θ such that for all θ' ∈ Vθ,
  ||g(θ, Z) − g(θ', Z)|| ≤ L(θ, Z) ||θ − θ'||   Z-a.s.,
with E[L(θ, Z)] < ∞.
Proposition
Let λ > 0. Assume c is C¹ and g satisfies (H). For any θ0 ∈ Θ and any ψ ∈ L∞(Y), θ ↦ Fλ(ψ, θ) is differentiable at θ0 with
  ∇θ Fλ(ψ, θ0) = E[ (∂θ g(θ0, Z))^T ∇ψ^{c,λ}(g(θ0, Z)) ]
If g is C¹, then so is Fλ(ψ, ·)

Slide 47

Slide 47 text

Gradient of the regularized Wasserstein cost
Theorem
Let λ > 0. Assume c is C¹, and g is C¹ and satisfies (H). Then Wλ : θ ↦ OT_c^λ(µθ, ν) is C¹, and for any θ ∈ Θ,
  ∇θ Wλ(θ) = ∇θ Fλ(ψ*, θ) = E[ (∂θ g(θ, Z))^T ∇(ψ*)^{c,λ}(g(θ, Z)) ]
where ψ* satisfies Wλ(θ) = Fλ(ψ*, θ).

Slide 48

Slide 48 text

If the generator g is not C¹?
Lemma
Assume c is C¹. Then for any λ ≥ 0 and any θ, θ' ∈ Ω,
  |Wλ(θ) − Wλ(θ')| ≤ ||∇c||∞ E[ ||g(θ, Z) − g(θ', Z)|| ].
Theorem
Let λ > 0. Assume c is C¹ and g satisfies (H). Then Wλ is locally Lipschitz, and thus differentiable a.e. For almost any θ,
  ∇θ Wλ(θ) = ∇θ Fλ(ψ*, θ) with ψ* such that Wλ(θ) = Fλ(ψ*, θ)
NB: one cannot expect more regularity in Wλ than in the ground cost c or the generator g

Slide 49

Slide 49 text

Back to the counter-example
(Figure: the losses OT_c and OT_c^λ on the counter-example)

Slide 50

Slide 50 text

In the semi-discrete case

Slide 52

Slide 52 text

In the unregularized semi-discrete case
Theorem
For Y finite, ν = Σ_{j=1}^J νj δ_{yj}, assume c is C¹. Let θ ∈ Θ be such that ∂θ g(θ, Z) exists almost surely and g satisfies (H) at θ. Let also ψ ∈ R^J be such that, almost surely, g(θ, Z) ∈ ∪_{j=1}^J Lj(ψ). Then
  ∇θ F(ψ, θ) = E[ (∂θ g(θ, Z))^T ∇ψ^c(g(θ, Z)) ].
• Fourth assumption: µθ( X \ ∪_{y∈Y} Lψ(y) ) = 0
• If µθ(Y) = 0, one can also deal with Lipschitz costs (c(x, y) = ||x − y||)
→ Does not require regularization

Slide 55

Slide 55 text

Limitation

Slide 59

Slide 59 text

In practice
• Discrete target measure ν = Σ_{j=1}^J νj δ_{yj}, J: size of the dataset
• WGAN problem
  min_θ OT_c^λ(gθ♯ζ, ν) = min_θ max_{ψ∈R^J} E_{Z∼ζ}[ψ^{c,λ}(gθ(Z))] + Σ_{j=1}^J νj ψj
  with ψ^{c,λ}(x) = −λ log Σ_{j=1}^J exp( (ψj − c(x, yj)) / λ ) νj.
Alternate optimization
− The problem is concave in ψ: averaged stochastic gradient ascent to evaluate {ψj}_{j=1}^J
− ADAM step on θ
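The concave inner maximization over ψ can be sketched as averaged stochastic gradient ascent: the gradient of the semi-dual in ψj is νj minus the expected entropic assignment probability π(j | x). A minimal sketch, assuming scipy; `asga_dual` and the toy sampler are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.special import softmax

def asga_dual(sample_mu, Y, nu, lam, iters=2000, lr=1.0, p=2):
    """Averaged stochastic gradient ascent on the semi-discrete entropic
    semi-dual: draw x ~ mu and ascend grad_j = nu_j - pi(j | x),
    where pi(j | x) is the softmin (entropic) assignment of x."""
    psi = np.zeros(len(nu))
    psi_avg = np.zeros(len(nu))
    for t in range(1, iters + 1):
        x = sample_mu()                                   # one sample of mu
        C = np.sum(np.abs(x[None, :] - Y) ** p, axis=1)   # costs c(x, y_j)
        pi = softmax((psi - C) / lam + np.log(nu))        # entropic assignment
        psi = psi + lr / np.sqrt(t) * (nu - pi)           # stochastic ascent step
        psi_avg += (psi - psi_avg) / t                    # running (Polyak) average
    return psi_avg

rng = np.random.default_rng(0)
Y = np.array([[0.0], [1.0]])
nu = np.array([0.5, 0.5])
# mu symmetric around 0.5: the optimal psi_1, psi_2 should be (nearly) equal
print(asga_dual(lambda: rng.normal(0.5, 0.1, size=1), Y, nu, lam=0.5))
```

In the full algorithm this inner loop alternates with an ADAM step on the generator parameters θ.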

Slide 60

Slide 60 text

Generation of MNIST digits
(Figures: samples for λ = 0.001, λ = 0.01, λ = 0.1)

Slide 61

Slide 61 text

Outline

Slide 62

Slide 62 text

Patch-based Texture Synthesis
Patches seen as vectors of R^{s×s}

Slide 63

Slide 63 text

Patch-based texture synthesis
• Patch distribution of an image u
  µu = (1/n) Σ_{i=1}^n δ_{Pi u}
  where Pi is the linear operator extracting the i-th patch
• Given a target image v, search for an image u that solves
  min_u OT_c(µu, µv)
No generator here, we just optimize the pixel values of u: discrete OT
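The patch operators Pi simply read out s × s windows of the image as vectors. A minimal numpy sketch of building the support of µu (`patch_distribution` is an illustrative name):

```python
import numpy as np

def patch_distribution(u, s):
    """All s x s patches of a grayscale image u, flattened to vectors.
    The empirical measure mu_u puts mass 1/n on each row of the output."""
    H, W = u.shape
    idx = [(i, j) for i in range(H - s + 1) for j in range(W - s + 1)]
    return np.stack([u[i:i + s, j:j + s].ravel() for i, j in idx])

u = np.arange(16.0).reshape(4, 4)
P = patch_distribution(u, 2)
print(P.shape)   # (9, 4): n = 9 overlapping patches, each a vector of R^{2x2}
```

For large images, `numpy.lib.stride_tricks.sliding_window_view` does the same extraction without copies.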

Slide 65

Slide 65 text

Minimizing the OT cost w.r.t. the image u
• Alternate optimization scheme on
  min_u OT_c(µu, µv) = min_u max_{ψ∈R^m} F(ψ, u)
  where F(ψ, u) = (1/n) Σ_{i=1}^n ψ^c(Pi u) + (1/m) Σ_{j=1}^m ψj
• At fixed u, max_ψ F(ψ, u) is a concave maximization problem with bounded subgradients
  → allows for (stochastic) subgradient ascent
  → convergence guarantee on ψ in O(log t / √t)
Alternate Optimization
Initialize u0. For k = 0, . . . , K − 1:
  ψk ≈ argmax_ψ F(ψ, uk)   (subgradient ascent)
  u_{k+1} = uk − η ∇u F(ψk, uk)   (gradient descent)
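The inner maximization over ψ has an explicit subgradient: for each source patch the current biased matching σ(i) is computed, and the subgradient in ψj is 1/m minus the fraction of sources matched to j. A minimal sketch under these formulas (`dual_ascent` is an illustrative name):

```python
import numpy as np

def dual_ascent(Pu, Pv, iters=500, lr=0.1):
    """Subgradient ascent on psi for the discrete semi-dual
    F(psi) = mean_i min_j (||Pu_i - Pv_j||^2 - psi_j) + mean_j psi_j."""
    n, m = len(Pu), len(Pv)
    psi = np.zeros(m)
    C = ((Pu[:, None, :] - Pv[None, :, :]) ** 2).sum(axis=2)  # patch costs
    for t in range(1, iters + 1):
        sigma = np.argmin(C - psi[None, :], axis=1)           # biased matching
        counts = np.bincount(sigma, minlength=m) / n          # mass sent to each j
        psi += lr / np.sqrt(t) * (1.0 / m - counts)           # subgradient step
    return psi

# Plain nearest neighbor would send both sources to target 0;
# the dual weights rebalance the matching to the uniform target weights.
Pu = np.array([[0.0], [0.1]])
Pv = np.array([[0.0], [1.0]])
psi = dual_ascent(Pu, Pv)
print(psi, np.argmin(((Pu[:, None, :] - Pv[None, :, :]) ** 2).sum(axis=2) - psi[None, :], axis=1))
```

Once ψ has (approximately) converged, one gradient step on u is taken and the alternation continues.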

Slide 66

Slide 66 text

Relation with iterated nearest-neighbor projections
Proposition
Let ψ ∈ R^m. Assume that for all i = 1, . . . , n we can uniquely define σ(i) = argmin_{1≤j≤m} c(Pi u, Pj v) − ψj. Then F(ψ, ·) is differentiable at u, and
  ∇u F(ψ, u) = (1/n) Σ_{i=1}^n Pi^T ∂x c(Pi u, P_{σ(i)} v)
• If c(x, y) = ½ ||x − y||²₂ and η = α n / s², the image update is
  u_{k+1} = (1 − α) uk + α vk
  vk = (1/s²) Σ_{i=1}^n Pi^T P_{σk(i)} v
  σk(i) = argmin_j ½ ||Pi uk − Pj v||² − ψ^k_j
• [Kwatra et al. '05]: ψ = 0
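The update above can be sketched directly: match each patch of u to its (biased) nearest patch of v, re-assemble an image from the matched patches, and blend. A simplified sketch: `texture_step` is an illustrative name, and overlaps are averaged by their actual count rather than exactly 1/s² (which differs only at image borders).

```python
import numpy as np

def texture_step(u, v, s=4, alpha=0.5, psi=None):
    """One biased nearest-neighbor projection step (psi = 0 recovers the
    update of Kwatra et al.): u <- (1 - alpha) u + alpha v_nn, where v_nn
    re-assembles u from its best-matching patches of v."""
    def patches(w):
        H, W = w.shape
        return np.stack([w[i:i + s, j:j + s].ravel()
                         for i in range(H - s + 1) for j in range(W - s + 1)])
    Pu, Pv = patches(u), patches(v)
    psi = np.zeros(len(Pv)) if psi is None else psi
    C = ((Pu[:, None, :] - Pv[None, :, :]) ** 2).sum(axis=2) / 2
    sigma = np.argmin(C - psi[None, :], axis=1)        # biased NN matching
    # Re-aggregate matched patches, averaging overlapping contributions
    acc, cnt = np.zeros_like(u), np.zeros_like(u)
    k = 0
    H, W = u.shape
    for i in range(H - s + 1):
        for j in range(W - s + 1):
            acc[i:i + s, j:j + s] += Pv[sigma[k]].reshape(s, s)
            cnt[i:i + s, j:j + s] += 1
            k += 1
    return (1 - alpha) * u + alpha * acc / cnt
```

Iterating this step with ψ obtained by dual ascent is the single-scale version of the algorithm; with ψ = 0 it degenerates to plain iterated nearest-neighbor projection.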

Slide 67

Slide 67 text

Illustration

Slide 68

Slide 68 text

Multi-resolution Algorithm
For ℓ = 1, . . . , L, Sℓ u is a down-sampling of u on the grid 2^{ℓ−1} Z²
  min_u Σ_{ℓ=1}^L OT_c(µ_{Sℓ u}, µ_{Sℓ v}) = min_u Σ_{ℓ=1}^L max_{ψℓ} F(ψℓ, Sℓ u)
Algorithm 1: Multi-resolution Image Optimization
Initialize u0
For k = 0, . . . , K − 1:
  For ℓ = 1, . . . , L:
    • ψℓ^k ≈ argmax_ψ F(ψ, Sℓ uk)   (subgradient ascent)
  • One step of the ADAM algorithm on min_u Σ_{ℓ=1}^L F(ψℓ^k, Sℓ u)
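The down-sampling operators Sℓ can be sketched as stride subsampling on the dyadic grid (a simplification: in practice a low-pass filter would typically be applied before subsampling):

```python
import numpy as np

def S(u, l):
    """Down-sampling of u on the grid 2^{l-1} Z^2 (plain stride subsampling)."""
    step = 2 ** (l - 1)
    return u[::step, ::step]

u = np.arange(64.0).reshape(8, 8)
print([S(u, l).shape for l in (1, 2, 3)])  # [(8, 8), (4, 4), (2, 2)]
```

The multi-resolution loss then sums one patch-OT term per scale ℓ.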

Slide 69

Slide 69 text

Results of Image Optimization
(Figures: exemplar, initialization, synthesis, loss curve)

Slide 70

Slide 70 text

Synthesis with Image Optimization (128 × 128) → (256 × 256)

Slide 72

Slide 72 text

Synthesis with Image Optimization (256 × 256) → (256 × 512)

Slide 73

Slide 73 text

Inpainting

Slide 74

Slide 74 text

Visual comparisons
(Figures: Original, Ours, [Kwatra '05], [Gatys '15], on two examples)

Slide 75

Slide 75 text

Link with [Gatys et al. '15]
Texture synthesis from [Gatys et al. '15]:
  min_u ||G_l(u) − G_l(v)||²
where the G_l are Gram matrices of VGG features at scale l.
Idea: study the following cases
• Patch distribution and Gram loss → does not work
• Patch distribution and OT loss → our algorithm
• VGG feature distribution and Gram loss → [Gatys et al. '15]
• VGG feature distribution and OT loss → extension of our method

Slide 76

Slide 76 text

Texture barycenter

Slide 77

Slide 77 text

Style transfer

Slide 83

Slide 83 text

Generative Model
• Image optimisation: µu = (1/n) Σ_{i=1}^n δ_{Pi u} is discrete → discrete OT
  Optimization for each new image
• Generative model: µθ = (1/n) Σ_{i=1}^n (Pi ◦ gθ)♯ζ is continuous → semi-discrete OT
  Learn a generator gθ once and for all

Slide 84

Slide 84 text

Generative Networks
Texture Networks: generate images of arbitrary size [Ulyanov et al. '16]

Slide 85

Slide 85 text

Learn a generative network from an exemplar texture
Replace
• u by the output gθ(Z) of a convolutional neural network,
• µu by the patch distribution µθ of gθ(Z).
New loss function
  min_θ Σ_{ℓ=1}^L max_{ψℓ} E[F(ψℓ, Sℓ gθ(Z))].
Algorithm 2: Multi-resolution Generative Network Optimization
Initialize θ
For k = 0, . . . , K − 1:
  For ℓ = 1, . . . , L:
    • ψℓ^k ≈ argmax_ψ E[F(ψ, Sℓ gθ(Z))]   (ASGA)
  • Sample z ∼ ζ and take one step of the ADAM algorithm on min_θ Σ_{ℓ=1}^L F(ψℓ^k, Sℓ gθ(z))

Slide 86

Slide 86 text

Synthesis with learned Generative Networks
(Figures: Original, Algo 2, TexNet [Ulyanov et al. '16], SinGAN [Shaham et al. '19], PSGAN [Bergmann et al. '17], Texto [Rabin et al. '20])

Slide 90

Slide 90 text

Quantitative Results

          SIFID                         VGG Gram norm                  Multi-scale patch OT
                           Avg                             Avg                            Avg
Algo 1    0.43 0.02 0.08 0.71  0.31    122   6  141  865   283    0.45 0.15 0.09 0.69   0.35
Algo 2    1.13 0.06 0.18 1.82  0.80    233  19  151  922   331    0.48 0.16 0.10 0.78   0.38
TexNet    0.11 0.08 0.18 0.17  0.14    218   9   54  190   118    0.65 0.24 0.17 1.22   0.57
SinGAN    0.93 0.10 0.17 0.37  0.39    299   8  207  394   227    0.54 0.24 0.26 0.79   0.46
PSGAN     0.27 0.91 1.14 0.49  0.70    224 512  753 1366   714    0.68 0.43 0.34 1.19   0.66
Texto     1.22 0.07 0.18 1.67  0.79    260  24  152 1030   367    0.49 0.16 0.11 0.75   0.38

(per-exemplar values followed by their average)
Comparisons based on
• SIFID: Single Image Fréchet Inception Distance [Shaham et al. '19], compares responses to a trained Inception network
• VGG Gram loss [Gatys et al. '15], compares cross-correlations of responses to the VGG network
• Our multi-scale patch OT loss.

Slide 91

Slide 91 text

Conclusion
• Highlighted differentiability problems in WGAN training
• Ensured the existence of gradients in the semi-discrete case
• Obtained an alternate optimization framework usable for some image synthesis tasks, which cannot scale (yet) to very large target measures
PERSPECTIVES:
• Look for regularity results in the unregularized framework
• Impact of entropic regularization on image synthesis problems
• Exploit parameterizations of the dual variable ψ
THANK YOU FOR YOUR ATTENTION