
Learning of Wasserstein generative models and patch-based texture synthesis

npapadakis
November 14, 2021

Transcript

  1. Learning of Wasserstein generative models and patch-based texture synthesis

    Antoine Houdard, Arthur Leclaire, Nicolas Papadakis, Julien Rabin
    Online lecture series "Mathematics of Deep Learning", June 8th 2021
    N. Papadakis, Wasserstein Generative Models for Texture Synthesis
  2. Generative models

    Popular uses: generating digits, clothing, bedrooms, faces, ...
  3. Generative models

    • Data {y_1, ..., y_N} sampled from Y ∼ ν
    • Synthetic distribution µ_θ = g_θ♯ζ (pushforward of the latent distribution ζ by the generator g_θ)
    Goal: find the best θ such that µ_θ is close, in some sense, to ν
  4. Generative models

    Variational Auto-Encoder [Kingma et al. '13]
    • Decoder as generative model g_θ
    GAN [Goodfellow et al. '14]
    • Discriminator d_η between fake g_θ(Z) and true Y samples:
      min_θ max_η  E_ν[log(d_η(Y))] + E_ζ[log(1 − d_η(g_θ(Z)))]
    WGAN [Arjovsky et al. '17]
    • Compare the fake distribution µ_θ = g_θ♯ζ with the true sample distribution ν of Y:
      min_θ D(µ_θ, ν)
    • Duality of the Wasserstein distance D = W_1 yields
      min_θ max_{ψ ∈ Lip_1}  E_ν[ψ(Y)] − E_ζ[ψ(g_θ(Z))]
    • Parameterization of the dual variable ψ with d_η
    Questions: other Wasserstein costs? Training strategies?
  5. Patch-based texture synthesis

    • Copy-paste [Efros and Leung '99]
    • Iterative refinement with nearest neighbors [Kwatra et al. '05]
    • Impose the patch distribution at different scales [Gutierrez et al. '17, Leclaire and Rabin '19]
    ✗ Image composed of patches processed independently
    ✗ The algorithm must be rerun for each new synthesis
  6. Texture synthesis with neural networks

    • Gram matrices of VGG features at different scales [Gatys et al. '15]:
      min_u ||G_u − G_v||²
      → prescribes the features of an example image v
      ✓ Processes the whole image u, not its patches independently
    • Train a feedforward generative network g_θ [Ulyanov et al. '16]:
      min_θ E_ζ ||G_{g_θ(Z)} − G_v||²
      ✓ Real-time synthesis
    Questions: Wasserstein metric between feature distributions? Dealing with patches?
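The Gram loss above compares second-order statistics of feature maps. A minimal numpy sketch with a generic feature map (random features stand in for the VGG activations, which are an assumption here, as is the spatial normalisation of the Gram matrix):

```python
import numpy as np

def gram(F):
    """Gram matrix of a feature map F of shape (channels, positions):
    G = F F^T / positions, i.e. channel cross-correlations averaged over space."""
    return F @ F.T / F.shape[1]

def gram_loss(Fu, Fv):
    """|| G_u - G_v ||^2 (squared Frobenius norm), the texture loss of [Gatys et al. '15]."""
    return np.sum((gram(Fu) - gram(Fv)) ** 2)

rng = np.random.default_rng(0)
Fv = rng.standard_normal((3, 10))       # stand-in for the features of the exemplar v
loss_same = gram_loss(Fv, Fv)           # identical features give zero loss
loss_diff = gram_loss(rng.standard_normal((3, 10)), Fv)
```

Note that the Gram matrix discards all spatial arrangement of the features, which is what makes it a texture (rather than image) statistic.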
  7. Optimal Transport (OT)

    • OT defines a family of distances between probability densities
    • Transport a mass µ(x) onto ν(y) (figures: Euclidean vs. Wasserstein interpolation)
    • Define a cost c(x, y) for moving mass between locations x and y
    • OT: the map with minimal global cost that transfers µ onto ν
    • If c(x, y) = ||x − y||^p: L^p Wasserstein distance
    • Interpolation with the transport map T
  10. Formulations

    Continuous [Benamou and Brenier '00], Discrete [Cuturi '13], Semi-discrete [Mérigot '11, et al. '17]
    Illustrated applications (figures): interpolation µ_t between images, transfer of colors between images, Wasserstein GAN
  17. Formulations

    What's next
    • More on the (semi-discrete) formulation of the optimal transport cost OT(µ, ν)
    • Differentiability and regularization of the cost min_θ OT(µ_θ, ν)
    • Application to patch-based texture synthesis
  18. Optimal Transport cost

    • Continuous cost function c : R^d × R^d → R
    • µ, ν probability measures supported on compacts X, Y ⊂ R^d; let
      OT_c(µ, ν) = min_{π ∈ Π(µ,ν)} ∫ c(x, y) dπ(x, y)
      where Π(µ, ν) is the set of probability measures on X × Y with marginals µ and ν.
    Theorem [Villani '03, Santambrogio '15]
    Strong duality holds, i.e.
      OT_c(µ, ν) = max_{ϕ,ψ} ∫_X ϕ dµ + ∫_Y ψ dν
    where the max is taken over all functions ϕ ∈ L¹(µ), ψ ∈ L¹(ν) such that
      ϕ(x) + ψ(y) ≤ c(x, y)   dµ(x)-a.e., dν(y)-a.e.
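In the fully discrete case with two uniform measures on the same number of points, the optimal coupling is a permutation, so OT_c can be computed exactly with the Hungarian algorithm. A small sketch on toy data (not from the talk):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def discrete_ot(X, Y, c):
    """Exact OT cost between the uniform measures mu = (1/n) sum_i delta_{X[i]}
    and nu = (1/n) sum_j delta_{Y[j]}: with equal sizes and uniform weights the
    optimal coupling is a permutation, found by the Hungarian algorithm."""
    n = len(X)
    C = np.array([[c(x, y) for y in Y] for x in X])  # cost matrix c(x_i, y_j)
    rows, cols = linear_sum_assignment(C)            # optimal assignment
    return C[rows, cols].sum() / n, cols             # OT cost and matching

c = lambda x, y: np.sum((x - y) ** 2)                # squared Euclidean ground cost
X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[1.1, 0.0], [0.1, 0.0]])
cost, sigma = discrete_ot(X, Y, c)                   # matches x_0 -> y_1, x_1 -> y_0
```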
  19. c-transforms and semi-dual formulation

    c-transforms
      ϕ^c(y) = min_{x∈X} [c(x, y) − ϕ(x)]
      ψ^c(x) = min_{y∈Y} [c(x, y) − ψ(y)]
    Semi-dual
      OT_c(µ, ν) = max_{ϕ,ψ} ∫_X ϕ dµ + ∫_Y ψ dν
                 = max_{ϕ ∈ C(X)} ∫_X ϕ(x) dµ(x) + ∫_Y ϕ^c(y) dν(y)
                 = max_{ψ ∈ C(Y)} ∫_X ψ^c(x) dµ(x) + ∫_Y ψ(y) dν(y)
    • c-transforms inherit regularity from c
    • If c(x, y) = ||x − y||, then ψ^c = −ψ and ψ is 1-Lipschitz [Kantorovich and Rubinstein '58]:
      OT_c(µ, ν) = max_{ψ ∈ Lip_1} ∫_Y ψ(y) dν(y) − ∫_X ψ(x) dµ(x)
    • For a discrete ν = Σ_{j=1}^J ν_j δ_{y_j} and ψ_j = ψ(y_j):
      ψ^c(x) = min_{j ∈ {1,...,J}} c(x, y_j) − ψ_j
      OT_c(µ, ν) = max_{{ψ_j}_{j=1}^J} ∫_X ψ^c(x) dµ(x) + Σ_{j=1}^J ν_j ψ_j
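For a discrete target the c-transform is a finite minimum and is trivial to evaluate; a minimal sketch (toy points, squared Euclidean cost assumed for illustration):

```python
import numpy as np

def c_transform(x, Y, psi, c):
    """Discrete c-transform: psi^c(x) = min_j [ c(x, y_j) - psi_j ]."""
    return min(c(x, y) - p for y, p in zip(Y, psi))

c = lambda x, y: np.sum((x - y) ** 2)           # squared Euclidean cost
Y = np.array([[0.0], [2.0]])
x = np.array([0.5])

# with psi = 0 the c-transform is just the cost to the nearest y_j
val0 = c_transform(x, Y, np.zeros(2), c)        # min(0.25, 2.25) = 0.25
# raising psi_2 makes y_2 "cheaper" and lowers the transform
val1 = c_transform(x, Y, np.array([0.0, 2.1]), c)  # min(0.25, 0.15) = 0.15
```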
  23. Semi-discrete semi-dual formulation

    • Semi-discrete cost
      OT_c(µ, ν) = max_{{ψ_j}_{j=1}^J} ∫_X ψ^c(x) dµ(x) + Σ_{j=1}^J ν_j ψ_j
    • When the argmin over j is unique (no ties), it defines a map
      T_ψ(x) = argmin_{y_j} c(x, y_j) − ψ_j
      → "biased" nearest-neighbor matching
    • The preimages of T_ψ are called Laguerre cells:
      L_j(ψ) = {x | ∀k ≠ j, c(x, y_j) − ψ_j < c(x, y_k) − ψ_k}
    (figure: T_0 vs. T_ψ)
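The biased nearest-neighbor map T_ψ can be sketched in a few lines; the example shows how the weights ψ shift the Laguerre-cell boundary away from plain nearest-neighbor matching (toy points of my own choosing):

```python
import numpy as np

def T_psi(x, Y, psi, c):
    """Biased nearest-neighbour map: T_psi(x) = argmin_j [ c(x, y_j) - psi_j ].
    The preimage of index j is the Laguerre cell L_j(psi)."""
    costs = np.array([c(x, y) for y in Y]) - psi
    return int(np.argmin(costs))

c = lambda x, y: np.sum((x - y) ** 2)
Y = np.array([[0.0], [2.0]])
x = np.array([0.6])

j0 = T_psi(x, Y, np.zeros(2), c)           # psi = 0: plain nearest neighbour, y_0
j1 = T_psi(x, Y, np.array([0.0, 3.5]), c)  # a large psi_1 moves x into the cell of y_1
```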
  24. Some questions

    • Design algorithms to solve inf_θ OT_c(µ_θ, ν) = inf_θ max_ψ ∫ ψ^c dµ_θ + ∫ ψ dν
    • Is the loss function OT_c(µ_θ, ν) regular?
    • If not, what kind of problems arise?
    • Do these problems appear in the discrete/semi-discrete cases?
    • Does this scale up to image synthesis problems?
  25. Related GAN works

    inf_θ OT_c(µ_θ, ν) = inf_θ max_ψ ∫ ψ^c dµ_θ + ∫ ψ dν
    • [Goodfellow et al. '14] GAN (Jensen-Shannon divergence)
    • [Arjovsky et al. '17] Wasserstein GAN (Wasserstein distance with L1 cost)
    • [Gulrajani et al. '17] WGAN-GP: Wasserstein GAN with gradient penalty
    • [Genevay et al. '18] Generative models with Sinkhorn divergences
    • [Salimans et al. '18] Improving GANs using optimal transport
    • [Liu et al. '18] WGAN-TS (for Two Steps)
    • [Chen et al. '19] Semi-discrete Wasserstein generative network training
    Differential properties of OT
    • [Burger et al. '12] Wasserstein distance and regularized density
    • [Cuturi and Peyré '15] Gradient of the regularized Wasserstein distance
    • [Cazelles et al. '19] Proof of differentiability in both previous settings
    • [Degournay et al. '19] Differentiation w.r.t. the discrete target measure
  26. What convex optimization tells us

    Proposition [Santambrogio's textbook '15]
    Assume c continuous, X, Y ⊂ R^d, and fix ν.
    • µ → OT_c(µ, ν) is convex
    • For every subgradient ϕ ∈ ∂_µ OT_c(µ, ν),
      OT_c(µ, ν) = ∫ ϕ dµ + ∫ ϕ^c dν
      hence OT_c(µ + χ, ν) ≥ OT_c(µ, ν) + ∫ ϕ dχ
    • If ϕ is unique up to additive constants, then one can show Gateaux-differentiability at (µ, ν)
    NB: extension to entropy-regularized optimal transport [Feydy et al. '18]
    Sufficient condition [Santambrogio '15]
    c is C¹ and Supp(µ) (or Supp(ν)) is the closure of a bounded connected open set
    → does not include c(x, y) = ||x − y||
  30. WGAN problem

    Given a generator µ_θ = g_θ♯ζ, solve
      inf_θ OT_c(µ_θ, ν) = inf_θ max_ψ ∫ ψ^c dµ_θ + ∫ ψ dν
    For F(ψ, θ) = ∫_X ψ^c(x) dµ_θ(x) + ∫_Y ψ(y) dν(y), we have
      W(θ) := OT_c(µ_θ, ν) = max_ψ F(ψ, θ)
    The potential ψ acts as a discriminator between µ_θ and ν.
    Theorem [Arjovsky et al. '17]
    Let θ_0 and ψ*_0 satisfy W(θ_0) = F(ψ*_0, θ_0). If W and θ → F(ψ*_0, θ) are both differentiable at θ_0, then
      ∇W(θ_0) = ∇_θ F(ψ*_0, θ_0)   (Grad-OT)
    There are cases where no such couple (ψ*_0, θ_0) exists.
  33. A telling counter-example

    Proposition
    Let µ_θ = δ_θ with θ ∈ R^d, and let ν = ½ δ_{y_1} + ½ δ_{y_2} with y_1, y_2 ∈ R^d distinct.
    Let c(x, y) = ||x − y||_p^p with p > 1. Then:
    • θ → W(θ) is differentiable everywhere.
    • For θ_0 = (y_1 + y_2)/2 and any ψ*_0 ∈ argmax_ψ F(ψ, θ_0), θ → F(ψ*_0, θ) is not differentiable at θ_0.
    Hence the (Grad-OT) relation fails at θ_0 = (y_1 + y_2)/2.
    Proof
    • W(θ) = ½ (c(θ, y_1) + c(θ, y_2)) = ½ (||θ − y_1||_p^p + ||θ − y_2||_p^p)
    • F(ψ, θ) = ψ^c(θ) + Σ_{j=1}^2 ½ ψ_j = min_{j=1,2} [c(θ, y_j) − ψ_j] + (ψ_1 + ψ_2)/2
      Fix θ_0 and ψ*_0; then (ψ*_0)_1 − (ψ*_0)_2 = c(θ_0, y_1) − c(θ_0, y_2), and
      F(ψ*_0, θ) = c(θ, y_1) + ½ (c(θ_0, y_2) − c(θ_0, y_1))   if θ ∈ L_1(ψ*_0)
                 = c(θ, y_2) + ½ (c(θ_0, y_1) − c(θ_0, y_2))   if θ ∈ L_2(ψ*_0)
      so F(ψ*_0, ·) is not differentiable at the boundary between L_1(ψ*_0) and L_2(ψ*_0).
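The kink can be checked numerically. An illustrative 1-D reproduction of the counter-example (my own choice of d = 1, y_1 = 0, y_2 = 1, c(x, y) = (x − y)²): W is flat at the midpoint, while the two one-sided slopes of F(ψ*_0, ·) disagree.

```python
import numpy as np

y1, y2 = 0.0, 1.0
c = lambda x, y: (x - y) ** 2

# W(theta) = (1/2)(c(theta, y1) + c(theta, y2)) is smooth everywhere
W = lambda t: 0.5 * (c(t, y1) + c(t, y2))

# at theta0 = (y1+y2)/2, an optimal psi* has psi1 - psi2 = c(theta0,y1) - c(theta0,y2) = 0,
# so up to a constant F(psi*, theta) = min_j c(theta, y_j)
F = lambda t: min(c(t, y1), c(t, y2))

theta0, h = 0.5, 1e-6
grad_W = (W(theta0 + h) - W(theta0 - h)) / (2 * h)  # ~ 0: W is flat at the midpoint
left  = (F(theta0) - F(theta0 - h)) / h             # ~ +1 (follows c(., y1))
right = (F(theta0 + h) - F(theta0)) / h             # ~ -1 (follows c(., y2))
```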
  36. Consequence: instabilities in training

    • Iterative algorithms for solving inf_θ W(θ) need an estimate of the gradient.
    • For the squared L2 cost c(x, y) = ½ ||x − y||², the gradient
      ∇W(θ) = ½ ((θ − y_1) + (θ − y_2))
      is estimated by
      ∇_θ F(ψ, θ) = θ − y_1 if θ ∈ L_1(ψ),  θ − y_2 if θ ∈ L_2(ψ)
      → the estimate jumps across Laguerre-cell boundaries.
    Solutions
    1. Regularization of the optimal transport
    2. Assumptions on the generator
  38. Regularized Optimal Transport

    Definition [Genevay et al. '19]
    For λ > 0, the regularized OT cost is defined by
      OT_c^λ(µ, ν) = inf_{π ∈ Π(µ,ν)} ∫ c(x, y) dπ(x, y) + λ KL(π | µ ⊗ ν)
    where KL is the Kullback-Leibler divergence:
      KL(π | µ ⊗ ν) = ∫ log( dπ(x, y) / (dµ(x) dν(y)) ) dπ(x, y)  if the density dπ/d(µ⊗ν) exists,
                      +∞ otherwise.
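Between two discrete measures, this regularized cost can be approximated with the classical Sinkhorn scaling iterations of [Cuturi '13]; a minimal numpy sketch on toy points (the iteration count and λ are arbitrary choices):

```python
import numpy as np

def sinkhorn(mu, nu, C, lam, iters=500):
    """Approximate entropic OT by Sinkhorn scaling: the optimal plan has the
    form pi = diag(a) K diag(b), K = exp(-C/lam), and a, b are alternately
    rescaled so that pi matches both marginals mu and nu."""
    K = np.exp(-C / lam)
    a = np.ones_like(mu)
    for _ in range(iters):
        b = nu / (K.T @ a)       # enforce the column marginals
        a = mu / (K @ b)         # enforce the row marginals
    pi = a[:, None] * K * b[None, :]
    return pi, np.sum(pi * C)    # plan and the transport part of its cost

x = np.array([0.0, 1.0]); y = np.array([0.0, 1.0])
C = (x[:, None] - y[None, :]) ** 2
mu = np.array([0.5, 0.5]); nu = np.array([0.5, 0.5])
pi, cost = sinkhorn(mu, nu, C, lam=0.05)   # plan close to the identity coupling
```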
  39. Semi-dual Regularized Problem

    Proposition
    Assume c ∈ L^∞(X × Y); then
      OT_c^λ(µ, ν) = max_{ψ ∈ L^∞(Y)} ∫_X ψ^{c,λ}(x) dµ(x) + ∫_Y ψ(y) dν(y)
    where the smoothed c-transform
      ψ^{c,λ}(x) = −λ log ∫_Y exp( (ψ(y) − c(x, y)) / λ ) dν(y)
    is a softmin over y replacing the hard min of the unregularized c-transform.
    Theorem [Genevay '19, Chizat et al. '19]
    For c ∈ L^∞(X × Y), the semi-dual problem admits a solution ψ* ∈ L^∞(ν), unique ν-a.e. up to an additive constant.
    NB: solutions are characterized by the fixed-point equation (ψ^{c,λ})^{c,λ} = ψ.
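For a discrete ν the smoothed c-transform is a stabilised log-sum-exp, and it recovers the hard min as λ → 0; a short sketch on the same kind of toy points used above:

```python
import numpy as np

def soft_c_transform(x, Y, psi, nu, c, lam):
    """psi^{c,lam}(x) = -lam * log sum_j nu_j exp((psi_j - c(x, y_j)) / lam),
    computed with the usual log-sum-exp shift for numerical stability."""
    z = (psi - np.array([c(x, y) for y in Y])) / lam
    m = z.max()
    return -lam * (m + np.log(np.sum(nu * np.exp(z - m))))

c = lambda x, y: np.sum((x - y) ** 2)
Y = np.array([[0.0], [2.0]]); nu = np.array([0.5, 0.5])
x = np.array([0.5]); psi = np.zeros(2)

hard = min(c(x, y) - p for y, p in zip(Y, psi))       # unregularized c-transform
soft = soft_c_transform(x, Y, psi, nu, c, lam=1e-3)   # softmin, close to hard for small lam
```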
  40. Regularity of the full problem

    • µ_θ: distribution of g(θ, Z)
    • Z: random variable in Z ⊂ R^p with distribution ζ
      min_θ OT_c^λ(µ_θ, ν) = min_θ max_{ψ ∈ L^∞(Y)} F_λ(ψ, θ),
      where F_λ(ψ, θ) := E[ψ^{c,λ}(g(θ, Z))] + ∫_Y ψ dν
    Hypothesis (H)
    There exists L : Θ × Z → R_+ such that, for any θ ∈ Θ, there is a neighborhood V_θ of θ with
      ∀θ' ∈ V_θ, Z-a.s., ||g(θ, Z) − g(θ', Z)|| ≤ L(θ, Z) ||θ − θ'||, and E[L(θ, Z)] < ∞.
    Proposition
    Let λ > 0. Assume c is C¹ and g satisfies (H). For any θ_0 ∈ Θ and any ψ ∈ L^∞(Y), θ → F_λ(ψ, θ) is differentiable at θ_0, with
      ∇_θ F_λ(ψ, θ_0) = E[ (∂_θ g(θ_0, Z))^T ∇ψ^{c,λ}(g(θ_0, Z)) ]
    If g is C¹, then so is F_λ(ψ, ·).
  43. Gradient of the regularized Wasserstein cost

    Theorem
    Let λ > 0. Assume c is C¹, and g is C¹ and satisfies (H). Then W_λ : θ → OT_c^λ(µ_θ, ν) is C¹ and, for any θ ∈ Θ,
      ∇_θ W_λ(θ) = ∇_θ F_λ(ψ*, θ) = E[ (∂_θ g(θ, Z))^T ∇(ψ*)^{c,λ}(g(θ, Z)) ]
    where ψ* satisfies W_λ(θ) = F_λ(ψ*, θ).
  44. What if the generator g is not C¹?

    Lemma
    Assume c is C¹. Then for any λ ≥ 0 and any θ, θ' ∈ Θ,
      |W_λ(θ) − W_λ(θ')| ≤ ||∇c||_∞ E[ ||g(θ, Z) − g(θ', Z)|| ]
    Theorem
    Let λ > 0. Assume c is C¹ and g satisfies (H). Then W_λ is locally Lipschitz, hence differentiable a.e. For almost every θ,
      ∇_θ W_λ(θ) = ∇_θ F_λ(ψ*, θ), with ψ* such that W_λ(θ) = F_λ(ψ*, θ)
    NB: one cannot expect more regularity in W_λ than there is in the ground cost c or in the generator g.
  45. Back to the counter-example

    (figures: the loss landscapes of OT_c vs. the regularized OT_c^λ on the counter-example)
  46. In the unregularized semi-discrete case

    Theorem
    For Y finite, ν = Σ_{j=1}^J ν_j δ_{y_j}, assume c is C¹. Let θ ∈ Θ be such that ∂_θ g(θ, Z) exists almost surely and g satisfies (H) at θ. Let also ψ ∈ R^J be such that, almost surely, g(θ, Z) ∈ ∪_{j=1}^J L_j(ψ). Then
      ∇_θ F(ψ, θ) = E[ (∂_θ g(θ, Z))^T ∇ψ^c(g(θ, Z)) ]
    • Fourth assumption: µ_θ( X \ ∪_{y∈Y} L_ψ(y) ) = 0
    • If µ_θ(Y) = 0, one can deal with Lipschitz costs (c(x, y) = ||x − y||)
    → does not require regularization
  49. In practice

    • Discrete target measure ν = Σ_{j=1}^J ν_j δ_{y_j}, with J the size of the dataset
    • WGAN problem
      min_θ OT_c^λ(g_θ♯ζ, ν) = min_θ max_{ψ ∈ R^J} E_{Z∼ζ}[ψ^{c,λ}(g_θ(Z))] + Σ_{j=1}^J ν_j ψ_j
      with ψ^{c,λ}(x) = −λ log Σ_{j=1}^J exp( (ψ_j − c(x, y_j)) / λ ) ν_j
    Alternate optimization
    • The problem is concave in ψ: averaged stochastic gradient ascent to estimate {ψ_j}_{j=1}^J
    • ADAM step on θ
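The alternating scheme can be sketched on a deliberately trivial instance (all of this toy setup is my own: the "generator" is a single point µ_θ = δ_θ, plain gradient ascent stands in for averaged SGA, and a fixed-step gradient update stands in for ADAM):

```python
import numpy as np

# target nu = (1/2) delta_{-1} + (1/2) delta_{+1}, cost c(x, y) = (1/2)(x - y)^2
Y = np.array([-1.0, 1.0]); nu = np.array([0.5, 0.5]); lam = 0.1

def weights(theta, psi):
    """Softmax weights w_j ∝ nu_j exp((psi_j - c(theta, y_j)) / lam); these are
    the gradient of -psi^{c,lam}(theta) w.r.t. psi."""
    z = (psi - 0.5 * (theta - Y) ** 2) / lam
    w = nu * np.exp(z - z.max())
    return w / w.sum()

theta, psi = 2.0, np.zeros(2)
for _ in range(200):
    # inner loop: gradient ascent on the concave semi-dual in psi
    # (small step, of the order of lam, for stability)
    for _ in range(100):
        psi += 0.1 * (nu - weights(theta, psi))
    # outer step on theta: grad_theta psi^{c,lam}(theta) = sum_j w_j (theta - y_j)
    w = weights(theta, psi)
    theta -= 0.1 * (theta - np.dot(w, Y))
```

With the symmetric target, the generated point is driven toward the midpoint 0 once the inner potentials balance the two targets.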
  50. Generation of MNIST digits

    (figures: generated samples for λ = 0.001, λ = 0.01, λ = 0.1)
  51. Patch-based Texture Synthesis

    Patches seen as vectors of R^{s×s} (figure)
  52. Patch-based texture synthesis

    • Patch distribution of an image u:
      µ_u = (1/n) Σ_{i=1}^n δ_{P_i u}
      where P_i is the linear operator extracting the i-th patch
    • Given a target image v, search for an image u that solves
      min_u OT_c(µ_u, µ_v)
    No generator here, we just optimize the pixel values of u: discrete OT.
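The support of the empirical patch measure µ_u can be built by sliding an s × s window over the image; a minimal sketch (periodic boundary handling is my own assumption, the talk does not specify it):

```python
import numpy as np

def patch_distribution(u, s):
    """All s x s patches of image u (periodic boundary), flattened to vectors
    of R^{s*s}: the support points of the empirical measure mu_u."""
    h, w = u.shape
    P = [np.roll(np.roll(u, -i, axis=0), -j, axis=1)[:s, :s].ravel()
         for i in range(h) for j in range(w)]
    return np.stack(P)          # shape (n, s*s) with n = h*w patches

u = np.arange(16.0).reshape(4, 4)
P = patch_distribution(u, s=2)  # 16 patches of dimension 4
```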
  54. Minimizing the OT cost w.r.t. the image u

    • Alternate optimization scheme on
      min_u OT_c(µ_u, µ_v) = min_u max_{ψ ∈ R^m} F(ψ, u)
      where F(ψ, u) = (1/n) Σ_{i=1}^n ψ^c(P_i u) + (1/m) Σ_{j=1}^m ψ_j
    • At fixed u, max_ψ F(ψ, u) is a concave maximization problem with bounded subgradients
      → allows (stochastic) subgradient ascent
      → convergence guarantee on ψ in O(log t / √t)
    Alternate Optimization
    Initialize u_0. For k = 0, ..., K − 1:
      ψ_k ≈ argmax_ψ F(ψ, u_k)   (subgradient ascent)
      u_{k+1} = u_k − η ∇_u F(ψ_k, u_k)   (gradient descent)
  55. Relation with iterated nearest-neighbor projections

    Proposition
    Let ψ ∈ R^m. Assume that for all i = 1, ..., n we can uniquely define σ(i) = argmin_{1≤j≤m} c(P_i u, P_j v) − ψ_j. Then F(ψ, ·) is differentiable at u, and
      ∇_u F(ψ, u) = (1/n) Σ_{i=1}^n P_i^T ∂_x c(P_i u, P_{σ(i)} v)
    • If c(x, y) = ½ ||x − y||²_2 and η = α n / s², the image update is
      u_{k+1} = (1 − α) u_k + α v_k,
      v_k = (1/s²) Σ_{i=1}^n P_i^T P_{σ_k(i)} v,
      σ_k(i) = argmin_j ½ ||P_i u_k − P_j v||² − ψ_j^k
    • [Kwatra et al. '05] corresponds to ψ = 0
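This update can be sketched on a 1-D signal as an illustration (my own simplification: periodic patches of length s, so each sample is covered by s patches and the normaliser is s instead of the s² of the 2-D case):

```python
import numpy as np

def update(u, v, psi, s, alpha):
    """One step u <- (1 - alpha) u + alpha v_k of the biased nearest-neighbour
    scheme, on a 1-D signal with periodic length-s patches."""
    n = len(u)
    Pu = np.stack([np.roll(u, -i)[:s] for i in range(n)])
    Pv = np.stack([np.roll(v, -i)[:s] for i in range(n)])
    # sigma(i) = argmin_j 1/2 ||P_i u - P_j v||^2 - psi_j
    cost = 0.5 * ((Pu[:, None, :] - Pv[None, :, :]) ** 2).sum(-1) - psi[None, :]
    sigma = cost.argmin(axis=1)
    # v_k = (1/s) sum_i P_i^T P_{sigma(i)} v : re-assemble and average the patches
    vk = np.zeros(n)
    for i in range(n):
        for k in range(s):
            vk[(i + k) % n] += Pv[sigma[i], k]
    return (1 - alpha) * u + alpha * vk / s

u = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
u1 = update(u, u, np.zeros(5), s=2, alpha=0.5)  # v = u, psi = 0: u is a fixed point
```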
  56. Multi-resolution Algorithm

    For ℓ = 1, ..., L, let S_ℓ u be a down-sampling of u on the grid 2^{ℓ−1} Z²:
      min_u Σ_{ℓ=1}^L OT_c(µ_{S_ℓ u}, µ_{S_ℓ v}) = min_u Σ_{ℓ=1}^L max_{ψ_ℓ} F(ψ_ℓ, S_ℓ u)
    Algorithm 1: Multi-resolution Image Optimization
    Initialize u_0. For k = 0, ..., K − 1:
      For ℓ = 1, ..., L:
        • ψ_ℓ^k ≈ argmax_ψ F(ψ, S_ℓ u_k)   (subgradient ascent)
      • One step of the ADAM algorithm on min_u Σ_{ℓ=1}^L F(ψ_ℓ^k, S_ℓ u)
  57. Results of Image Optimization

    (figures: exemplar, initialization, synthesis, loss curve)
  58. Synthesis with Image Optimization

    (figures: syntheses from (128 × 128) exemplars to (256 × 256) outputs, and from (256 × 256) to (256 × 512))
  61. Visual comparisons

    (figures, two examples: original / ours / [Kwatra '05] / [Gatys '15])
  62. Link with [Gatys et al. '15]

    Texture synthesis from [Gatys et al. '15]:
      min_u ||G_l(u) − G_l(v)||²
    where the G_l are Gram matrices of VGG features at scale l.
    Idea: study the following cases
    • Patch distributions and Gram loss → does not work
    • Patch distributions and OT loss → our algorithm
    • VGG feature distributions and Gram loss → [Gatys et al. '15]
    • VGG feature distributions and OT loss → extension of our method
  63. Generative Model

    • Image optimization: µ_u = (1/n) Σ_{i=1}^n δ_{P_i u} is discrete → discrete OT
      ✗ Optimization for each new image
    • Generative model: µ_θ = (1/n) Σ_{i=1}^n (P_i ∘ g_θ)♯ζ is continuous → semi-discrete OT
      ✓ Learn a generator g_θ once and for all
  65. Generative Networks

    Texture Networks generate images of arbitrary size [Ulyanov et al. '16]
  66. Learn a generative network from an exemplar texture

    Replace
    • u by the output g_θ(Z) of a convolutional neural network;
    • µ_u by the patch distribution µ_θ of g_θ(Z).
    New loss function:
      min_θ Σ_{ℓ=1}^L max_{ψ_ℓ} E[F(ψ_ℓ, S_ℓ g_θ(Z))]
    Algorithm 2: Multi-resolution Generative Network Optimization
    Initialize θ. For k = 0, ..., K − 1:
      For ℓ = 1, ..., L:
        • ψ_ℓ^k ≈ argmax_ψ E[F(ψ, S_ℓ g_θ(Z))]   (ASGA)
      • Sample z ∼ ζ and take one step of the ADAM algorithm on min_θ Σ_{ℓ=1}^L F(ψ_ℓ^k, S_ℓ g_θ(z))
  67. Synthesis with learned Generative Networks

    (figures: original, Algo 2, TexNet [Ulyanov et al. '16], SinGAN [Shaham et al. '19], PSGAN [Bergmann et al. '17], Texto [Rabin et al. '20])
  71. Quantitative Results

    Scores on four exemplar textures (t1-t4) and their average, for three metrics:

             SIFID                          VGG Gram norm                 Multi-scale patch OT
             t1    t2    t3    t4    Avg    t1   t2   t3   t4    Avg     t1    t2    t3    t4    Avg
    Algo 1   0.43  0.02  0.08  0.71  0.31   122  6    141  865   283     0.45  0.15  0.09  0.69  0.35
    Algo 2   1.13  0.06  0.18  1.82  0.80   233  19   151  922   331     0.48  0.16  0.10  0.78  0.38
    TexNet   0.11  0.08  0.18  0.17  0.14   218  9    54   190   118     0.65  0.24  0.17  1.22  0.57
    SinGAN   0.93  0.10  0.17  0.37  0.39   299  8    207  394   227     0.54  0.24  0.26  0.79  0.46
    PSGAN    0.27  0.91  1.14  0.49  0.70   224  512  753  1366  714     0.68  0.43  0.34  1.19  0.66
    Texto    1.22  0.07  0.18  1.67  0.79   260  24   152  1030  367     0.49  0.16  0.11  0.75  0.38

    Comparisons based on:
    • SIFID: Single Image Fréchet Inception Distance [Shaham et al. '19], comparing responses to a trained Inception network
    • VGG Gram loss [Gatys et al. '15], comparing cross-correlations of responses to the VGG network
    • Our multi-scale patch OT loss
  72. Conclusion

    • Highlighted differentiability issues in WGAN problems
    • Ensured the existence of gradients in the semi-discrete case
    • Leads to an alternate optimization framework usable for some image synthesis tasks
      ✗ does not scale (yet) to very large target measures
    PERSPECTIVES
    • Regularity results for the unregularized framework
    • Impact of entropic regularization on image synthesis problems
    • Exploit parameterizations of the dual variable ψ
    THANK YOU FOR YOUR ATTENTION