Numerical Optimal Transport and Applications

E34ded36efe4b7abb12510d4e525fee8?s=47 Gabriel Peyré
February 04, 2016

Numerical Optimal Transport and Applications

Talk at Orsay Numerical Analysis Seminar, Feb. 4th 2016.
Update for talk a MFO, Feb. 1st 2017.

E34ded36efe4b7abb12510d4e525fee8?s=128

Gabriel Peyré

February 04, 2016
Tweet

Transcript

  1. Computational Optimal Transport and Applications Gabriel Peyré www.numerical-tours.com Joint works

    with: Jean-David Benamou, Guillaume Carlier, Lenaïc Chizat, Marco Cuturi Luca Nenna, Justin Solomon, François-Xavier Vialard É C O L E N O R M A L E S U P É R I E U R E
  2. Comparing Measures and Spaces Source image (X) Style image (Y)

    Sliced Wasserstein projection of X to style image color statistics Y Source image after color transfer J. Rabin Wasserstein Regularization ! images, vision, graphics and machine learning, . . . • Probability distributions and histograms
  3. Comparing Measures and Spaces Source image (X) Style image (Y)

    Sliced Wasserstein projection of X to style image color statistics Y Source image after color transfer J. Rabin Wasserstein Regularization ! images, vision, graphics and machine learning, . . . • Probability distributions and histograms Optimal transport mean L2 mean • Optimal transport
  4. Comparing Measures and Spaces Source image (X) Style image (Y)

    Sliced Wasserstein projection of X to style image color statistics Y Source image after color transfer J. Rabin Wasserstein Regularization ! images, vision, graphics and machine learning, . . . • Probability distributions and histograms Optimal transport mean L2 mean • Optimal transport for Correspondence Problems auphine Vladimir G. Kim Adobe Research Suvrit Sra MIT Source Targets nce Problems G. Kim esearch Suvrit Sra MIT Targets
  5. Overview • Transportation-like Problems • Regularized Transport • Barycenters •

    Gromov-Wasserstein
  6. Radon Measures and Couplings On a metric space (X, d).

    Probability measures. Positive Radon measures M+(X) : X = R d µ ( x ) = f ( x )d x µ = P i µ i xi µ(X) = 1 R f = 1 P i µi = 1
  7. Radon Measures and Couplings On a metric space (X, d).

    Probability measures. Couplings: Positive Radon measures M+(X) : P1]⇡(S) def. = ⇡(S, X) P2]⇡(S) def. = ⇡(X, S) Marginals: X = R d µ ( x ) = f ( x )d x µ = P i µ i xi ⇡ ⇧(µ, ⌫) def. = {⇡ 2 M+(X ⇥ X) ; P1]⇡ = µ, P2]⇡ = ⌫} µ(X) = 1 R f = 1 P i µi = 1 ⇡
  8. Optimal Transport Optimal transport: [Kantorovitch 1942] W p p (

    µ, ⌫ ) def. = min ⇡ ⇢ h d p , ⇡ i = Z X⇥X d ( x, y )pd ⇡ ( x, y ) ; ⇡ 2 ⇧( µ, ⌫ )
  9. Optimal Transport Optimal transport: [Kantorovitch 1942] W p p (

    µ, ⌫ ) def. = min ⇡ ⇢ h d p , ⇡ i = Z X⇥X d ( x, y )pd ⇡ ( x, y ) ; ⇡ 2 ⇧( µ, ⌫ ) Linear programming: µ = P i µ i xi , ⌫ = P j ⌫ j yj
  10. Optimal Transport Optimal transport: [Kantorovitch 1942] W p p (

    µ, ⌫ ) def. = min ⇡ ⇢ h d p , ⇡ i = Z X⇥X d ( x, y )pd ⇡ ( x, y ) ; ⇡ 2 ⇧( µ, ⌫ ) ⇠ O(N3) Hungarian/Auction: µ = 1 N P N i =1 xi , ⌫ = 1 N P N j =1 yj Linear programming: µ = P i µ i xi , ⌫ = P j ⌫ j yj
  11. Optimal Transport Optimal transport: [Kantorovitch 1942] W p p (

    µ, ⌫ ) def. = min ⇡ ⇢ h d p , ⇡ i = Z X⇥X d ( x, y )pd ⇡ ( x, y ) ; ⇡ 2 ⇧( µ, ⌫ ) ⇠ O(N3) Hungarian/Auction: µ = 1 N P N i =1 xi , ⌫ = 1 N P N j =1 yj Linear programming: µ = P i µ i xi , ⌫ = P j ⌫ j yj µ ⌫
  12. Optimal Transport Optimal transport: [Kantorovitch 1942] Monge-Amp` ere/Benamou-Brenier, d =

    || · ||2 2. also reveal that the network simplex behaves in O ( n2 ) in our con- text, which is a major gain at the scale at which we typical work, i.e. thousands of particles. This finding is also useful for applica- tions that use EMD, where using the network simplex instead of the transport simplex can bring a significant performance increase. Our experiments also show that fixed-point precision further speeds up the computation. We observed that the value of the final trans- port cost is less accurate because of the limited precision, but that the particle pairing that produces the actual interpolation scheme remains unchanged. We used the fixed point method to generate the results presented in this paper. The results of the performance study are also of broader interest, as current EMD image retrieval or color transfer techniques rely on slower solvers [Rubner et al. 2000; Kanters et al. 2003; Morovic and Sun 2003]. 10−4 10−2 100 102 Time in seconds Network Simplex (fixed point) Network Simplex (double prec.) Transport Simplex y = α x2 y = β x3 Figure 7: Synthetic 2D examples on a Euclidean domain. Th left and right columns show the input distributions, while the cent columns show interpolations for ↵ = 1 / 4 , ↵ = 1 / 2 , and ↵ = 3 / positive. A negative side effect of this choice is that interpolatin W p p ( µ, ⌫ ) def. = min ⇡ ⇢ h d p , ⇡ i = Z X⇥X d ( x, y )pd ⇡ ( x, y ) ; ⇡ 2 ⇧( µ, ⌫ ) ⇠ O(N3) Hungarian/Auction: µ = 1 N P N i =1 xi , ⌫ = 1 N P N j =1 yj Linear programming: µ = P i µ i xi , ⌫ = P j ⌫ j yj µ ⌫
  13. Optimal Transport Optimal transport: [Kantorovitch 1942] Monge-Amp` ere/Benamou-Brenier, d =

    || · ||2 2. Semi-discrete: Laguerre cells, d = || · ||2 2 . also reveal that the network simplex behaves in O ( n2 ) in our con- text, which is a major gain at the scale at which we typical work, i.e. thousands of particles. This finding is also useful for applica- tions that use EMD, where using the network simplex instead of the transport simplex can bring a significant performance increase. Our experiments also show that fixed-point precision further speeds up the computation. We observed that the value of the final trans- port cost is less accurate because of the limited precision, but that the particle pairing that produces the actual interpolation scheme remains unchanged. We used the fixed point method to generate the results presented in this paper. The results of the performance study are also of broader interest, as current EMD image retrieval or color transfer techniques rely on slower solvers [Rubner et al. 2000; Kanters et al. 2003; Morovic and Sun 2003]. 10−4 10−2 100 102 Time in seconds Network Simplex (fixed point) Network Simplex (double prec.) Transport Simplex y = α x2 y = β x3 Figure 7: Synthetic 2D examples on a Euclidean domain. Th left and right columns show the input distributions, while the cent columns show interpolations for ↵ = 1 / 4 , ↵ = 1 / 2 , and ↵ = 3 / positive. A negative side effect of this choice is that interpolatin W p p ( µ, ⌫ ) def. = min ⇡ ⇢ h d p , ⇡ i = Z X⇥X d ( x, y )pd ⇡ ( x, y ) ; ⇡ 2 ⇧( µ, ⌫ ) ⇠ O(N3) Hungarian/Auction: µ = 1 N P N i =1 xi , ⌫ = 1 N P N j =1 yj Linear programming: µ = P i µ i xi , ⌫ = P j ⌫ j yj µ ⌫ A NUMERICAL ALGORITHM FOR L2 SEMI-DISCRETE OPTIMAL TRAN Figure 9. More examples of semi-discrete optimal tra Note how the solids deform and merge to form the sphere first row, and how the branches of the star split and merge second row. A NUMERICAL ALGORITHM FOR L2 SEMI-DISCRETE OPTIMAL TRANSPORT IN 3D 21 Figure 9. More examples of semi-discrete optimal transport. Note how the solids deform and merge to form the sphere on the first row, and how the branches of the star split and merge on the second row. A NUMERICAL ALGORITHM FOR L2 SEMI-DISCRETE OPTIMAL TRANSPORT IN 3D 21 Figure 9. More examples of semi-discrete optimal transport. Note how the solids deform and merge to form the sphere on the first row, and how the branches of the star split and merge on the second row. [Levy,’15] [M´ erigot 11]
  14. Optimal Transport Optimal transport: [Kantorovitch 1942] Need for fast approximate

    algorithms for generic c . Monge-Amp` ere/Benamou-Brenier, d = || · ||2 2. Semi-discrete: Laguerre cells, d = || · ||2 2 . also reveal that the network simplex behaves in O ( n2 ) in our con- text, which is a major gain at the scale at which we typical work, i.e. thousands of particles. This finding is also useful for applica- tions that use EMD, where using the network simplex instead of the transport simplex can bring a significant performance increase. Our experiments also show that fixed-point precision further speeds up the computation. We observed that the value of the final trans- port cost is less accurate because of the limited precision, but that the particle pairing that produces the actual interpolation scheme remains unchanged. We used the fixed point method to generate the results presented in this paper. The results of the performance study are also of broader interest, as current EMD image retrieval or color transfer techniques rely on slower solvers [Rubner et al. 2000; Kanters et al. 2003; Morovic and Sun 2003]. 10−4 10−2 100 102 Time in seconds Network Simplex (fixed point) Network Simplex (double prec.) Transport Simplex y = α x2 y = β x3 Figure 7: Synthetic 2D examples on a Euclidean domain. Th left and right columns show the input distributions, while the cent columns show interpolations for ↵ = 1 / 4 , ↵ = 1 / 2 , and ↵ = 3 / positive. A negative side effect of this choice is that interpolatin W p p ( µ, ⌫ ) def. = min ⇡ ⇢ h d p , ⇡ i = Z X⇥X d ( x, y )pd ⇡ ( x, y ) ; ⇡ 2 ⇧( µ, ⌫ ) ⇠ O(N3) Hungarian/Auction: µ = 1 N P N i =1 xi , ⌫ = 1 N P N j =1 yj Linear programming: µ = P i µ i xi , ⌫ = P j ⌫ j yj µ ⌫ A NUMERICAL ALGORITHM FOR L2 SEMI-DISCRETE OPTIMAL TRAN Figure 9. More examples of semi-discrete optimal tra Note how the solids deform and merge to form the sphere first row, and how the branches of the star split and merge second row. A NUMERICAL ALGORITHM FOR L2 SEMI-DISCRETE OPTIMAL TRANSPORT IN 3D 21 Figure 9. More examples of semi-discrete optimal transport. Note how the solids deform and merge to form the sphere on the first row, and how the branches of the star split and merge on the second row. A NUMERICAL ALGORITHM FOR L2 SEMI-DISCRETE OPTIMAL TRANSPORT IN 3D 21 Figure 9. More examples of semi-discrete optimal transport. Note how the solids deform and merge to form the sphere on the first row, and how the branches of the star split and merge on the second row. [Levy,’15] [M´ erigot 11]
  15. Unbalanced Transport ( ⇠, µ ) 2 M+( X )

    2 , KL( ⇠|µ ) def. = R X log ⇣ d⇠ dµ ⌘ d µ + R X (d µ d ⇠ ) WFc(µ, ⌫) def. = min ⇡ hc, ⇡i + KL(P1]⇡|µ) + KL(P2]⇡|⌫) [Liereo, Mielke, Savar´ e 2015]
  16. Unbalanced Transport [Liereo, Mielke, Savar´ e 2015] [Chizat, Schmitzer, Peyr´

    e, Vialard 2015] Proposition: then WF1/2 c is a distance on M+( X ). ( ⇠, µ ) 2 M+( X ) 2 , KL( ⇠|µ ) def. = R X log ⇣ d⇠ dµ ⌘ d µ + R X (d µ d ⇠ ) If c(x, y) = log(cos(min(d(x, y), ⇡ 2 )) WFc(µ, ⌫) def. = min ⇡ hc, ⇡i + KL(P1]⇡|µ) + KL(P2]⇡|⌫) [Liereo, Mielke, Savar´ e 2015]
  17. Unbalanced Transport ! “Dynamic” Benamou-Brenier formulation. [Liereo, Mielke, Savar´ e

    2015][Kondratyev, Monsaingeon, Vorotnikov, 2015] [Liereo, Mielke, Savar´ e 2015] [Chizat, Schmitzer, Peyr´ e, Vialard 2015] Proposition: then WF1/2 c is a distance on M+( X ). [Chizat, Schmitzer, P, Vialard 2015] ( ⇠, µ ) 2 M+( X ) 2 , KL( ⇠|µ ) def. = R X log ⇣ d⇠ dµ ⌘ d µ + R X (d µ d ⇠ ) If c(x, y) = log(cos(min(d(x, y), ⇡ 2 )) WFc(µ, ⌫) def. = min ⇡ hc, ⇡i + KL(P1]⇡|µ) + KL(P2]⇡|⌫) [Liereo, Mielke, Savar´ e 2015] Balanced OT Unbalanced OT
  18. Wasserstein Gradient Flows Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998]

    µt+1 = Prox W ⌧f ( µt) def. = argmin µ2M+(X) W2 2 ( µt, µ ) + ⌧f ( µ )
  19. Wasserstein Gradient Flows Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998]

    Formal limit ⌧ ! 0: @tµ = div (µr(f0(µ))) µt+1 = Prox W ⌧f ( µt) def. = argmin µ2M+(X) W2 2 ( µt, µ ) + ⌧f ( µ )
  20. Wasserstein Gradient Flows Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998]

    Formal limit ⌧ ! 0: @tµ = div (µr(f0(µ))) (heat di↵usion) f ( µ ) = R log( d µ d x )d µ @tµ = µ µt+1 = Prox W ⌧f ( µt) def. = argmin µ2M+(X) W2 2 ( µt, µ ) + ⌧f ( µ )
  21. Wasserstein Gradient Flows Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998]

    Formal limit ⌧ ! 0: @tµ = div (µr(f0(µ))) (advection) f(µ) = R wdµ @tµ = div(µrw) (heat di↵usion) f ( µ ) = R log( d µ d x )d µ @tµ = µ Evolution µt Evolution µt rw rw µt+1 = Prox W ⌧f ( µt) def. = argmin µ2M+(X) W2 2 ( µt, µ ) + ⌧f ( µ )
  22. Wasserstein Gradient Flows Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998]

    Formal limit ⌧ ! 0: @tµ = div (µr(f0(µ))) (advection) f(µ) = R wdµ @tµ = div(µrw) (heat di↵usion) f ( µ ) = R log( d µ d x )d µ @tµ = µ (non-linear di↵usion) f(µ) = 1 m 1 R (d µ d x )m 1dµ @tµ = µm Evolution µt Evolution µt rw rw µt+1 = Prox W ⌧f ( µt) def. = argmin µ2M+(X) W2 2 ( µt, µ ) + ⌧f ( µ )
  23. Overview • Transportation-like Problems • Regularized Transport • Barycenters •

    Gromov-Wasserstein
  24. Transportation-like Problems Unbalanced transport: Optimal transport: Gradient flows: min ⇡2M+(X⇥X)

    E(⇡) def. = hc, ⇡i + f1(P1]⇡) + f2(P2]⇡) f1 = ◆µ f2 = ◆⌫ f1 = KL(·|µ) f2 = KL(·|⌫) f1 = ◆µt f2 = f
  25. Transportation-like Problems Regularization and positivity barrier. Discretization grid (prescribed support).

    Unbalanced transport: Optimal transport: Gradient flows: min ⇡2M+(X⇥X) E(⇡) def. = hc, ⇡i + f1(P1]⇡) + f2(P2]⇡) f1 = ◆µ f2 = ◆⌫ f1 = KL(·|µ) f2 = KL(·|⌫) f1 = ◆µt f2 = f Regularization: min ⇡ E(⇡) + "KL(⇡|⇡0) "KL(⇡|⇡0)
  26. Transportation-like Problems Regularization and positivity barrier. Discretization grid (prescribed support).

    ⇡0 argmin E(⇡) ⇡1 = Prox1 " E ( ⇡0) ⇡2 = Prox1 " E ( ⇡1) Unbalanced transport: Optimal transport: Gradient flows: min ⇡2M+(X⇥X) E(⇡) def. = hc, ⇡i + f1(P1]⇡) + f2(P2]⇡) f1 = ◆µ f2 = ◆⌫ f1 = KL(·|µ) f2 = KL(·|⌫) f1 = ◆µt f2 = f Regularization: min ⇡ E(⇡) + "KL(⇡|⇡0) Prox KL E/"( ⇡0) def. = arg "KL(⇡|⇡0) Implicit KL stepping.
  27. Entropy Regularized Transport Schr¨ odinger’s problem: Landmark computational paper: [Cuturi

    2013]. ⇡" def. = argmin ⇡ {hdp, ⇡i + "KL(⇡|⇡0) ; ⇡ 2 ⇧(µ, ⌫)} ⇡" = argmin ⇡2⇧(µ,⌫) KL(⇡|K) K ( x, y ) def. = e dp( x,y ) " ⇡0( x, y )
  28. Entropy Regularized Transport Schr¨ odinger’s problem: Landmark computational paper: [Cuturi

    2013]. ⇡" def. = argmin ⇡ {hdp, ⇡i + "KL(⇡|⇡0) ; ⇡ 2 ⇧(µ, ⌫)} ⇡" = argmin ⇡2⇧(µ,⌫) KL(⇡|K) K ( x, y ) def. = e dp( x,y ) " ⇡0( x, y ) Proposition: [Carlier, Duval, Peyr´ e, Schmitzer 2015] ⇡" "!0 ! argmin ⇡2⇧(µ,⌫) hdp, ⇡i ⇡" "!+1 ! µ ( x ) ⌫ ( y ) " ⇡" "
  29. Dykstra-like Iterations Primal: min ⇡ hdp, ⇡i + f1(P1]⇡) +

    f2(P2]⇡) + "KL(⇡|⇡0)
  30. Dykstra-like Iterations Primal: min ⇡ hdp, ⇡i + f1(P1]⇡) +

    f2(P2]⇡) + "KL(⇡|⇡0) Dual: (a, b) def. = (e u " , e v " ) max u,v f⇤ 1 ( u ) f⇤ 2 ( v ) "he u " , Ke v " i ⇡ ( x, y ) = a ( x ) K ( x, y ) b ( y )
  31. Dykstra-like Iterations Primal: min ⇡ hdp, ⇡i + f1(P1]⇡) +

    f2(P2]⇡) + "KL(⇡|⇡0) Block coordinates relaxation: max u f⇤ 1 ( u ) "he u " , Ke v " i max v f⇤ 2 ( v ) "he v " , K⇤e u " i (Iv) (Iu) Dual: (a, b) def. = (e u " , e v " ) max u,v f⇤ 1 ( u ) f⇤ 2 ( v ) "he u " , Ke v " i ⇡ ( x, y ) = a ( x ) K ( x, y ) b ( y )
  32. Dykstra-like Iterations Primal: min ⇡ hdp, ⇡i + f1(P1]⇡) +

    f2(P2]⇡) + "KL(⇡|⇡0) Block coordinates relaxation: max u f⇤ 1 ( u ) "he u " , Ke v " i max v f⇤ 2 ( v ) "he v " , K⇤e u " i (Iv) (Iu) Proposition: Prox KL f1/"( µ ) def. = argmin ⌫f1( ⌫ ) + " KL( ⌫|µ ) the solutions of ( Iu) and ( Iv) read: a = Prox KL f1/"( Kb ) Kb b = Prox KL f2/"( K⇤a ) K⇤a Dual: (a, b) def. = (e u " , e v " ) max u,v f⇤ 1 ( u ) f⇤ 2 ( v ) "he u " , Ke v " i ⇡ ( x, y ) = a ( x ) K ( x, y ) b ( y )
  33. Dykstra-like Iterations Primal: ! Only matrix-vector multiplications. ! Highly parallelizable.

    ! On regular grids: only convolutions! Linear time iterations. min ⇡ hdp, ⇡i + f1(P1]⇡) + f2(P2]⇡) + "KL(⇡|⇡0) Block coordinates relaxation: max u f⇤ 1 ( u ) "he u " , Ke v " i max v f⇤ 2 ( v ) "he v " , K⇤e u " i (Iv) (Iu) Proposition: Prox KL f1/"( µ ) def. = argmin ⌫f1( ⌫ ) + " KL( ⌫|µ ) the solutions of ( Iu) and ( Iv) read: a = Prox KL f1/"( Kb ) Kb b = Prox KL f2/"( K⇤a ) K⇤a Dual: (a, b) def. = (e u " , e v " ) max u,v f⇤ 1 ( u ) f⇤ 2 ( v ) "he u " , Ke v " i ⇡ ( x, y ) = a ( x ) K ( x, y ) b ( y )
  34. Sinkhorn’s Algorithm Prox KL f1/"(˜ µ ) = µ Prox

    KL f2/"(˜ ⌫ ) = ⌫ f1 = ◆µ f2 = ◆⌫ Optimal transport problem:
  35. Sinkhorn’s Algorithm Prox KL f1/"(˜ µ ) = µ Prox

    KL f2/"(˜ ⌫ ) = ⌫ ` ⇡(`) " [Sinkhorn 1967] [Deming,Stephan 1940] Sinkhorn/IPFP algorithm: a(`+1) def. = µ Kb(`) and b(`+1) def. = ⌫ K⇤a(`+1) f1 = ◆µ f2 = ◆⌫ Optimal transport problem:
  36. Sinkhorn’s Algorithm Prox KL f1/"(˜ µ ) = µ Prox

    KL f2/"(˜ ⌫ ) = ⌫ ⇡" " ` ⇡(`) " [Sinkhorn 1967] [Deming,Stephan 1940] Sinkhorn/IPFP algorithm: a(`+1) def. = µ Kb(`) and b(`+1) def. = ⌫ K⇤a(`+1) f1 = ◆µ f2 = ◆⌫ Optimal transport problem: Proposition: [Franklin,Lorenz 1989] ⇡(`) def. = diag(a(`))K diag(b(`)) || log( ⇡(`) ) log( ⇡? ) ||1 = O (1 ) `, ⇠  1/" c Local rate: [Knight 2008] 1000 2000 3000 4000 5000 -2 -1.5 -1 -0.5 0 ` log( || log( ⇡(`) ) log( ⇡? ) ||1)
  37. Gradient Flows: Crowd Motion µt+1 def. = argminµ W↵ ↵

    (µt, µ) + ⌧f(µ) Congestion-inducing function: f(µ) = ◆[0,] (µ) + hw, µi [Maury, Roudne↵-Chupin, Santambrogio 2010]
  38. Gradient Flows: Crowd Motion µt+1 def. = argminµ W↵ ↵

    (µt, µ) + ⌧f(µ) Proposition: Prox1 " f ( µ ) = min( e "wµ,  ) Congestion-inducing function: f(µ) = ◆[0,] (µ) + hw, µi [Maury, Roudne↵-Chupin, Santambrogio 2010]
  39. Gradient Flows: Crowd Motion µt+1 def. = argminµ W↵ ↵

    (µt, µ) + ⌧f(µ) Proposition: Prox1 " f ( µ ) = min( e "wµ,  )  = ||µt=0 ||1  = 2||µt=0 ||1  = 4||µt=0 ||1 Congestion-inducing function: f(µ) = ◆[0,] (µ) + hw, µi [Maury, Roudne↵-Chupin, Santambrogio 2010] rw
  40. Crowd Motion on a Surface Potential cos( w ) 

    = ||µt=0 ||1  = 6||µt=0 ||1 X = triangulated mesh.
  41. Crowd Motion on a Surface Potential cos( w ) 

    = ||µt=0 ||1  = 6||µt=0 ||1 X = triangulated mesh.
  42. Gradient Flows: Crowd Motion with Obstacles Potential cos( w )

     = ||µt=0 ||1  = 2||µt=0 ||1  = 4||µt=0 ||1  = 6||µt=0 ||1 X = sub-domain of R2 .
  43. Gradient Flows: Crowd Motion with Obstacles Potential cos( w )

     = ||µt=0 ||1  = 2||µt=0 ||1  = 4||µt=0 ||1  = 6||µt=0 ||1 X = sub-domain of R2 .
  44. Multiple-Density Gradient Flows (µ1,t+1, µ2,t+1) def. = argmin (µ1,µ2) W↵

    ↵ (µ1,t, µ1) + W↵ ↵ (µ2,t, µ2) + ⌧f(µ1, µ2)
  45. Multiple-Density Gradient Flows Wasserstein attraction: rw (µ1,t+1, µ2,t+1) def. =

    argmin (µ1,µ2) W↵ ↵ (µ1,t, µ1) + W↵ ↵ (µ2,t, µ2) + ⌧f(µ1, µ2) f(µ1, µ2) = W↵ ↵ (µ1, µ2) + h1(µ1) + h2(µ2) hi(µ) = hw, µi
  46. Multiple-Density Gradient Flows Wasserstein attraction: rw (µ1,t+1, µ2,t+1) def. =

    argmin (µ1,µ2) W↵ ↵ (µ1,t, µ1) + W↵ ↵ (µ2,t, µ2) + ⌧f(µ1, µ2) f(µ1, µ2) = W↵ ↵ (µ1, µ2) + h1(µ1) + h2(µ2) hi(µ) = ◆[0,] (µ). hi(µ) = hw, µi
  47. Overview • Transportation-like Problems • Regularized Transport • Barycenters •

    Gromov-Wasserstein
  48. Wasserstein Barycenters Barycenters of measures ( µk)k: P k k

    = 1 µ? 2 argmin µ P k kW2 2 (µk, µ)
  49. Wasserstein Barycenters µ µ1 µ3 W2 (µ1 , µ )

    W 2 (µ 2 ,µ ) µ2 W2 (µ3 ,µ ) Barycenters of measures ( µk)k: P k k = 1 If µ k = xk then µ? = P k kxk Generalizes Euclidean barycenter: µ? 2 argmin µ P k kW2 2 (µk, µ)
  50. Wasserstein Barycenters µ µ1 µ3 W2 (µ1 , µ )

    W 2 (µ 2 ,µ ) µ2 W2 (µ3 ,µ ) Barycenters of measures ( µk)k: P k k = 1 If µ k = xk then µ? = P k kxk Generalizes Euclidean barycenter: µ? 2 argmin µ P k kW2 2 (µk, µ)
  51. Wasserstein Barycenters µ µ1 µ3 W2 (µ1 , µ )

    W 2 (µ 2 ,µ ) µ2 W2 (µ3 ,µ ) Barycenters of measures ( µk)k: P k k = 1 If µ k = xk then µ? = P k kxk Generalizes Euclidean barycenter: µ2 µ1 Mc Cann’s displacement interpolation. µ? 2 argmin µ P k kW2 2 (µk, µ)
  52. Wasserstein Barycenters µ µ1 µ3 W2 (µ1 , µ )

    W 2 (µ 2 ,µ ) µ2 W2 (µ3 ,µ ) Barycenters of measures ( µk)k: P k k = 1 If µ k = xk then µ? = P k kxk Generalizes Euclidean barycenter: µ2 µ1 µ3 µ2 µ1 Mc Cann’s displacement interpolation. µ? 2 argmin µ P k kW2 2 (µk, µ)
  53. Wasserstein Barycenters µ µ1 µ3 W2 (µ1 , µ )

    W 2 (µ 2 ,µ ) µ2 W2 (µ3 ,µ ) Barycenters of measures ( µk)k: P k k = 1 If µ k = xk then µ? = P k kxk Generalizes Euclidean barycenter: µ2 µ1 µ3 µ exists and is unique. Theorem: if µ1 does not vanish on small sets, [Agueh, Carlier, 2010] (for c(x, y) = || x y ||2 ) µ2 µ1 Mc Cann’s displacement interpolation. µ? 2 argmin µ P k kW2 2 (µk, µ)
  54. Regularized Barycenters min (⇡k)k,µ ( X k k (hc, ⇡k

    i + "KL(⇡k |⇡0,k)) ; 8k, ⇡k 2 ⇧(µk, µ) )
  55. Regularized Barycenters ! Need to fix a discretization grid for

    µ , i.e. choose ( ⇡0,k)k min (⇡k)k,µ ( X k k (hc, ⇡k i + "KL(⇡k |⇡0,k)) ; 8k, ⇡k 2 ⇧(µk, µ) )
  56. Regularized Barycenters ! Need to fix a discretization grid for

    µ , i.e. choose ( ⇡0,k)k ! Sinkhorn-like algorithm [Benamou, Carlier, Cuturi, Nenna, Peyr´ e, 2015]. min (⇡k)k,µ ( X k k (hc, ⇡k i + "KL(⇡k |⇡0,k)) ; 8k, ⇡k 2 ⇧(µk, µ) )
  57. Regularized Barycenters ! Need to fix a discretization grid for

    µ , i.e. choose ( ⇡0,k)k ! Sinkhorn-like algorithm [Benamou, Carlier, Cuturi, Nenna, Peyr´ e, 2015]. [Solomon et al, SIGGRAPH 2015] min (⇡k)k,µ ( X k k (hc, ⇡k i + "KL(⇡k |⇡0,k)) ; 8k, ⇡k 2 ⇧(µk, µ) )
  58. Barycenter on a Surface 1 µ1 µ2 0 1

  59. Barycenter on a Surface 1 µ1 µ2 µ3 µ4 µ5

    µ6 µ1 µ2 0 1
  60. Barycenter on a Surface 1 = (1, . . .

    , 1)/6 µ1 µ2 µ3 µ4 µ5 µ6 µ1 µ2 0 1
  61. Color Transfer µ ⌫ Input images: ( f, g )

    (chrominance components) Input measures: f g µ(A) = U(f 1(A)), ⌫(A) = U(g 1(A))
  62. Color Transfer µ ⌫ Input images: ( f, g )

    (chrominance components) Input measures: f g µ(A) = U(f 1(A)), ⌫(A) = U(g 1(A))
  63. Color Transfer µ ⌫ Input images: ( f, g )

    (chrominance components) Input measures: f T T f g ˜ T g µ(A) = U(f 1(A)), ⌫(A) = U(g 1(A))
  64. Overview • Transportation-like Problems • Regularized Transport • Barycenters •

    Gromov-Wasserstein
  65. X Y Gromov-Wasserstein Metric-measure spaces (X, Y ): (dX ,

    µ), (dY , ⌫)
  66. X Y Gromov-Wasserstein dX ( x,x 0 ) dY (y,

    y0) Def. Gromov-Wasserstein distance: [Memoli 2011] [Sturm 2012] GW2 2 (dX , µ, dY , ⌫) def. = min ⇡2⇧(µ,⌫) Z X2⇥Y 2 | dX( x, x 0) dY ( y, y 0)|2d ⇡ ( x, y )d ⇡ ( x 0 , y 0) Metric-measure spaces (X, Y ): (dX , µ), (dY , ⌫)
  67. X Y Gromov-Wasserstein f X Y Prop. GW is a

    distance on mm-spaces/isometries. ! “bending-invariant” objects recognition. dX ( x,x 0 ) dY (y, y0) Def. Gromov-Wasserstein distance: [Memoli 2011] [Sturm 2012] GW2 2 (dX , µ, dY , ⌫) def. = min ⇡2⇧(µ,⌫) Z X2⇥Y 2 | dX( x, x 0) dY ( y, y 0)|2d ⇡ ( x, y )d ⇡ ( x 0 , y 0) Metric-measure spaces (X, Y ): (dX , µ), (dY , ⌫)
  68. X Y Gromov-Wasserstein f X Y Prop. GW is a

    distance on mm-spaces/isometries. ! need for a fast approximate solver. ! “bending-invariant” objects recognition. ! QAP: NP-hard in general. dX ( x,x 0 ) dY (y, y0) Def. Gromov-Wasserstein distance: [Memoli 2011] [Sturm 2012] GW2 2 (dX , µ, dY , ⌫) def. = min ⇡2⇧(µ,⌫) Z X2⇥Y 2 | dX( x, x 0) dY ( y, y 0)|2d ⇡ ( x, y )d ⇡ ( x 0 , y 0) Metric-measure spaces (X, Y ): (dX , µ), (dY , ⌫)
  69. Entropic Gromov-Wasserstein min ⇡2⇧(µ,⌫) Z X2⇥Y 2 | dX( x,

    x 0) dY ( y, y 0)|2d ⇡ ( x, y )d ⇡ ( x 0 , y 0) + " KL( ⇡ | ⇡0) Entropic regularization: (?)
  70. Entropic Gromov-Wasserstein min ⇡2⇧(µ,⌫) Z X2⇥Y 2 | dX( x,

    x 0) dY ( y, y 0)|2d ⇡ ( x, y )d ⇡ ( x 0 , y 0) + " KL( ⇡ | ⇡0) DX def. = ( dX ( xi, x 0 i ))i,i0 DY def. = (dY (yj, y0 j ))j,j0 Entropic regularization: Discrete spaces: min ⇡2⇧(µ,⌫) KL(⇡|K⇡) (?) , (?) K⇡ def. = e2 " DX ⇥⇡⇥DY
  71. Entropic Gromov-Wasserstein min ⇡2⇧(µ,⌫) Z X2⇥Y 2 | dX( x,

    x 0) dY ( y, y 0)|2d ⇡ ( x, y )d ⇡ ( x 0 , y 0) + " KL( ⇡ | ⇡0) DX def. = ( dX ( xi, x 0 i ))i,i0 DY def. = (dY (yj, y0 j ))j,j0 Fixed point iteration: ⇡`+1 def. = min ⇡2⇧(µ,⌫) KL( ⇡|K⇡` ) = Sinkhorn( K⇡` ) Entropic regularization: Discrete spaces: min ⇡2⇧(µ,⌫) KL(⇡|K⇡) (?) , (?) Prop. ⇡` converges to a stationary point of ( ? ). K⇡ def. = e2 " DX ⇥⇡⇥DY
  72. Applications of GW: Shapes Analysis phine Vladimir G. Kim Adobe

    Research Suvrit Sra MIT Source Targets Figure 1: Entropic GW can find correspondences between a source surface (left) and a surface with similar structure, a surface with shared semantic structure, a noisy 3D point cloud, an icon, and a hand drawing. Each fuzzy map was computed using the same code. In this paper, we propose a new correspondence algorithm that minimizes distortion of long- and short-range distances alike. We study an entropically-regularized version of the Gromov-Wasserstein (GW) mapping objective function from [M´ emoli 2011] measuring the distortion of geodesic distances. The optimizer is a probabilistic matching expressed as a “fuzzy” correspondence matrix in the style of [Kim et al. 2012; Solomon et al. 2012]; we control sharpness of the correspondence via the weight of the entropic regularizer. 0 0.02 0.04 0 0.02 0 0.02 Teddies Humans Four-legged Armadillo Figure 15: MDS embedding of four classes from SHREC dataset. 0 0.5 1 0 0.5 1 1 5 10 15 20 25 30 35 40 45 PCA 1 PCA 2 Figure 16: Recovery of galloping horse sequence. 0 is the base shape) as a feature vector for shape i. We reproduce the result presented in the work of Rustamov et al., recovering the circular structure of meshes from a galloping horse animation sequence (Figure 16). Unlike Rustamov et al., however, our method does not require ground truth maps between shapes as input. 5.2 Supervised Matching An important feature of a matching tool is the ability to incorporate user input, e.g. ground truth matches of points or regions. In the GWα framework, one way to enforce these constraints is to provide a stencil S specifying a sparsity pattern for the map Γ. Incorporating Figure 18: Mapping a set of 185 images onto a two shapes while preserving color similarity. (Images from Flickr public domain collection.) Rn0×n0 + and D ∈ Rn×n + we are given symmetric weight matrices W0 ∈ Rn0×n0 + and W ∈ Rn×n + . We could solve a weighted version of the GWα matching problem (3) that prioritizes maps preserving distances corresponding to large W values: min Γ∈M ijkℓ (D0ij −Dkℓ )2ΓikΓjℓW0ijWjℓµ0iµ0jµkµℓ . (8) For instance, (W0, W) might contain confidence values expressing the quality of the entries of (D0, D). Or, W0, W could take values in {ε, 1} reducing the weight of distances that are unknown or do not need to be preserved by Γ. Following the same simplifications as §3.1, we can optimize this objective by minimizing ⟨Γ, ΛW (Γ)⟩, where ΛW (Γ) := 1 2 [D∧2 0 ⊗ W0][[µ0 ]]Γ[[µ]]W − [D0 ⊗ W0][[µ0 ]]Γ[[µ]][D ⊗ W] + 1 2 W0[[µ0 ]]Γ[[µ]][D∧2 ⊗ W] Use T to define registration between: Colors distribution Shape Shape Shape
  73. Applications of GW: Shapes Analysis phine Vladimir G. Kim Adobe

    Research Suvrit Sra MIT Source Targets Figure 1: Entropic GW can find correspondences between a source surface (left) and a surface with similar structure, a surface with shared semantic structure, a noisy 3D point cloud, an icon, and a hand drawing. Each fuzzy map was computed using the same code. In this paper, we propose a new correspondence algorithm that minimizes distortion of long- and short-range distances alike. We study an entropically-regularized version of the Gromov-Wasserstein (GW) mapping objective function from [M´ emoli 2011] measuring the distortion of geodesic distances. The optimizer is a probabilistic matching expressed as a “fuzzy” correspondence matrix in the style of [Kim et al. 2012; Solomon et al. 2012]; we control sharpness of the correspondence via the weight of the entropic regularizer. 0 0.02 0.04 0 0.02 0 0.02 Teddies Humans Four-legged Armadillo Figure 15: MDS embedding of four classes from SHREC dataset. 0 0.5 1 0 0.5 1 1 5 10 15 20 25 30 35 40 45 PCA 1 PCA 2 Figure 16: Recovery of galloping horse sequence. 0 is the base shape) as a feature vector for shape i. We reproduce the result presented in the work of Rustamov et al., recovering the circular structure of meshes from a galloping horse animation sequence (Figure 16). Unlike Rustamov et al., however, our method does not require ground truth maps between shapes as input. 5.2 Supervised Matching An important feature of a matching tool is the ability to incorporate user input, e.g. ground truth matches of points or regions. In the GWα framework, one way to enforce these constraints is to provide a stencil S specifying a sparsity pattern for the map Γ. Incorporating Figure 18: Mapping a set of 185 images onto a two shapes while preserving color similarity. (Images from Flickr public domain collection.) Rn0×n0 + and D ∈ Rn×n + we are given symmetric weight matrices W0 ∈ Rn0×n0 + and W ∈ Rn×n + . We could solve a weighted version of the GWα matching problem (3) that prioritizes maps preserving distances corresponding to large W values: min Γ∈M ijkℓ (D0ij −Dkℓ )2ΓikΓjℓW0ijWjℓµ0iµ0jµkµℓ . (8) For instance, (W0, W) might contain confidence values expressing the quality of the entries of (D0, D). Or, W0, W could take values in {ε, 1} reducing the weight of distances that are unknown or do not need to be preserved by Γ. Following the same simplifications as §3.1, we can optimize this objective by minimizing ⟨Γ, ΛW (Γ)⟩, where ΛW (Γ) := 1 2 [D∧2 0 ⊗ W0][[µ0 ]]Γ[[µ]]W − [D0 ⊗ W0][[µ0 ]]Γ[[µ]][D ⊗ W] + 1 2 W0[[µ0 ]]Γ[[µ]][D∧2 ⊗ W] 0 0.02 0.04 0 0.02 0 0.02 Te Hu Fo Ar Figure 1: The database that has been used, divide MDS in 3-D Use T to define registration between: Colors distribution Shape Shape Shape Geodesic distances GW distances MDS Vizualization Shapes (Xs)s 0 0.02 0.04 0 0.02 0 0.02 Teddies Humans Four-legged Armadillo Figure 15: MDS embedding of four classes from SHREC dataset. 0 0.5 1 1 5 10 15 20 25 30 35 40 45 PCA 2 Figure 18: Mapping a set of 185 images onto a two shapes while preserving color similarity. (Images from Flickr public domain collection.) Rn0×n0 + and D ∈ Rn×n + we are given symmetric weight matrices W0 ∈ Rn0×n0 + and W ∈ Rn×n + . We could solve a weighted version of the GWα matching problem (3) that prioritizes maps preserving distances corresponding to large W values: MDS in 2-D
  74. Applications of GW: Quantum Chemistry Regression problem: ! f by

    solving DFT approximation is too costly.
  75. Applications of GW: Quantum Chemistry Regression problem: ! f by

    solving DFT approximation is too costly. [Rupp et al 2012] 2 6 6 6 6 4 3 7 7 7 7 5
  76. Applications of GW: Quantum Chemistry Regression problem: ! f by

    solving DFT approximation is too costly. GW-interpolation: [Rupp et al 2012] 2 6 6 6 6 4 3 7 7 7 7 5
  77. xi yj Optimal transport Registered spaces Un-registered spaces Gromov-Wasserstein Applications:

    – quantum chemistry – shapes Entropic Metric Alignment for Corresponde Justin Solomon∗ MIT Gabriel Peyr´ e CNRS & Univ. Paris-Dauphine Vladimi Adobe R Abstract Many shape and image processing tools rely on computation of cor- respondences between geometric domains. Efficient methods that stably extract “soft” matches in the presence of diverse geometric structures have proven to be valuable for shape retrieval and transfer of labels or semantic information. With these applications in mind, we present an algorithm for probabilistic correspondence that opti- mizes an entropy-regularized Gromov-Wasserstein (GW) objective. Built upon recent developments in numerical optimal transportation, our algorithm is compact, provably convergent, and applicable to any geometric domain expressible as a metric measure matrix. We Source Figure 1: Entropic G surface (left) and a s shared semantic stru ic Metric Alignment for Correspondence Problems n∗ Gabriel Peyr´ e CNRS & Univ. Paris-Dauphine Vladimir G. Kim Adobe Research Suvrit Sra MIT ng tools rely on computation of cor- c domains. Efficient methods that the presence of diverse geometric able for shape retrieval and transfer n. With these applications in mind, babilistic correspondence that opti- omov-Wasserstein (GW) objective. n numerical optimal transportation, ably convergent, and applicable to le as a metric measure matrix. We Source Targets Figure 1: Entropic GW can find correspondences between surface (left) and a surface with similar structure, a surf shared semantic structure, a noisy 3D point cloud, an ico Conclusion
  78. xi yj Optimal transport Registered spaces Un-registered spaces Gromov-Wasserstein Applications:

    – quantum chemistry – shapes Entropic Metric Alignment for Corresponde Justin Solomon∗ MIT Gabriel Peyr´ e CNRS & Univ. Paris-Dauphine Vladimi Adobe R Abstract Many shape and image processing tools rely on computation of cor- respondences between geometric domains. Efficient methods that stably extract “soft” matches in the presence of diverse geometric structures have proven to be valuable for shape retrieval and transfer of labels or semantic information. With these applications in mind, we present an algorithm for probabilistic correspondence that opti- mizes an entropy-regularized Gromov-Wasserstein (GW) objective. Built upon recent developments in numerical optimal transportation, our algorithm is compact, provably convergent, and applicable to any geometric domain expressible as a metric measure matrix. We Source Figure 1: Entropic G surface (left) and a s shared semantic stru ic Metric Alignment for Correspondence Problems n∗ Gabriel Peyr´ e CNRS & Univ. Paris-Dauphine Vladimir G. Kim Adobe Research Suvrit Sra MIT ng tools rely on computation of cor- c domains. Efficient methods that the presence of diverse geometric able for shape retrieval and transfer n. With these applications in mind, babilistic correspondence that opti- omov-Wasserstein (GW) objective. n numerical optimal transportation, ably convergent, and applicable to le as a metric measure matrix. We Source Targets Figure 1: Entropic GW can find correspondences between surface (left) and a surface with similar structure, a surf shared semantic structure, a noisy 3D point cloud, an ico Conclusion
  79. xi yj Optimal transport Registered spaces Un-registered spaces Gromov-Wasserstein Entropy:

    makes the problem tractable e↵ective! Entropy: surprisingly (GW highly non-convex) Applications: – quantum chemistry – shapes Entropic Metric Alignment for Corresponde Justin Solomon∗ MIT Gabriel Peyr´ e CNRS & Univ. Paris-Dauphine Vladimi Adobe R Abstract Many shape and image processing tools rely on computation of cor- respondences between geometric domains. Efficient methods that stably extract “soft” matches in the presence of diverse geometric structures have proven to be valuable for shape retrieval and transfer of labels or semantic information. With these applications in mind, we present an algorithm for probabilistic correspondence that opti- mizes an entropy-regularized Gromov-Wasserstein (GW) objective. Built upon recent developments in numerical optimal transportation, our algorithm is compact, provably convergent, and applicable to any geometric domain expressible as a metric measure matrix. We Source Figure 1: Entropic G surface (left) and a s shared semantic stru ic Metric Alignment for Correspondence Problems n∗ Gabriel Peyr´ e CNRS & Univ. Paris-Dauphine Vladimir G. Kim Adobe Research Suvrit Sra MIT ng tools rely on computation of cor- c domains. Efficient methods that the presence of diverse geometric able for shape retrieval and transfer n. With these applications in mind, babilistic correspondence that opti- omov-Wasserstein (GW) objective. n numerical optimal transportation, ably convergent, and applicable to le as a metric measure matrix. We Source Targets Figure 1: Entropic GW can find correspondences between surface (left) and a surface with similar structure, a surf shared semantic structure, a noisy 3D point cloud, an ico Conclusion
  80. xi yj Optimal transport Registered spaces Un-registered spaces Gromov-Wasserstein Entropy:

    makes the problem tractable e↵ective! Entropy: surprisingly (GW highly non-convex) Applications: – quantum chemistry – shapes Entropic Metric Alignment for Corresponde Justin Solomon∗ MIT Gabriel Peyr´ e CNRS & Univ. Paris-Dauphine Vladimi Adobe R Abstract Many shape and image processing tools rely on computation of cor- respondences between geometric domains. Efficient methods that stably extract “soft” matches in the presence of diverse geometric structures have proven to be valuable for shape retrieval and transfer of labels or semantic information. With these applications in mind, we present an algorithm for probabilistic correspondence that opti- mizes an entropy-regularized Gromov-Wasserstein (GW) objective. Built upon recent developments in numerical optimal transportation, our algorithm is compact, provably convergent, and applicable to any geometric domain expressible as a metric measure matrix. We Source Figure 1: Entropic G surface (left) and a s shared semantic stru ic Metric Alignment for Correspondence Problems n∗ Gabriel Peyr´ e CNRS & Univ. Paris-Dauphine Vladimir G. Kim Adobe Research Suvrit Sra MIT ng tools rely on computation of cor- c domains. Efficient methods that the presence of diverse geometric able for shape retrieval and transfer n. With these applications in mind, babilistic correspondence that opti- omov-Wasserstein (GW) objective. n numerical optimal transportation, ably convergent, and applicable to le as a metric measure matrix. We Source Targets Figure 1: Entropic GW can find correspondences between surface (left) and a surface with similar structure, a surf shared semantic structure, a noisy 3D point cloud, an ico Future works: Conclusion