Slide 1

Slide 1 text

From Monge to Gromov-Wasserstein: Optimal Transport and Barycenters Between Several Metric Spaces
Gabriel Peyré, Marco Cuturi, Justin Solomon
École Normale Supérieure — PSL Research University, Paris

Slide 2

Slide 2 text

Comparing Measures and Spaces
• Probability distributions and histograms → images, vision, graphics and machine learning, ...
Figure: color transfer [J. Rabin, Wasserstein Regularization] — sliced Wasserstein projection of a source image (X) onto the color statistics of a style image (Y), giving the source image after color transfer.

Slide 3

Slide 3 text

(Build on Slide 2; adds:)
• Optimal transport
Figure: optimal transport mean vs. L2 mean of two distributions.

Slide 4

Slide 4 text

(Build on Slide 3; adds:)
• Optimal transport for correspondence problems
Figure: a source shape matched to several target shapes (from the entropic GW correspondence paper, with Vladimir G. Kim, Adobe Research, and Suvrit Sra, MIT).

Slide 5

Slide 5 text

0. Entropy Regularized Optimal Transport
1. Entropy Regularized Gromov-Wasserstein
2. Gromov-Wasserstein Barycenters

Slide 6

Slide 6 text

Couplings and Optimal Transport (EMD)
Points (x_i)_i, (y_j)_j; input distributions p = Σ_i p_i δ_{x_i} and q = Σ_j q_j δ_{y_j}.
Def. Couplings: C_{p,q} = { T ∈ (R_+)^{N_1×N_2} ; T 1_{N_2} = p, Tᵀ 1_{N_1} = q }.

Slide 7

Slide 7 text

(Build on Slide 6; adds:)
Def. [Kantorovich 1942] Wasserstein distance / EMD:
W(p, q) = min_{T ∈ C_{p,q}} Σ_{i,j} c(x_i, y_j) T_{i,j}.

Slide 8

Slide 8 text

Entropy and Sinkhorn Regularization

Notation (from §1.3 of the companion ICML paper): the simplex of histograms with N bins is Σ_N := { p ∈ R_+^N ; Σ_i p_i = 1 }. The entropy of T ∈ R_+^{N×N} is H(T) := −Σ_{i,j=1}^N T_{i,j} (log(T_{i,j}) − 1). The set of couplings between histograms p ∈ Σ_{N_1} and q ∈ Σ_{N_2} is C_{p,q} := { T ∈ (R_+)^{N_1×N_2} ; T 1_{N_2} = p, Tᵀ 1_{N_1} = q }, where 1_N := (1, ..., 1)ᵀ ∈ R^N. For any tensor L = (L_{i,j,k,ℓ})_{i,j,k,ℓ} and matrix T = (T_{i,j})_{i,j}, the tensor-matrix multiplication is L ⊗ T := ( Σ_{k,ℓ} L_{i,j,k,ℓ} T_{k,ℓ} )_{i,j}.  (1)

Optimal transport distances compare two histograms (p, q) ∈ Σ_{N_1} × Σ_{N_2} defined on the same metric space.

Def. Regularized OT [Cuturi NIPS'13]: min_{T ∈ C_{p,q}} ⟨c, T⟩ − ε H(T).
Impact of the entropy on the solution: the figure shows the optimal coupling T_ε for several values of ε (larger ε gives a more diffuse T_ε).
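For concreteness, a minimal numpy sketch of the tensor-matrix multiplication (1) and of the quadratic-loss tensor used later (function names are illustrative, not the talk's released code; materializing L costs O(N_1² N_2²) memory, and the companion paper evaluates L ⊗ T without forming the dense tensor):

import numpy as np

def tensor_matrix_mult(L, T):
    # Definition (1): (L ⊗ T)_{i,j} = sum_{k,l} L[i,j,k,l] * T[k,l]
    return np.einsum('ijkl,kl->ij', L, T)

def quad_loss_tensor(C, Cbar):
    # Quadratic loss L(a, b) = (a - b)^2 applied entrywise:
    # L[i,j,k,l] = (C[i,k] - Cbar[j,l])^2, shape (N1, N2, N1, N2)
    return (C[:, None, :, None] - Cbar[None, :, None, :]) ** 2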

Slide 9

Slide 9 text

(Build on Slide 8; adds the Sinkhorn fixed-point algorithm, sketched in Python below:)
initialization: (a, b) ← (1_{N_1}, 1_{N_2})
repeat: a ← p ⊘ (K b), b ← q ⊘ (Kᵀ a), until convergence
return T = diag(a) K diag(b), where K := e^{−c/ε}
Fixed-point algorithm: only matrix/vector multiplications → parallelizable, streams well on GPU.
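A minimal numpy version of these iterations (a fixed iteration count stands in for the convergence test):

import numpy as np

def sinkhorn(p, q, c, eps, n_iter=1000):
    # Entropic regularized OT [Cuturi NIPS'13]; returns T = diag(a) K diag(b).
    K = np.exp(-c / eps)                       # Gibbs kernel
    a, b = np.ones_like(p), np.ones_like(q)    # (a, b) <- (1_{N1}, 1_{N2})
    for _ in range(n_iter):                    # only matrix/vector products
        a = p / (K @ b)
        b = q / (K.T @ a)
    return a[:, None] * K * b[None, :]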

Slide 10

Slide 10 text

Generalizations [Solomon et al., SIGGRAPH 2015]

Slide 11

Slide 11 text

(Build on Slide 10; adds:)
[Liero, Mielke, Savaré 2015], [Chizat, Schmitzer, Peyré, Vialard 2015] (unbalanced transport)

Slide 12

Slide 12 text

(Same content as Slide 11.)

Slide 13

Slide 13 text

0. Entropy Regularized Optimal Transport
1. Entropy Regularized Gromov-Wasserstein
2. Gromov-Wasserstein Barycenters

Slide 14

Slide 14 text

Gromov-Wasserstein
Unregistered spaces X and Y.
Inputs: pairs { (similarity/kernel matrix, histogram) }: (C, p) ∈ R^{N_1×N_1} × Σ_{N_1} and (C̄, q) ∈ R^{N_2×N_2} × Σ_{N_2}.

Slide 15

Slide 15 text

(Build on Slide 14; adds:)
Def. Gromov-Wasserstein distance [Mémoli 2011]:
GW(C, C̄, p, q) := min_{T ∈ C_{p,q}} E_{C,C̄}(T), where E_{C,C̄}(T) := Σ_{i,j,k,ℓ} L(C_{i,k}, C̄_{j,ℓ}) T_{i,j} T_{k,ℓ},
and L is a loss comparing similarity values, e.g. the quadratic loss L(a, b) = (a − b)² or the Kullback-Leibler divergence a log(a/b) − a + b.
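For concreteness, a direct numpy evaluation of the energy E_{C,C̄}(T) under the quadratic loss (naive O(N_1² N_2²) form; `gw_energy` is an illustrative name):

import numpy as np

def gw_energy(C, Cbar, T):
    # E(T) = sum_{i,j,k,l} (C[i,k] - Cbar[j,l])^2 * T[i,j] * T[k,l]
    L = (C[:, None, :, None] - Cbar[None, :, None, :]) ** 2
    return float(np.einsum('ijkl,ij,kl->', L, T, T))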

Slide 16

Slide 16 text

(Build on Slide 15; adds:)
→ NP-hard in general.
→ Need for a fast approximate solver.

Slide 17

Slide 17 text

Gromov-Wasserstein as a Metric
Points x_i ∈ X, y_j ∈ Y.
Def. For metric spaces, take C_{i,k} = d_X(x_i, x_k) and C̄_{j,ℓ} = d_Y(y_j, y_ℓ) with the quadratic loss: GW then measures the distortion of pairwise distances.

Slide 18

Slide 18 text

(Build on Slide 17; adds:)
Def. Isometries on M (the space of metric measure spaces): bijections f: X → Y preserving distances, d_Y(f(x), f(x')) = d_X(x, x'), and pushing the measure of X forward to that of Y (commuting diagram in the figure); this yields an equivalence (⟺) on M.

Slide 19

Slide 19 text

(Build on Slide 18; adds:)
Prop. [Mémoli 2011] GW defines a distance on M up to measure-preserving isometries: GW = 0 ⟺ the two spaces are isometric.
→ "bending-invariant" object recognition.

Slide 20

Slide 20 text

Entropic Gromov-Wasserstein
Def. Entropic Gromov-Wasserstein:
GW_ε(C, C̄, p, q) := min_{T ∈ C_{p,q}} E_{C,C̄}(T) − ε H(T).
Projected mirror descent:

Slide 21

Slide 21 text

(Build on Slide 20; adds:)
Def. Projected mirror descent (KL geometry):
T ← Proj^{KL}_{C_{p,q}}( T ⊙ e^{−τ ∇(E_{C,C̄} − εH)(T)} ),
i.e. a multiplicative gradient step of size τ followed by a KL projection onto the couplings.

Slide 22

Slide 22 text

(Build on Slide 21; adds:)
Prop. T converges to a stationary point for τ small enough.
Figure: objective value vs. iterations.

Slide 23

Slide 23 text

Sinkhorn and Entropic Gromov-Wasserstein
Projected mirror descent:

Slide 24

Slide 24 text

(Build on Slide 23; adds:)
Prop. For τ = 1/ε, the mirror-descent iteration reduces to a Sinkhorn projection (stated in full on the next slide).

Slide 25

Slide 25 text

(Build on Slide 24; the Prop. now reads:)
Prop. For τ = 1/ε, the iteration reads: T ← Sinkhorn projection of K = e^{−(L(C,C̄) ⊗ T)/ε} onto C_{p,q} — the Q-Softassign of [Gold, Rangarajan 1996].

func T = GW(C, C̄, p, q):
initialization: T ← p qᵀ
repeat: K ← e^{−(L(C,C̄) ⊗ T)/ε}; T ← Sinkhorn with kernel K and marginals (p, q), until convergence
return T
(Sketched in Python below.)
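A compact numpy sketch of this solver under the quadratic loss, reusing the Sinkhorn updates from Slide 9 (fixed iteration counts replace the convergence tests; the dense loss tensor is for clarity only, and the T = p qᵀ initialization follows the algorithm box above):

import numpy as np

def entropic_gw(C, Cbar, p, q, eps, n_outer=50, n_inner=200):
    # Projected mirror descent with tau = 1/eps: re-linearize the GW energy at
    # the current T, then Sinkhorn-project exp(-(L ⊗ T)/eps) onto C_{p,q}.
    T = np.outer(p, q)                                        # T <- p q^T
    L = (C[:, None, :, None] - Cbar[None, :, None, :]) ** 2   # quadratic loss
    for _ in range(n_outer):
        K = np.exp(-np.einsum('ijkl,kl->ij', L, T) / eps)     # e^{-(L ⊗ T)/eps}
        a, b = np.ones_like(p), np.ones_like(q)
        for _ in range(n_inner):                              # Sinkhorn projection
            a = p / (K @ b)
            b = q / (K.T @ a)
        T = a[:, None] * K * b[None, :]
    return T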

Slide 26

Slide 26 text

Applications of GW: Shape Analysis
Use T to define a registration between: shape ↔ shape, shape ↔ color distribution.
Figure (from the entropic GW correspondence paper, with Vladimir G. Kim, Adobe Research, and Suvrit Sra, MIT): "Entropic GW can find correspondences between a source surface (left) and a surface with similar structure, a surface with shared semantic structure, a noisy 3D point cloud, an icon, and a hand drawing. Each fuzzy map was computed using the same code." The optimizer is a probabilistic matching expressed as a "fuzzy" correspondence matrix in the style of [Kim et al. 2012; Solomon et al. 2012]; the sharpness of the correspondence is controlled via the weight of the entropic regularizer.

Slide 27

Slide 27 text

(Build on Slide 26; adds:)
Pipeline: shapes (X_s)_s → geodesic distances → GW distances → MDS visualization, in 2-D and 3-D.
Figure: "MDS embedding of four classes from SHREC dataset" (Teddies, Humans, Four-legged, Armadillo).
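A hypothetical sketch of that pipeline, assuming scikit-learn is available, a geodesic_distance_matrix helper (e.g. Dijkstra on the mesh graph; not shown), and the entropic_gw and gw_energy sketches from earlier slides:

import numpy as np
from sklearn.manifold import MDS

def gw_mds_embedding(shapes, eps=1e-2, dim=2):
    # Pairwise GW discrepancies between shapes, embedded by metric MDS.
    Cs = [geodesic_distance_matrix(X) for X in shapes]    # assumed helper
    ps = [np.full(len(C), 1.0 / len(C)) for C in Cs]      # uniform histograms
    n = len(Cs)
    D = np.zeros((n, n))
    for s in range(n):
        for t in range(s + 1, n):
            T = entropic_gw(Cs[s], Cs[t], ps[s], ps[t], eps)
            D[s, t] = D[t, s] = gw_energy(Cs[s], Cs[t], T)
    return MDS(n_components=dim, dissimilarity='precomputed').fit_transform(D)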

Slide 28

Slide 28 text

Applications of GW: Quantum Chemistry
Regression problem: predict a molecular property f; computing f by solving a DFT approximation is too costly.

Slide 29

Slide 29 text

(Build on Slide 28; adds:)
[Rupp et al 2012]: represent each molecule by its Coulomb matrix (the bracketed matrix in the figure).

Slide 30

Slide 30 text

(Build on Slide 29; adds:)
GW-interpolation of the regressed values across molecules (see the sketch below).
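The slide gives no formula, so the following is only one plausible reading of "GW-interpolation": Nadaraya-Watson interpolation of known values f(x_s), weighted by a Gaussian kernel on GW discrepancies between the molecules' matrices. The estimator, sigma, and all names are assumptions, not the talk's method; entropic_gw and gw_energy are the sketches from earlier slides.

import numpy as np

def gw_interpolate(C_new, p_new, train_Cs, train_ps, train_f, eps=1e-2, sigma=1.0):
    # Hypothetical GW-based kernel regression (Nadaraya-Watson form).
    d = np.array([gw_energy(C_new, C, entropic_gw(C_new, C, p_new, p, eps))
                  for C, p in zip(train_Cs, train_ps)])
    w = np.exp(-d / sigma**2)                  # kernel weights on GW values
    return float(w @ np.asarray(train_f) / w.sum())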

Slide 31

Slide 31 text

0. Entropy Regularized Optimal Transport
1. Entropy Regularized Gromov-Wasserstein
2. Gromov-Wasserstein Barycenters

Slide 32

Slide 32 text

Gromov-Wasserstein Geodesics Def. Gromov-Wasserstein Geodesic

Slide 33

Slide 33 text

(Build on Slide 32; adds:)
Prop. [Sturm 2012] A geodesic is obtained on the product space X × Y, equipped with an optimal coupling T and the interpolated distance d_t((x, y), (x', y')) = (1 − t) d_X(x, x') + t d_Y(y, y').

Slide 34

Slide 34 text

(Build on Slide 33; adds:)
→ X × Y is not practical for most applications (need to fix the size of the geodesic embedding space).
→ Extension to more than 2 input spaces?

Slide 35

Slide 35 text

Gromov-Wasserstein Barycenters
Input: measured similarity matrices (C_s, q_s), s = 1, 2, 3 in the figure.
Def. GW barycenters: min_C Σ_s λ_s GW(C, C_s, p, q_s), for fixed weights λ_s and target histogram p.

Slide 36

Slide 36 text

(Build on Slide 35; adds the alternating minimization, sketched in Python below:)
initialization: C ← C_0
repeat:
  for s = 1 to S: minimize on T_s (entropic GW between C and C_s)
  minimize on C with the T_s fixed
until convergence
return C
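A minimal numpy sketch of this scheme for the quadratic loss, reusing entropic_gw from Slide 25 and assuming the closed-form C-update C ← (Σ_s λ_s T_s C_s T_sᵀ) ⊘ (p pᵀ) (consistent with the formula residue on Slide 37; the λ_s are assumed to sum to 1, and the random C_0 is illustrative):

import numpy as np

def gw_barycenter(Cs, qs, lambdas, p, eps=1e-2, n_iter=20):
    # Alternate between S entropic-GW couplings and a closed-form update of C.
    rng = np.random.default_rng(0)
    C = rng.random((len(p), len(p)))
    C = 0.5 * (C + C.T)                        # symmetric initialization C_0
    for _ in range(n_iter):
        Ts = [entropic_gw(C, Cb, p, q, eps) for Cb, q in zip(Cs, qs)]
        C = sum(l * T @ Cb @ T.T
                for l, T, Cb in zip(lambdas, Ts, Cs)) / np.outer(p, p)
    return C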

Slide 37

Slide 37 text

GW Barycenters of Shapes
Shapes (X_s)_s with Euclidean distances; barycenters shown for interpolation weights t from 0 to 1.
From the companion paper's experiments: "Barycenter example for shape data from (Thakoor et al., 2007)"; and barycenters of 500 handwritten-digit point clouds (LeCun et al., 1998), each digit represented by a symmetric Euclidean distance matrix, with a 500 × 500 barycenter optimized by Algorithm 1 (uniform weights, ε = 10⁻³) and visualized by an MDS embedding of the barycenter distance matrix.

Slide 38

Slide 38 text

(Build on Slide 37; adds:)
Comparison of geodesic vs. Euclidean input distances, t from 0 to 1.

Slide 39

Slide 39 text

Conclusion
Optimal transport: registered spaces (points x_i, y_j).
Gromov-Wasserstein: un-registered spaces.

Slide 40

Slide 40 text

(Build on Slide 39; adds:)
Entropy makes the problem tractable.

Slide 41

Slide 41 text

(Build on Slide 40; adds:)
Entropy is surprisingly effective (GW is highly non-convex)!

Slide 42

Slide 42 text

(Build on Slide 41; adds:)
Applications:
– unregistered data
– quantum chemistry [Rupp et al 2012]
– shapes

Slide 43

Slide 43 text

(Build on Slide 42; adds:)
Future works: theoretical analysis of entropic GW; large-scale applications.