Sliced Wasserstein projection of $X$ to style image color statistics $Y$; source image after color transfer. [J. Rabin]

Wasserstein Regularization → images, vision, graphics and machine learning, ...
• Probability distributions and histograms
• Optimal transport mean vs. L2 mean
• Optimal transport for Correspondence Problems — Vladimir G. Kim (Adobe Research), Suvrit Sra (MIT)
Source / Targets
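The color-transfer slide relies on the fact that 1-D optimal transport is just sorting: project both color clouds onto random directions, match sorted projections, and average the displacements. A minimal NumPy sketch of one such sliced-Wasserstein projection step (the function name `sliced_ot_step` and the assumption that both clouds have the same number of points are ours, not from the slides):

```python
import numpy as np

def sliced_ot_step(X, Y, n_dirs=50, step=1.0, rng=None):
    """One stochastic step of the sliced-Wasserstein projection of the
    source colors X (n x d) toward the style colors Y (n x d).
    Along each random direction, the 1-D OT map is given by sorting."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    grad = np.zeros_like(X)
    for _ in range(n_dirs):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)          # random unit direction
        xp, yp = X @ theta, Y @ theta           # 1-D projections
        sx, sy = np.argsort(xp), np.argsort(yp)
        disp = np.empty(len(X))
        disp[sx] = yp[sy] - xp[sx]              # match sorted projections
        grad += np.outer(disp, theta)           # lift displacement back to R^d
    return X + step * grad / n_dirs
```

Iterating this step drives the color statistics of `X` toward those of `Y`; unequal cloud sizes would require interpolating the sorted quantiles instead of matching them one-to-one.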
$\Sigma_N \stackrel{\text{def.}}{=} \{ p \in \mathbb{R}_+^N \;;\; \sum_i p_i = 1 \}$. The entropy of $T \in \mathbb{R}_+^{N \times N}$ is defined as $H(T) \stackrel{\text{def.}}{=} -\sum_{i,j=1}^N T_{i,j}(\log(T_{i,j}) - 1)$. The set of couplings between histograms $p \in \Sigma_{N_1}$ and $q \in \Sigma_{N_2}$ is
$$\mathcal{C}_{p,q} \stackrel{\text{def.}}{=} \{ T \in (\mathbb{R}_+)^{N_1 \times N_2} \;;\; T \mathbf{1}_{N_2} = p,\; T^\top \mathbf{1}_{N_1} = q \}.$$
Here, $\mathbf{1}_N \stackrel{\text{def.}}{=} (1, \ldots, 1)^\top \in \mathbb{R}^N$. For any tensor $\mathcal{L} = (L_{i,j,k,\ell})_{i,j,k,\ell}$ and matrix $(T_{i,j})_{i,j}$, we define the tensor-matrix multiplication as
$$\mathcal{L} \otimes T \stackrel{\text{def.}}{=} \Big( \sum_{k,\ell} L_{i,j,k,\ell}\, T_{k,\ell} \Big)_{i,j}. \quad (1)$$

2. Gromov-Wasserstein Discrepancy
2.1. Entropic Optimal Transport
Optimal transport distances are useful to compare two histograms $(p, q) \in \Sigma_{N_1} \times \Sigma_{N_2}$ defined on the same metric space. (Code: https://github.com/gpeyre/)

In our setting, since we target learning problems, we deal with general similarity or distance matrices, i.e., the matrix $C$ does not necessarily satisfy the triangle inequality. We define the Gromov-Wasserstein discrepancy between two measured similarity matrices $(C, p) \in \mathbb{R}^{N_1 \times N_1} \times \Sigma_{N_1}$ and $(\bar{C}, q) \in \mathbb{R}^{N_2 \times N_2} \times \Sigma_{N_2}$ as
$$\mathrm{GW}(C, \bar{C}, p, q) \stackrel{\text{def.}}{=} \min_{T \in \mathcal{C}_{p,q}} \mathcal{E}_{C,\bar{C}}(T), \quad \text{where} \quad \mathcal{E}_{C,\bar{C}}(T) \stackrel{\text{def.}}{=} \sum_{i,j,k,\ell} L(C_{i,k}, \bar{C}_{j,\ell})\, T_{i,j}\, T_{k,\ell}.$$
The matrix $T$ is a coupling between the two spaces on which the similarity matrices are defined, and $L$ is some loss function measuring the discrepancy between entries of the similarity matrices, e.g. the quadratic loss $L(a,b) = \frac{1}{2}(a-b)^2$ or the Kullback-Leibler divergence $L(a,b) = a \log(a/b) - a + b$ (for $a, b > 0$). This definition (4) of GW extends the one proposed by (Mémoli, 2011).

Entropy and Sinkhorn: regularization and its impact on the solution.
Def. Regularized OT [Cuturi NIPS'13]:
$$T_\varepsilon \stackrel{\text{def.}}{=} \operatorname*{arg\,min}_{T \in \mathcal{C}_{p,q}} \langle C, T \rangle - \varepsilon H(T).$$

Fixed-point (Sinkhorn) algorithm, with $K \stackrel{\text{def.}}{=} e^{-C/\varepsilon}$:
initialization: $(a, b) \leftarrow (\mathbf{1}_{N_1}, \mathbf{1}_{N_2})$
repeat: $a \leftarrow p \oslash (K b)$, $b \leftarrow q \oslash (K^\top a)$, until convergence
return $T = \mathrm{diag}(a)\, K\, \mathrm{diag}(b)$.
Only matrix/vector multiplications → parallelizable → streams well on GPU.
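The Sinkhorn iteration above can be sketched in a few lines of NumPy. This is a minimal illustration of [Cuturi NIPS'13], not production code: for small $\varepsilon$ the kernel $K = e^{-C/\varepsilon}$ underflows, which log-domain implementations avoid.

```python
import numpy as np

def sinkhorn(p, q, C, eps=1e-1, n_iter=1000):
    """Entropy-regularized OT: fixed-point iteration on the scalings
    (a, b) of the Gibbs kernel K = exp(-C/eps)."""
    K = np.exp(-C / eps)
    a, b = np.ones_like(p), np.ones_like(q)   # initialization
    for _ in range(n_iter):
        a = p / (K @ b)        # enforce the row marginal    T 1 = p
        b = q / (K.T @ a)      # enforce the column marginal T^T 1 = q
    return a[:, None] * K * b[None, :]        # T = diag(a) K diag(b)
```

Each iteration is a pair of matrix-vector products, which is what makes the scheme parallelizable and GPU-friendly.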
Figure 1: Entropic GW can find correspondences between a source surface (left) and a surface with similar structure, a surface with shared semantic structure, a noisy 3D point cloud, an icon, and a hand drawing. Each fuzzy map was computed using the same code.

In this paper, we propose a new correspondence algorithm that minimizes distortion of long- and short-range distances alike. We study an entropically regularized version of the Gromov-Wasserstein (GW) mapping objective function from [Mémoli 2011] measuring the distortion of geodesic distances. The optimizer is a probabilistic matching expressed as a "fuzzy" correspondence matrix in the style of [Kim et al. 2012; Solomon et al. 2012]; we control the sharpness of the correspondence via the weight of the entropic regularizer.

Figure 15: MDS embedding of four classes (Teddies, Humans, Four-legged, Armadillo) from the SHREC dataset. Figure 16: Recovery of galloping horse sequence.

[...] (0 is the base shape) as a feature vector for shape i. We reproduce the result presented in the work of Rustamov et al., recovering the circular structure of meshes from a galloping horse animation sequence (Figure 16). Unlike Rustamov et al., however, our method does not require ground truth maps between shapes as input.

5.2 Supervised Matching
An important feature of a matching tool is the ability to incorporate user input, e.g. ground-truth matches of points or regions. In the GWα framework, one way to enforce these constraints is to provide a stencil S specifying a sparsity pattern for the map Γ. [...]

Figure 18: Mapping a set of 185 images onto two shapes while preserving color similarity. (Images from Flickr public domain collection.)

In addition to $D_0 \in \mathbb{R}^{n_0 \times n_0}_+$ and $D \in \mathbb{R}^{n \times n}_+$, we are given symmetric weight matrices $W_0 \in \mathbb{R}^{n_0 \times n_0}_+$ and $W \in \mathbb{R}^{n \times n}_+$.
We could solve a weighted version of the GWα matching problem (3) that prioritizes maps preserving distances corresponding to large W values:
$$\min_{\Gamma \in \mathcal{M}} \sum_{ijk\ell} (D_{0,ij} - D_{k\ell})^2\, \Gamma_{ik}\, \Gamma_{j\ell}\, W_{0,ij}\, W_{k\ell}\, \mu_{0,i}\, \mu_{0,j}\, \mu_k\, \mu_\ell. \quad (8)$$
For instance, $(W_0, W)$ might contain confidence values expressing the quality of the entries of $(D_0, D)$. Or, $W_0, W$ could take values in $\{\varepsilon, 1\}$, reducing the weight of distances that are unknown or do not need to be preserved by $\Gamma$. Following the same simplifications as §3.1, we can optimize this objective by minimizing $\langle \Gamma, \Lambda_W(\Gamma) \rangle$, where
$$\Lambda_W(\Gamma) := \tfrac{1}{2}\, [D_0^{\wedge 2} \otimes W_0]\, [\![\mu_0]\!]\, \Gamma\, [\![\mu]\!]\, W \;-\; [D_0 \otimes W_0]\, [\![\mu_0]\!]\, \Gamma\, [\![\mu]\!]\, [D \otimes W] \;+\; \tfrac{1}{2}\, W_0\, [\![\mu_0]\!]\, \Gamma\, [\![\mu]\!]\, [D^{\wedge 2} \otimes W].$$

Applications of GW: shape analysis. Use T to define registration between: color distributions; shapes.
Pipeline: shapes $(X_s)_s$ → geodesic distances → GW distances → MDS visualization (in 2-D and in 3-D).
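For small problems, the GW energy $\mathcal{E}_{C,\bar C}(T)$ and the tensor-matrix product of eq. (1) can be evaluated naively with `einsum`. A sketch assuming the quadratic loss (the function names are ours; real implementations use the factorized $O(N^3)$ form rather than this $O(N_1^2 N_2^2)$ contraction):

```python
import numpy as np

def tensor_matrix_product(L, T):
    """Eq. (1): (L ⊗ T)_{ij} = sum_{k,l} L_{i,j,k,l} T_{k,l}."""
    return np.einsum('ijkl,kl->ij', L, T)

def gw_energy(C, Cb, T, loss=lambda a, b: 0.5 * (a - b) ** 2):
    """Naive evaluation of E_{C,Cb}(T) = sum_{i,j,k,l} L(C_ik, Cb_jl) T_ij T_kl."""
    # broadcast to the 4-tensor L_{i,j,k,l} = loss(C_{i,k}, Cb_{j,l})
    L = loss(C[:, None, :, None], Cb[None, :, None, :])
    return float(np.einsum('ijkl,ij,kl->', L, T, T))
```

Note that $\mathcal{E}_{C,\bar C}(T) = \langle \mathcal{L}(C,\bar C) \otimes T,\, T \rangle$, so the two functions are consistent by construction.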
[Truncated column: fragments of the Sinkhorn updates ($b \leftarrow q \oslash K^\top a$, $T = \mathrm{diag}(a) K \mathrm{diag}(b)$) and of a proof sketch that $U_s T_s\, \mathrm{diag}(1/p)$ is conditionally positive definite whenever the input matrices are (testing against $x$ with $\langle x, \mathbf{1}_N \rangle = 0$ and $x_s \stackrel{\text{def.}}{=} T_s x / p$), so that the output of the barycenter algorithm is infinitely divisible.]

The input data clouds are shown on the left, and an MDS embedding of the barycenter distance matrix is shown on the right. Figure 2: Barycenter example for shape data from (Thakoor et al., 2007).

4. Experiments
4.1. Point Clouds
Embedded barycenters. Figure 1 provides an example illustrating the behavior of our GW barycenter approximation. In this experiment, we extract 500 point clouds of handwritten digits from the dataset (LeCun et al., 1998), rotated arbitrarily in the plane. We represent each digit as a symmetric Euclidean distance matrix and optimize for a 500 × 500 barycenter using Algorithm 1 (uniform weights, $\varepsilon = 1 \times 10^{-3}$); notice that most of the input point clouds consist of fewer than 500 points. We then visualize the [...]
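The MDS embeddings used for visualization throughout (Figure 15, Figure 16, and the barycenter distance matrix) can be computed with classical multidimensional scaling. A standard textbook construction, not the authors' code:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: embed points in R^dim from a pairwise distance
    matrix D (n x n) via the double-centered Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gram matrix of centered points
    w, V = np.linalg.eigh(B)                   # eigh: B is symmetric
    idx = np.argsort(w)[::-1][:dim]            # keep the top eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```

When D is exactly Euclidean, the embedding reproduces the pairwise distances up to a rigid motion; for GW distance matrices it gives the approximate layouts shown in the figures.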
Entropy makes the problem tractable. Entropy is surprisingly effective (GW is highly non-convex)!

[Figure residue: Figure 15, MDS embedding of four classes from the SHREC dataset; Figure 16, recovery of the galloping horse sequence; excerpt of the SHREC track report:] Figure 1: The database that has been used, divided [into classes].

3 Participants
Each participant was asked to submit up to 3 runs of his/her algorithm as dissimilarity matrices; each run could be for example the result of a different [method] or the use of a different similarity metric. We remind that the entry (i, j) represents the distance between models i and j. This track saw 5 groups of participants:
1. Ceyhun Burak Akgül, Francis Schmitt, Bülent Sankur and Y[...];
2. Mohamed Chaouch and Anne Verroust-Blondet with 2 matrices;
3. Thibault Napoléon, Tomasz Adamek, Francis Schmitt and N[...];
4. Petros Daras and Athanasios Mademlis sent 1 matrix;
5. Tony Tung and Francis Schmitt with 3 matrices.

Applications: unregistered data; quantum chemistry [Rupp et al 2012]; shapes.

Future works: theoretical analysis of entropic GW; large-scale applications.