Slide 1

Slide 1 text

Numerical Optimal Transport
Gabriel Peyré, École Normale Supérieure
… with some demos: www.numerical-tours.com http://optimaltransport.github.io

Slide 2

Slide 2 text

Overview • Measures and Histograms • From Monge to Kantorovitch Formulations • Entropic Regularization and Sinkhorn • Barycenters • Unbalanced OT and Gradient Flows • Minimum Kantorovitch Estimators • Gromov-Wasserstein

Slide 3

Slide 3 text

Comparing Measures and Spaces
[Figure: source image (X), style image (Y), sliced Wasserstein projection of X onto the color statistics of Y, and the source image after color transfer; J. Rabin, Wasserstein regularization]
→ images, vision, graphics and machine learning, . . .
• Probability distributions and histograms

Slide 4

Slide 4 text

Comparing Measures and Spaces
[Figure: source image (X), style image (Y), sliced Wasserstein projection of X onto the color statistics of Y, and the source image after color transfer; J. Rabin, Wasserstein regularization]
→ images, vision, graphics and machine learning, . . .
• Probability distributions and histograms
• Optimal transport
[Figure: optimal transport mean vs. $L^2$ mean]

Slide 5

Slide 5 text

Probability Measures
Positive Radon measure $\mu$ on a set $X$: either with a density $d\mu(x) = m(x)dx$, or discrete $\mu = \sum_i \mu_i \delta_{x_i}$.
Measure of sets $A \subset X$: $\mu(A) = \int_A d\mu(x) \geq 0$.
[Figure 2.1 of the Computational OT book: discrete distributions $\alpha = \sum_{i=1}^n a_i \delta_{x_i}$ (red: empirical uniform $a_i = 1/n$, blue: arbitrary weights) and densities $d\alpha(x) = \rho_\alpha(x)dx$ (violet), in 1-D and 2-D; discrete distributions are displayed as vertical segments of length $a_i$ in 1-D and as point clouds of radius $a_i$ in 2-D.]

Slide 6

Slide 6 text

Probability Measures
Positive Radon measure $\mu$ on a set $X$: either with a density $d\mu(x) = m(x)dx$, or discrete $\mu = \sum_i \mu_i \delta_{x_i}$.
Measure of sets $A \subset X$: $\mu(A) = \int_A d\mu(x) \geq 0$.
Integration against continuous functions: $\int_X g(x)\,d\mu(x) \geq 0$ (for $g \geq 0$), with $\int_X g\,d\mu = \int_X g(x)\,m(x)\,dx$ in the density case and $\int_X g\,d\mu = \sum_i \mu_i\, g(x_i)$ in the discrete case.

Slide 7

Slide 7 text

Probability Measures
Positive Radon measure $\mu$ on a set $X$: either with a density $d\mu(x) = m(x)dx$, or discrete $\mu = \sum_i \mu_i \delta_{x_i}$.
Measure of sets $A \subset X$: $\mu(A) = \int_A d\mu(x) \geq 0$.
Integration against continuous functions: $\int_X g(x)\,d\mu(x) \geq 0$ (for $g \geq 0$), with $\int_X g\,d\mu = \int_X g(x)\,m(x)\,dx$ in the density case and $\int_X g\,d\mu = \sum_i \mu_i\, g(x_i)$ in the discrete case.
Probability (normalized) measure: $\mu(X) = \int_X d\mu(x) = 1$.

Slide 8

Slide 8 text

Measures and Random Variables
Random vectors ↔ Radon measures: $\mathbb{P}(X \in A) = \mu(A) = \int_A d\mu(x)$.
Convergence in law: $\mathbb{P}(X_n \in A) \xrightarrow{n\to+\infty} \mathbb{P}(X \in A)$ for every set $A$.
Weak* convergence: $\int f\,d\mu_n \xrightarrow{n\to+\infty} \int f\,d\mu$ for every continuous function $f$.

Slide 9

Slide 9 text

Measures and Random Variables
Random vectors ↔ Radon measures: $\mathbb{P}(X \in A) = \mu(A) = \int_A d\mu(x)$.
Convergence in law: $\mathbb{P}(X_n \in A) \xrightarrow{n\to+\infty} \mathbb{P}(X \in A)$ for every set $A$.
Weak* convergence: $\int f\,d\mu_n \xrightarrow{n\to+\infty} \int f\,d\mu$ for every continuous function $f$.
Weak convergence: . . .

Slide 10

Slide 10 text

Histogram vs. Empirical
Discretization: discrete measure $\mu = \sum_{i=1}^N \mu_i \delta_{x_i}$, with $x_i \in X$ and $\sum_i \mu_i = 1$.
Lagrangian (point clouds): constant weights $\mu_i = \frac{1}{N}$, free positions; quotient space $X^N / \Sigma_N$.
Eulerian (histograms): fixed positions $x_i$ (e.g. a grid); weights in the convex polytope (simplex) $\{(\mu_i)_i \geq 0 \;;\; \sum_i \mu_i = 1\}$.

Slide 11

Slide 11 text

Push Forward
Radon measures $(\mu, \nu)$ on $(X, Y)$. Transfer of measure by $f : X \to Y$: push forward $\nu = f_\sharp\mu$, defined by
$\nu(A) \stackrel{\text{def.}}{=} \mu(f^{-1}(A)) \iff \int_Y g(y)\,d\nu(y) \stackrel{\text{def.}}{=} \int_X g(f(x))\,d\mu(x)$.
[Figure: $y = f(x)$ mapping $\mu$ on $X$ to $\nu$ on $Y$, with a set $A$ and its preimage $f^{-1}(A)$.]

Slide 12

Slide 12 text

Push Forward
Radon measures $(\mu, \nu)$ on $(X, Y)$. Transfer of measure by $f : X \to Y$: push forward $\nu = f_\sharp\mu$, defined by
$\nu(A) \stackrel{\text{def.}}{=} \mu(f^{-1}(A)) \iff \int_Y g(y)\,d\nu(y) \stackrel{\text{def.}}{=} \int_X g(f(x))\,d\mu(x)$.
Smooth densities $d\mu = \rho(x)dx$, $d\nu = \xi(x)dx$:
$f_\sharp\mu = \nu \iff \xi(f(x))\,|\det(\partial f(x))| = \rho(x)$.

Slide 13

Slide 13 text

Push-forward vs. Pull-back
$f : X \to Y$. Measures: push-forward $f_\sharp : \mathcal{M}(X) \to \mathcal{M}(Y)$, with $f_\sharp\big(\sum_i \mu_i\,\delta_{x_i}\big) \stackrel{\text{def.}}{=} \sum_i \mu_i\,\delta_{f(x_i)}$.
Functions: pull-back $f^\sharp : \mathcal{C}(Y) \to \mathcal{C}(X)$, $f^\sharp\varphi \stackrel{\text{def.}}{=} \varphi \circ f$.
Remark: $f_\sharp$ and $f^\sharp$ are adjoints: $\int_Y \varphi\, d(f_\sharp\mu) = \int_X (f^\sharp\varphi)\, d\mu$.

Slide 14

Slide 14 text

Convergence of Random Variables
Almost sure: $\mathbb{P}(\lim_{n\to+\infty} X_n = X) = 1$.
In mean: $\lim_{n\to+\infty} \mathbb{E}(|X_n - X|^p) = 0$.
In probability: $\forall \varepsilon > 0$, $\mathbb{P}(|X_n - X| > \varepsilon) \xrightarrow{n\to+\infty} 0$.
In law: $\mathbb{P}(X_n \in A) \xrightarrow{n\to+\infty} \mathbb{P}(X \in A)$ (here the $X_n$ can be defined on different spaces).
Almost sure $\Rightarrow$ in probability $\Rightarrow$ in law, and in mean $\Rightarrow$ in probability.

Slide 15

Slide 15 text

Overview • Measures and Histograms • From Monge to Kantorovitch Formulations • Entropic Regularization and Sinkhorn • Barycenters • Unbalanced OT and Gradient Flows • Minimum Kantorovitch Estimators • Gromov-Wasserstein

Slide 16

Slide 16 text

Gaspard Monge (1746-1818) (1784)

Slide 17

Slide 17 text

Monge Transport
$\min_{\nu = f_\sharp\mu} \int_X c(x, f(x))\,d\mu(x)$

Slide 18

Slide 18 text

Monge Transport
$\min_{\nu = f_\sharp\mu} \int_X c(x, f(x))\,d\mu(x)$
Theorem [Brenier]: for $c(x,y) = \|x - y\|^2$ and $(\mu, \nu)$ with densities, there exists a unique optimal $f$. One has $f = \nabla\varphi$, where $\varphi$ is the unique convex function such that $(\nabla\varphi)_\sharp\mu = \nu$.

Slide 19

Slide 19 text

Monge Transport
$\min_{\nu = f_\sharp\mu} \int_X c(x, f(x))\,d\mu(x)$
Theorem [Brenier]: for $c(x,y) = \|x - y\|^2$ and $(\mu, \nu)$ with densities, there exists a unique optimal $f$. One has $f = \nabla\varphi$, where $\varphi$ is the unique convex function such that $(\nabla\varphi)_\sharp\mu = \nu$.
Monge-Ampère equation: $\xi(\nabla\varphi)\,\det(\partial^2\varphi) = \rho$.

Slide 20

Slide 20 text

Monge Transport
$\min_{\nu = f_\sharp\mu} \int_X c(x, f(x))\,d\mu(x)$
Theorem [Brenier]: for $c(x,y) = \|x - y\|^2$ and $(\mu, \nu)$ with densities, there exists a unique optimal $f$. One has $f = \nabla\varphi$, where $\varphi$ is the unique convex function such that $(\nabla\varphi)_\sharp\mu = \nu$.
Monge-Ampère equation: $\xi(\nabla\varphi)\,\det(\partial^2\varphi) = \rho$.
Non-uniqueness / non-existence: [Figure: two distinct maps $f, f'$ pushing the same $\mu$ to the same $\nu$; discrete examples where no Monge map exists.]

Slide 21

Slide 21 text

Leonid Kantorovich (1912-1986) Леонид Витальевич Канторович

Slide 22

Slide 22 text

Before Kantorovitch
Optimal transport was formulated in 1930 by A.N. Tolstoi, 12 years before Kantorovich. He even solved a "large scale" 10×68 instance!

Slide 23

Slide 23 text

Kantorovitch's Formulation
Points $(x_i)_i$, $(y_j)_j$. Input distributions: $\mu = \sum_i \mu_i\,\delta_{x_i}$, $\nu = \sum_j \nu_j\,\delta_{y_j}$.
Def. Couplings: $T \in \mathcal{C}_{\mu,\nu} \stackrel{\text{def.}}{=} \{T \in (\mathbb{R}_+)^{N\times P} \;;\; T\mathbf{1} = \mu,\ T^\top\mathbf{1} = \nu\}$.

Slide 24

Slide 24 text

Kantorovitch's Formulation
Points $(x_i)_i$, $(y_j)_j$. Input distributions: $\mu = \sum_i \mu_i\,\delta_{x_i}$, $\nu = \sum_j \nu_j\,\delta_{y_j}$.
Def. Couplings: $T \in \mathcal{C}_{\mu,\nu} \stackrel{\text{def.}}{=} \{T \in (\mathbb{R}_+)^{N\times P} \;;\; T\mathbf{1} = \mu,\ T^\top\mathbf{1} = \nu\}$.
Def. Wasserstein distance / EMD [Kantorovich 1942]: $W(\mu,\nu)^p \stackrel{\text{def.}}{=} \min_{T\in\mathcal{C}_{\mu,\nu}} \sum_{i,j} T_{i,j}\,\|x_i - y_j\|^p$.

Slide 25

Slide 25 text

OT Between General Measures
Marginals: $P_{1\sharp}\pi(S) \stackrel{\text{def.}}{=} \pi(S, X)$, $P_{2\sharp}\pi(S) \stackrel{\text{def.}}{=} \pi(X, S)$.
Couplings: $\Pi(\mu,\nu) \stackrel{\text{def.}}{=} \{\pi \in \mathcal{M}_+(X\times X) \;;\; P_{1\sharp}\pi = \mu,\ P_{2\sharp}\pi = \nu\}$.
Optimal transport [Kantorovitch 1942]: $W_p^p(\mu,\nu) \stackrel{\text{def.}}{=} \min_\pi \big\{ \langle d^p, \pi\rangle = \int_{X\times X} d(x,y)^p\,d\pi(x,y) \;;\; \pi\in\Pi(\mu,\nu) \big\}$.

Slide 26

Slide 26 text

Couplings: the 3 Settings
[Figure: couplings $\pi$ between the input measures in the discrete, continuous, and semi-discrete settings.]

Slide 27

Slide 27 text

Couplings
[Figure: four examples of couplings $\pi$ between input measures.]

Slide 28

Slide 28 text

1-D Optimal Transport
Remark. If $\Omega = \mathbb{R}$, $c(x,y) = c(|x-y|)$ with $c$ convex, and $F_\mu^{-1}, F_\nu^{-1}$ are the quantile functions, then
$W(\mu,\nu) = \int_0^1 c(|F_\mu^{-1}(x) - F_\nu^{-1}(x)|)\,dx$.
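Since the optimal 1-D coupling is monotone, this quantile formula reduces to a sort for empirical measures with uniform weights. A minimal numpy sketch (not from the slides; the function name and example are mine):

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """W_p^p between (1/n) sum_i delta_{x_i} and (1/n) sum_j delta_{y_j}
    on the real line: sorted samples are the discrete quantile functions,
    so the optimal coupling matches them in order."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)) ** p)

# Example: samples of N(0,1) vs N(2,1); W_2^2 should be close to
# |0-2|^2 + (1-1)^2 = 4.
rng = np.random.default_rng(0)
print(wasserstein_1d(rng.normal(0, 1, 2000), rng.normal(2, 1, 2000)))
```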

Slide 29

Slide 29 text

OT Between Gaussians
Remark. If $\Omega = \mathbb{R}^d$, $c(x,y) = \|x-y\|^2$, and $\mu = \mathcal{N}(m_\mu, \Sigma_\mu)$, $\nu = \mathcal{N}(m_\nu, \Sigma_\nu)$, then
$W_2^2(\mu,\nu) = \|m_\mu - m_\nu\|^2 + B(\Sigma_\mu, \Sigma_\nu)^2$,
where $B$ is the Bures metric $B(\Sigma_\mu,\Sigma_\nu)^2 = \operatorname{trace}\big(\Sigma_\mu + \Sigma_\nu - 2(\Sigma_\mu^{1/2}\Sigma_\nu\Sigma_\mu^{1/2})^{1/2}\big)$.
The map $T : x \mapsto m_\nu + A(x - m_\mu)$ is optimal, where $A = \Sigma_\mu^{-1/2}\big(\Sigma_\mu^{1/2}\Sigma_\nu\Sigma_\mu^{1/2}\big)^{1/2}\Sigma_\mu^{-1/2}$.
[Figure: densities $\rho_\alpha$, $\rho_\beta$ of 1-D Gaussians $\frac{1}{\sqrt{2\pi}\sigma}e^{-(x-m)^2/(2\sigma^2)}$.]
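This closed form is easy to evaluate numerically. A small sketch using scipy's matrix square root (function name and example values are mine), directly transcribing the formula above:

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_sq(m_mu, S_mu, m_nu, S_nu):
    """W_2^2 between N(m_mu, S_mu) and N(m_nu, S_nu):
    squared mean distance plus squared Bures metric."""
    S_mu_half = sqrtm(S_mu)
    cross = sqrtm(S_mu_half @ S_nu @ S_mu_half)
    bures_sq = np.trace(S_mu + S_nu - 2 * np.real(cross))
    return np.sum((m_mu - m_nu) ** 2) + bures_sq

# Example in dimension 2.
m1, S1 = np.zeros(2), np.eye(2)
m2, S2 = np.array([3.0, 0.0]), np.diag([2.0, 0.5])
print(gaussian_w2_sq(m1, S1, m2, S2))
```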

Slide 30

Slide 30 text

OT on Gaussians and Bures' Distance

Slide 31

Slide 31 text

W1 OT and Min-cost Flows

Slide 32

Slide 32 text

Metrics on the Space of Measures
Bins-to-bins metrics: $d\mu(x) = \rho(x)dx$, $d\tilde\mu(x) = \tilde\rho(x)dx$.
Kullback-Leibler divergence: $D_{\mathrm{KL}}(\mu, \tilde\mu) = \int \rho(x)\log\frac{\rho(x)}{\tilde\rho(x)}\,dx$.
Hellinger distance: $D_H(\mu, \tilde\mu)^2 = \int \big(\sqrt{\rho(x)} - \sqrt{\tilde\rho(x)}\big)^2 dx$.

Slide 33

Slide 33 text

Metrics on the Space of Measures
Bins-to-bins metrics: $d\mu(x) = \rho(x)dx$, $d\tilde\mu(x) = \tilde\rho(x)dx$.
Kullback-Leibler divergence: $D_{\mathrm{KL}}(\mu, \tilde\mu) = \int \rho(x)\log\frac{\rho(x)}{\tilde\rho(x)}\,dx$.
Hellinger distance: $D_H(\mu, \tilde\mu)^2 = \int \big(\sqrt{\rho(x)} - \sqrt{\tilde\rho(x)}\big)^2 dx$.
Effect of a translation $\mu_\tau$ of $\mu$: bins-to-bins metrics saturate, $D(\mu, \mu_\tau) \approx \mathrm{cst}$, whereas $W_2(\mu, \mu_\tau) = |\tau|$.

Slide 34

Slide 34 text

Csiszar Divergence vs Dual Norms
Csiszár divergences: $D_\varphi(\alpha|\beta) \stackrel{\text{def.}}{=} \int_X \varphi\big(\frac{d\alpha}{d\beta}\big)\,d\beta$ → KL, TV, $\chi^2$, Hellinger, . . . (strong topology).
Dual norms: $\|\alpha - \beta\|_B \stackrel{\text{def.}}{=} \max_{f\in B} \int_X f(x)\,(d\alpha(x) - d\beta(x))$ → $W_1$, flat norm, RKHS$^*$, energy distance, . . . (weak topology).

Slide 35

Slide 35 text

Csiszar Divergence
Csiszár divergences: a unifying way to define losses between arbitrary positive measures (discrete & densities). https://en.wikipedia.org/wiki/F-divergence …

Slide 36

Slide 36 text

Dual Norms
Dual norms (aka Integral Probability Metrics): $\|\alpha - \beta\|_B \stackrel{\text{def.}}{=} \max\big\{\int_X f(x)\,(d\alpha(x) - d\beta(x)) \;;\; f \in B\big\}$.
Wasserstein-1: $B = \{f \;;\; \|\nabla f\|_\infty \leq 1\}$.
Flat norm: $B = \{f \;;\; \|f\|_\infty \leq 1,\ \|\nabla f\|_\infty \leq 1\}$.
RKHS: $B = \{f \;;\; \|f\|_k^2 \leq 1\}$, with
$\|\alpha-\beta\|_B^2 = \int k(x,x')\,d\alpha(x)d\alpha(x') + \int k(y,y')\,d\beta(y)d\beta(y') - 2\int k(x,y)\,d\alpha(x)d\beta(y)$.
Energy distance: $k(x,y) = -\|x-y\|$. Gaussian: $k(x,y) = e^{-\|x-y\|^2/(2\sigma^2)}$.
[Figure: these norms between a measure and its translate by $t$ (Energy, Gauss, $W_1$, Flat).]

Slide 37

Slide 37 text

RKHS Norms aka Maximum Mean Discrepancy

Slide 38

Slide 38 text

The Earth Mover's Distance
$\mathrm{dist}(I_1, I_2) = W_1(\mu, \nu)$ [Rubner'98]

Slide 39

Slide 39 text

The Word Mover's Distance [Kusner'15]
$\mathrm{dist}(D_1, D_2) = W_2(\mu, \nu)$

Slide 40

Slide 40 text

Overview • Measures and Histograms • From Monge to Kantorovitch Formulations • Linear Programming and Semi-discrete • Entropic Regularization and Sinkhorn • Barycenters • Unbalanced OT and Gradient Flows • Minimum Kantorovitch Estimators • Gromov-Wasserstein

Slide 41

Slide 41 text

Algorithms
Linear programming: $\mu = \sum_{i=1}^{N_1} p_i\,\delta_{x_i}$, $\nu = \sum_{j=1}^{N_2} q_j\,\delta_{y_j}$.

Slide 42

Slide 42 text

Algorithms
Linear programming: $\mu = \sum_{i=1}^{N_1} p_i\,\delta_{x_i}$, $\nu = \sum_{j=1}^{N_2} q_j\,\delta_{y_j}$ ~ $O(N^3)$.
Hungarian/Auction: $\mu = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}$, $\nu = \frac{1}{N}\sum_{j=1}^N \delta_{y_j}$.
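For reference, the Kantorovitch problem really is a plain linear program and can be handed to a generic solver; a minimal scipy sketch (my function name and example; dedicated network-simplex codes are far faster in practice, as the following slides note):

```python
import numpy as np
from scipy.optimize import linprog

def ot_lp(p, q, C):
    """min_{T in C_{p,q}} <C, T>; variables are the N1*N2 entries of T,
    flattened row-major, constrained to have marginals p and q."""
    N1, N2 = C.shape
    A_eq = np.zeros((N1 + N2, N1 * N2))
    for i in range(N1):
        A_eq[i, i * N2:(i + 1) * N2] = 1.0   # row sums: T 1 = p
    for j in range(N2):
        A_eq[N1 + j, j::N2] = 1.0            # column sums: T^T 1 = q
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None))
    return res.x.reshape(N1, N2), res.fun

# Example: random planar points, squared Euclidean cost.
rng = np.random.default_rng(0)
x, y = rng.random((5, 2)), rng.random((6, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
T, cost = ot_lp(np.full(5, 1 / 5), np.full(6, 1 / 6), C)
print(cost)
```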

Slide 43

Slide 43 text

Algorithms
Linear programming: $\mu = \sum_{i=1}^{N_1} p_i\,\delta_{x_i}$, $\nu = \sum_{j=1}^{N_2} q_j\,\delta_{y_j}$ ~ $O(N^3)$.
Hungarian/Auction: $\mu = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}$, $\nu = \frac{1}{N}\sum_{j=1}^N \delta_{y_j}$.
[Figure: matching between point clouds $\mu$ and $\nu$.]

Slide 44

Slide 44 text

Algorithms
Linear programming: $\mu = \sum_{i=1}^{N_1} p_i\,\delta_{x_i}$, $\nu = \sum_{j=1}^{N_2} q_j\,\delta_{y_j}$ ~ $O(N^3)$.
Hungarian/Auction: $\mu = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}$, $\nu = \frac{1}{N}\sum_{j=1}^N \delta_{y_j}$.
Monge-Ampère / Benamou-Brenier, $d = \|\cdot\|_2^2$.
[Excerpt, performance study: the network simplex behaves in $O(n^2)$ in this context (thousands of particles), a major gain over the transport simplex, also relevant for EMD-based image retrieval and color transfer [Rubner et al. 2000; Kanters et al. 2003; Morovic and Sun 2003]; fixed-point precision speeds it up further, with a less accurate final cost but an unchanged particle pairing. Figure 6: log-log running times of network simplex (fixed point / double precision) vs. transport simplex, against $y = \alpha x^2$ and $y = \beta x^3$. Figure 7: synthetic 2-D interpolations for $\alpha = 1/4, 1/2, 3/4$.]

Slide 45

Slide 45 text

Algorithms
Linear programming: $\mu = \sum_{i=1}^{N_1} p_i\,\delta_{x_i}$, $\nu = \sum_{j=1}^{N_2} q_j\,\delta_{y_j}$ ~ $O(N^3)$.
Hungarian/Auction: $\mu = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}$, $\nu = \frac{1}{N}\sum_{j=1}^N \delta_{y_j}$.
Monge-Ampère / Benamou-Brenier, $d = \|\cdot\|_2^2$.
Semi-discrete: Laguerre cells, $d = \|\cdot\|_2^2$ [Levy,'15].

Slide 46

Slide 46 text

Algorithms
Linear programming: $\mu = \sum_{i=1}^{N_1} p_i\,\delta_{x_i}$, $\nu = \sum_{j=1}^{N_2} q_j\,\delta_{y_j}$ ~ $O(N^3)$.
Hungarian/Auction: $\mu = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}$, $\nu = \frac{1}{N}\sum_{j=1}^N \delta_{y_j}$.
Monge-Ampère / Benamou-Brenier, $d = \|\cdot\|_2^2$.
Semi-discrete: Laguerre cells, $d = \|\cdot\|_2^2$ [Levy,'15].

Slide 47

Slide 47 text

Algorithms
Linear programming: $\mu = \sum_{i=1}^{N_1} p_i\,\delta_{x_i}$, $\nu = \sum_{j=1}^{N_2} q_j\,\delta_{y_j}$ ~ $O(N^3)$.
Hungarian/Auction: $\mu = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}$, $\nu = \frac{1}{N}\sum_{j=1}^N \delta_{y_j}$.
Monge-Ampère / Benamou-Brenier, $d = \|\cdot\|_2^2$.
Semi-discrete: Laguerre cells, $d = \|\cdot\|_2^2$ [Levy,'15].
→ Need for fast approximate algorithms for generic $c$.

Slide 48

Slide 48 text

C-Transform
$\forall y \in Y$, $f^c(y) \stackrel{\text{def.}}{=} \inf_{x\in X} c(x,y) - f(x)$; $\forall x \in X$, $g^{\bar c}(x) \stackrel{\text{def.}}{=} \inf_{y\in Y} c(x,y) - g(y)$, where $\bar c(y,x) \stackrel{\text{def.}}{=} c(x,y)$.
One checks that $f^c \in \operatorname{argmax}_g \mathcal{E}(f,g)$ and $g^{\bar c} \in \operatorname{argmax}_f \mathcal{E}(f,g)$: the map $(f,g) \mapsto (g^{\bar c}, f^c)$ replaces potentials by "better" ones (improving the dual objective $\mathcal{E}$). Functions of the form $f^c$ and $g^{\bar c}$ are called c-concave and $\bar c$-concave; for $c(x,y) = \langle x, y\rangle$ on $X = Y = \mathbb{R}^d$ this coincides with the usual notion of concavity, and for $c(x,y) = \|x-y\|^p$ the c-transform is an inf-convolution (a "Hopf-Lax formula").
Unconstrained dual over a single potential:
$\mathcal{L}_c(\alpha,\beta) = \max_{f\in\mathcal{C}(X)} \int_X f(x)\,d\alpha(x) + \int_Y f^c(y)\,d\beta(y) = \max_{g\in\mathcal{C}(Y)} \int_X g^{\bar c}(x)\,d\alpha(x) + \int_Y g(y)\,d\beta(y)$.
Semi-discrete case, $\beta = \sum_j b_j \delta_{y_j}$: $\forall g \in \mathbb{R}^m$, $\forall x \in X$, $g^{\bar c}(x) \stackrel{\text{def.}}{=} \min_j c(x, y_j) - g_j$.
[Figure 5.1: semi-discrete $\bar c$-transforms $g^{\bar c}$ in 1-D, for ground costs $|x-y|^p$ with $p = 1/2, 1, 3/2, 2$.]

Slide 49

Slide 49 text

Semi-discrete Descent Algorithm
[Figure 5.2: iterations $\ell = 1, 3, 50, 100$ of the semi-discrete OT algorithm (a simple gradient descent) and the resulting matching; the red points are the support $(y_j)_j$ of the discrete measure $\beta$, the continuous measure $\alpha$ is uniform on a square, and the blue cells display the Laguerre partition $(\mathbb{L}_{g^{(\ell)}}(y_j))_j$ for the dual potential $g^{(\ell)}$ at iteration $\ell$.]

Slide 50

Slide 50 text

Semi-discrete Stochastic Descent
Stochastic gradient descent for semi-discrete optimal transport: convergence of $\log(E(g^{(\star)}) - E(g^{(\ell)}))$ along iterations $\ell$, and the corresponding Laguerre cells. https://arxiv.org/abs/1605.08527
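A compact sketch of the averaged stochastic ascent on the semi-discrete dual, in the spirit of the referenced paper (step-size schedule, cost choice, and names are mine): for a sample $x \sim \alpha$, the stochastic gradient of $E(g) = \mathbb{E}_x[\min_j c(x,y_j) - g_j] + \langle b, g\rangle$ is $b - e_{j^\star(x)}$, where $j^\star(x)$ indexes the Laguerre cell containing $x$.

```python
import numpy as np

def semidiscrete_sgd(sample_alpha, y, b, n_iter=20000, tau0=1.0):
    """Averaged SGD on the semi-discrete dual, for c(x,y) = ||x-y||^2.
    sample_alpha() draws one point from the continuous measure alpha;
    (y, b) are the support and weights of the discrete measure."""
    g = np.zeros(len(b))
    g_avg = np.zeros(len(b))
    for it in range(1, n_iter + 1):
        x = sample_alpha()
        j_star = np.argmin(((y - x) ** 2).sum(-1) - g)  # Laguerre cell of x
        grad = b.copy()
        grad[j_star] -= 1.0
        g += (tau0 / np.sqrt(it)) * grad   # ascent on the concave dual
        g_avg += (g - g_avg) / it          # Polyak-Ruppert averaging
    return g_avg

# Example: alpha uniform on [0,1]^2, beta supported on 4 random points.
rng = np.random.default_rng(0)
y = rng.random((4, 2))
print(semidiscrete_sgd(lambda: rng.random(2), y, np.full(4, 1 / 4)))
```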

Slide 51

Slide 51 text

Entropic Regularization
The simplex of histograms with $N$ bins is $\Sigma_N \stackrel{\text{def.}}{=} \{p \in \mathbb{R}_+^N \;;\; \sum_i p_i = 1\}$.
Entropy of $T \in \mathbb{R}_+^{N\times N}$: $H(T) \stackrel{\text{def.}}{=} -\sum_{i,j=1}^N T_{i,j}(\log(T_{i,j}) - 1)$.
Couplings between histograms $p \in \Sigma_{N_1}$ and $q \in \Sigma_{N_2}$: $\mathcal{C}_{p,q} \stackrel{\text{def.}}{=} \{T \in (\mathbb{R}_+)^{N_1\times N_2} \;;\; T\mathbf{1}_{N_2} = p,\ T^\top\mathbf{1}_{N_1} = q\}$, where $\mathbf{1}_N \stackrel{\text{def.}}{=} (1,\dots,1)^\top \in \mathbb{R}^N$.
Def. Regularized OT [Cuturi NIPS'13]: $\min_{T\in\mathcal{C}_{p,q}} \langle C, T\rangle - \varepsilon H(T)$.

Slide 52

Slide 52 text

Entropic Regularization
The simplex of histograms with $N$ bins is $\Sigma_N \stackrel{\text{def.}}{=} \{p \in \mathbb{R}_+^N \;;\; \sum_i p_i = 1\}$.
Entropy of $T \in \mathbb{R}_+^{N\times N}$: $H(T) \stackrel{\text{def.}}{=} -\sum_{i,j=1}^N T_{i,j}(\log(T_{i,j}) - 1)$.
Couplings between histograms $p \in \Sigma_{N_1}$ and $q \in \Sigma_{N_2}$: $\mathcal{C}_{p,q} \stackrel{\text{def.}}{=} \{T \in (\mathbb{R}_+)^{N_1\times N_2} \;;\; T\mathbf{1}_{N_2} = p,\ T^\top\mathbf{1}_{N_1} = q\}$, where $\mathbf{1}_N \stackrel{\text{def.}}{=} (1,\dots,1)^\top \in \mathbb{R}^N$.
Def. Regularized OT [Cuturi NIPS'13]: $\min_{T\in\mathcal{C}_{p,q}} \langle C, T\rangle - \varepsilon H(T)$.
Regularization impact on the solution: [Figure: optimal coupling $T_\varepsilon$ for the same cost $c$ and increasing $\varepsilon$.]

Slide 53

Slide 53 text

Overview • Measures and Histograms • From Monge to Kantorovitch Formulations • Entropic Regularization and Sinkhorn • Barycenters • Unbalanced OT and Gradient Flows • Minimum Kantorovitch Estimators • Gromov-Wasserstein

Slide 54

Slide 54 text

Sinkhorn’s Algorithm

Slide 55

Slide 55 text

Sinkhorn’s Algorithm

Slide 56

Slide 56 text

Sinkhorn's Algorithm
Only matrix/vector multiplications. → Parallelizable. → Streams well on GPU.

Slide 57

Slide 57 text

Sinkhorn's Algorithm
Only matrix/vector multiplications. → Parallelizable. → Streams well on GPU.
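A minimal numpy sketch of these iterations (iteration count and the example are mine; for very small $\varepsilon$ one should iterate in the log domain to avoid underflow):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps, n_iter=1000):
    """Sinkhorn/IPFP: alternately rescale K = exp(-C/eps) so that the
    coupling pi = diag(a) K diag(b) matches the two marginals."""
    K = np.exp(-C / eps)
    b = np.ones_like(nu)
    for _ in range(n_iter):
        a = mu / (K @ b)     # enforce first marginal
        b = nu / (K.T @ a)   # enforce second marginal
    return a[:, None] * K * b[None, :]

# Example: two histograms on a 1-D grid, squared-distance cost.
n = 50
t = np.linspace(0, 1, n)
C = (t[:, None] - t[None, :]) ** 2
mu = np.exp(-((t - 0.3) ** 2) / 0.01); mu /= mu.sum()
nu = np.exp(-((t - 0.7) ** 2) / 0.02); nu /= nu.sum()
pi = sinkhorn(mu, nu, C, eps=1e-2)
print(np.abs(pi.sum(axis=1) - mu).max())  # first marginal is matched
```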

Slide 58

Slide 58 text

Optimal Transport on Surfaces
Triangulated mesh $M$. Geodesic distance $d_M$.

Slide 59

Slide 59 text

Optimal Transport on Surfaces
Triangulated mesh $M$. Geodesic distance $d_M$.
Ground cost: $c(x,y) = d_M(x,y)^\alpha$. [Figure: level sets of $d(x_i, \cdot)$.]

Slide 60

Slide 60 text

Optimal Transport on Surfaces
Triangulated mesh $M$. Geodesic distance $d_M$.
Ground cost: $c(x,y) = d_M(x,y)^\alpha$. [Figure: level sets of $d(x_i, \cdot)$.]
Computing $c$ (Fast-Marching): $N^2\log(N)$ → too costly.

Slide 61

Slide 61 text

Entropic Transport on Surfaces
Heat equation on $M$: $\partial_t u_t(x,\cdot) = \Delta_M u_t(x,\cdot)$, $u_{t=0}(x,\cdot) = \delta_x$.

Slide 62

Slide 62 text

Entropic Transport on Surfaces
Heat equation on $M$: $\partial_t u_t(x,\cdot) = \Delta_M u_t(x,\cdot)$, $u_{t=0}(x,\cdot) = \delta_x$.
Theorem [Varadhan]: $-\varepsilon\log(u_\varepsilon) \xrightarrow{\varepsilon\to 0} d_M^2$.

Slide 63

Slide 63 text

Entropic Transport on Surfaces
Heat equation on $M$: $\partial_t u_t(x,\cdot) = \Delta_M u_t(x,\cdot)$, $u_{t=0}(x,\cdot) = \delta_x$.
Theorem [Varadhan]: $-\varepsilon\log(u_\varepsilon) \xrightarrow{\varepsilon\to 0} d_M^2$.
Sinkhorn kernel: $K \stackrel{\text{def.}}{=} e^{-d_M^2/\varepsilon} \approx u_\varepsilon \approx \big(\mathrm{Id} - \frac{\varepsilon}{\ell}\Delta_M\big)^{-\ell}$ ($\ell$ implicit time steps of the heat equation).

Slide 64

Slide 64 text

MRI Data Processing [with A. Gramfort]
Ground cost $c = d_M$: geodesic distance on the cortical surface $M$. [Figure: $L^2$ barycenter vs. $W_2^2$ barycenter.]

Slide 65

Slide 65 text

Regularization for General Measures
$\pi_\varepsilon \stackrel{\text{def.}}{=} \operatorname{argmin}_\pi \{\langle d^p, \pi\rangle + \varepsilon\,\mathrm{KL}(\pi|\pi_0) \;;\; \pi\in\Pi(\mu,\nu)\}$
Schrödinger's problem: $\pi_\varepsilon = \operatorname{argmin}_{\pi\in\Pi(\mu,\nu)} \mathrm{KL}(\pi|K)$, with $K(x,y) \stackrel{\text{def.}}{=} e^{-d^p(x,y)/\varepsilon}\,\pi_0(x,y)$.
Landmark computational paper: [Cuturi 2013].

Slide 66

Slide 66 text

Regularization for General Measures
$\pi_\varepsilon \stackrel{\text{def.}}{=} \operatorname{argmin}_\pi \{\langle d^p, \pi\rangle + \varepsilon\,\mathrm{KL}(\pi|\pi_0) \;;\; \pi\in\Pi(\mu,\nu)\}$
Schrödinger's problem: $\pi_\varepsilon = \operatorname{argmin}_{\pi\in\Pi(\mu,\nu)} \mathrm{KL}(\pi|K)$, with $K(x,y) \stackrel{\text{def.}}{=} e^{-d^p(x,y)/\varepsilon}\,\pi_0(x,y)$.
Landmark computational paper: [Cuturi 2013].
Proposition [Carlier, Duval, Peyré, Schmitzer 2015]:
$\pi_\varepsilon \xrightarrow{\varepsilon\to 0} \operatorname{argmin}_{\pi\in\Pi(\mu,\nu)} \langle d^p, \pi\rangle$ and $\pi_\varepsilon \xrightarrow{\varepsilon\to+\infty} \mu\otimes\nu$.
[Figure: $\pi_\varepsilon$ for increasing $\varepsilon$.]

Slide 67

Slide 67 text

Back to Sinkhorn's Algorithm
Optimal transport problem: $f_1 = \iota_{\{\mu\}}$, $f_2 = \iota_{\{\nu\}}$, so that $\mathrm{Prox}^{\mathrm{KL}}_{f_1/\varepsilon}(\tilde\mu) = \mu$ and $\mathrm{Prox}^{\mathrm{KL}}_{f_2/\varepsilon}(\tilde\nu) = \nu$.

Slide 68

Slide 68 text

Back to Sinkhorn's Algorithm
Optimal transport problem: $f_1 = \iota_{\{\mu\}}$, $f_2 = \iota_{\{\nu\}}$, so that $\mathrm{Prox}^{\mathrm{KL}}_{f_1/\varepsilon}(\tilde\mu) = \mu$ and $\mathrm{Prox}^{\mathrm{KL}}_{f_2/\varepsilon}(\tilde\nu) = \nu$.
Sinkhorn/IPFP algorithm [Sinkhorn 1967][Deming, Stephan 1940]:
$a^{(\ell+1)} \stackrel{\text{def.}}{=} \frac{\mu}{K b^{(\ell)}}$ and $b^{(\ell+1)} \stackrel{\text{def.}}{=} \frac{\nu}{K^* a^{(\ell+1)}}$.

Slide 69

Slide 69 text

Back to Sinkhorn's Algorithm
Optimal transport problem: $f_1 = \iota_{\{\mu\}}$, $f_2 = \iota_{\{\nu\}}$, so that $\mathrm{Prox}^{\mathrm{KL}}_{f_1/\varepsilon}(\tilde\mu) = \mu$ and $\mathrm{Prox}^{\mathrm{KL}}_{f_2/\varepsilon}(\tilde\nu) = \nu$.
Sinkhorn/IPFP algorithm [Sinkhorn 1967][Deming, Stephan 1940]:
$a^{(\ell+1)} \stackrel{\text{def.}}{=} \frac{\mu}{K b^{(\ell)}}$ and $b^{(\ell+1)} \stackrel{\text{def.}}{=} \frac{\nu}{K^* a^{(\ell+1)}}$.
Proposition [Franklin, Lorenz 1989]: with $\pi^{(\ell)} \stackrel{\text{def.}}{=} \mathrm{diag}(a^{(\ell)})\,K\,\mathrm{diag}(b^{(\ell)})$,
$\|\log(\pi^{(\ell)}) - \log(\pi^\star)\|_\infty = O((1-\lambda)^\ell)$, the rate degrading as $\varepsilon \to 0$ (roughly $\lambda \sim e^{-c/\varepsilon}$). Local rate: [Knight 2008].
[Figure: $\log(\|\log(\pi^{(\ell)}) - \log(\pi^\star)\|_\infty)$ along iterations $\ell$, and $\pi^{(\ell)}$ for varying $\varepsilon$.]

Slide 70

Slide 70 text

Semi-discrete OT and Entropy
Semi-discrete optimal transport is a multiclass SVM; its Sinkhorn version is a logistic regression. It decomposes the space into entropic Laguerre diagrams (Laguerre cells vs. "Sinkhorn" Laguerre cells).
[Figure 5.3: entropic semi-discrete $\bar c$-transforms $g^{\bar c,\varepsilon}$, in 1-D for $c(x,y) = |x-y|$ and in 2-D for $c(x,y) = \|x-y\|$, for $\varepsilon \in \{0, 0.01, 0.1, 0.3\}$; red points at $(y_j, -g_j)_j$, resp. $y_j \in \mathbb{R}^2$ with size proportional to $g_j$; black curves are level sets of $g^{\bar c,\varepsilon}$, colors the smoothed indicator functions $\chi_j^\varepsilon$ of the Laguerre cells.]

Slide 71

Slide 71 text

Overview • Measures and Histograms • From Monge to Kantorovitch Formulations • Entropic Regularization and Sinkhorn • Barycenters • Unbalanced OT and Gradient Flows • Minimum Kantorovitch Estimators • Gromov-Wasserstein

Slide 72

Slide 72 text

Wasserstein Barycenters
Barycenters of measures $(\mu_k)_k$, with weights $\sum_k \lambda_k = 1$:
$\mu^\star \in \operatorname{argmin}_\mu \sum_k \lambda_k\, W_2^2(\mu_k, \mu)$

Slide 73

Slide 73 text

Wasserstein Barycenters
Barycenters of measures $(\mu_k)_k$, with weights $\sum_k \lambda_k = 1$:
$\mu^\star \in \operatorname{argmin}_\mu \sum_k \lambda_k\, W_2^2(\mu_k, \mu)$
[Figure: $\mu$ between $\mu_1, \mu_2, \mu_3$, at distances $W_2(\mu_k, \mu)$.]
Generalizes the Euclidean barycenter: if $\mu_k = \delta_{x_k}$ then $\mu^\star = \delta_{\sum_k \lambda_k x_k}$.

Slide 74

Slide 74 text

Wasserstein Barycenters
Barycenters of measures $(\mu_k)_k$, with weights $\sum_k \lambda_k = 1$:
$\mu^\star \in \operatorname{argmin}_\mu \sum_k \lambda_k\, W_2^2(\mu_k, \mu)$
[Figure: $\mu$ between $\mu_1, \mu_2, \mu_3$, at distances $W_2(\mu_k, \mu)$.]
Generalizes the Euclidean barycenter: if $\mu_k = \delta_{x_k}$ then $\mu^\star = \delta_{\sum_k \lambda_k x_k}$.

Slide 75

Slide 75 text

Wasserstein Barycenters
Barycenters of measures $(\mu_k)_k$, with weights $\sum_k \lambda_k = 1$:
$\mu^\star \in \operatorname{argmin}_\mu \sum_k \lambda_k\, W_2^2(\mu_k, \mu)$
Generalizes the Euclidean barycenter: if $\mu_k = \delta_{x_k}$ then $\mu^\star = \delta_{\sum_k \lambda_k x_k}$.
Mc Cann's displacement interpolation ($\mu_1 \leftrightarrow \mu_2$).

Slide 76

Slide 76 text

Wasserstein Barycenters
Barycenters of measures $(\mu_k)_k$, with weights $\sum_k \lambda_k = 1$:
$\mu^\star \in \operatorname{argmin}_\mu \sum_k \lambda_k\, W_2^2(\mu_k, \mu)$
Generalizes the Euclidean barycenter: if $\mu_k = \delta_{x_k}$ then $\mu^\star = \delta_{\sum_k \lambda_k x_k}$.
Mc Cann's displacement interpolation ($\mu_1 \leftrightarrow \mu_2$).
[Figure: displacement interpolations between $\mu_1, \mu_2, \mu_3$.]

Slide 77

Slide 77 text

Wasserstein Barycenters
Barycenters of measures $(\mu_k)_k$, with weights $\sum_k \lambda_k = 1$:
$\mu^\star \in \operatorname{argmin}_\mu \sum_k \lambda_k\, W_2^2(\mu_k, \mu)$
Generalizes the Euclidean barycenter: if $\mu_k = \delta_{x_k}$ then $\mu^\star = \delta_{\sum_k \lambda_k x_k}$.
Mc Cann's displacement interpolation ($\mu_1 \leftrightarrow \mu_2$).
Theorem [Agueh, Carlier, 2010] (for $c(x,y) = \|x-y\|^2$): if $\mu_1$ does not vanish on small sets, $\mu^\star$ exists and is unique.

Slide 78

Slide 78 text

Displacement Interpolation

Slide 79

Slide 79 text

Displacement Interpolation
[Figure 7.2: displacement interpolation $\alpha_t$ of discrete measures for $t = 0, \frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \frac{4}{5}, 1$; top: point clouds (empirical measures $(\alpha_0, \alpha_1)$ with the same number of points); bottom: same but with weights.]

Slide 80

Slide 80 text

Wasserstein Barycenters
[Figure: barycenters for weights $\lambda \in \Sigma_3$; Wasserstein mean vs. $L^2$ mean.]

Slide 81

Slide 81 text

Wasserstein Barycenters
[Figure: barycenters for weights $\lambda \in \Sigma_3$; Wasserstein mean vs. $L^2$ mean.]

Slide 82

Slide 82 text

Regularized Barycenters
$\min_{(\pi_k)_k,\ \mu} \big\{ \sum_k \lambda_k\big(\langle c, \pi_k\rangle + \varepsilon\,\mathrm{KL}(\pi_k|\pi_{0,k})\big) \;;\; \forall k,\ \pi_k \in \Pi(\mu_k, \mu) \big\}$

Slide 83

Slide 83 text

Regularized Barycenters
$\min_{(\pi_k)_k,\ \mu} \big\{ \sum_k \lambda_k\big(\langle c, \pi_k\rangle + \varepsilon\,\mathrm{KL}(\pi_k|\pi_{0,k})\big) \;;\; \forall k,\ \pi_k \in \Pi(\mu_k, \mu) \big\}$
→ Need to fix a discretization grid for $\mu$, i.e. choose $(\pi_{0,k})_k$.

Slide 84

Slide 84 text

Regularized Barycenters
$\min_{(\pi_k)_k,\ \mu} \big\{ \sum_k \lambda_k\big(\langle c, \pi_k\rangle + \varepsilon\,\mathrm{KL}(\pi_k|\pi_{0,k})\big) \;;\; \forall k,\ \pi_k \in \Pi(\mu_k, \mu) \big\}$
→ Need to fix a discretization grid for $\mu$, i.e. choose $(\pi_{0,k})_k$.
→ Sinkhorn-like algorithm [Benamou, Carlier, Cuturi, Nenna, Peyré, 2015].

Slide 85

Slide 85 text

Regularized Barycenters
$\min_{(\pi_k)_k,\ \mu} \big\{ \sum_k \lambda_k\big(\langle c, \pi_k\rangle + \varepsilon\,\mathrm{KL}(\pi_k|\pi_{0,k})\big) \;;\; \forall k,\ \pi_k \in \Pi(\mu_k, \mu) \big\}$
→ Need to fix a discretization grid for $\mu$, i.e. choose $(\pi_{0,k})_k$.
→ Sinkhorn-like algorithm [Benamou, Carlier, Cuturi, Nenna, Peyré, 2015].
[Solomon et al, SIGGRAPH 2015]
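A sketch of the Sinkhorn-like barycenter iterations in the style of the iterative Bregman projections of the cited paper, for inputs living on a common fixed grid; the geometric-mean update is the key step, while initialization and iteration counts are my choices:

```python
import numpy as np

def barycenter_sinkhorn(mus, lambdas, C, eps, n_iter=500):
    """Entropic W2 barycenter of the histograms stored as columns of
    mus (shape (n, K)), with weights lambdas, on a grid with cost C."""
    Kern = np.exp(-C / eps)
    b = np.ones_like(mus)
    for _ in range(n_iter):
        a = mus / (Kern @ b)            # one scaling per input measure
        Ka = Kern.T @ a
        mu = np.exp((lambdas[None, :] * np.log(Ka)).sum(axis=1))
        b = mu[:, None] / Ka            # geometric mean couples the problems
    return mu

# Example: barycenter of two bumps on a 1-D grid.
n = 100
t = np.linspace(0, 1, n)
C = (t[:, None] - t[None, :]) ** 2
f = lambda c: np.exp(-((t - c) ** 2) / 0.005)
mus = np.stack([f(0.2) / f(0.2).sum(), f(0.8) / f(0.8).sum()], axis=1)
mu = barycenter_sinkhorn(mus, np.array([0.5, 0.5]), C, eps=2e-3)
print(t[np.argmax(mu)])  # near 0.5: the mass moves instead of blurring
```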

Slide 86

Slide 86 text

Barycenters of 2D Shapes

Slide 87

Slide 87 text

Barycenters of 3D Shapes

Slide 88

Slide 88 text

Barycenter on a Surface
[Figure: input measures $\mu_1, \mu_2$ on a surface; colormap from 0 to 1.]

Slide 89

Slide 89 text

Barycenter on a Surface
[Figure: input measures $\mu_1, \dots, \mu_6$ on a surface; colormap from 0 to 1.]

Slide 90

Slide 90 text

Barycenter on a Surface
Barycenter of $\mu_1, \dots, \mu_6$ with weights $\lambda = (1, \dots, 1)/6$. [Figure; colormap from 0 to 1.]

Slide 91

Slide 91 text

Color Transfer
Input images: $(f, g)$ (chrominance components).
Input measures: $\mu(A) = U(f^{-1}(A))$, $\nu(A) = U(g^{-1}(A))$, i.e. push-forwards $\mu = f_\sharp U$, $\nu = g_\sharp U$ of the uniform measure $U$ over the pixel domain.

Slide 92

Slide 92 text

Color Transfer
Input images: $(f, g)$ (chrominance components).
Input measures: $\mu(A) = U(f^{-1}(A))$, $\nu(A) = U(g^{-1}(A))$, i.e. push-forwards $\mu = f_\sharp U$, $\nu = g_\sharp U$ of the uniform measure $U$ over the pixel domain.

Slide 93

Slide 93 text

Color Transfer
Input images: $(f, g)$ (chrominance components).
Input measures: $\mu(A) = U(f^{-1}(A))$, $\nu(A) = U(g^{-1}(A))$, i.e. push-forwards $\mu = f_\sharp U$, $\nu = g_\sharp U$ of the uniform measure $U$ over the pixel domain.
[Figure: transport map $T$; color-transferred images $T \circ f$ and $\tilde T \circ g$.]

Slide 94

Slide 94 text

Topic Models [Rolet’16]

Slide 95

Slide 95 text

Overview • Measures and Histograms • From Monge to Kantorovitch Formulations • Entropic Regularization and Sinkhorn • Barycenters • Unbalanced OT and Gradient Flows • Minimum Kantorovitch Estimators • Gromov-Wasserstein

Slide 96

Slide 96 text

Unbalanced Transport
$(\xi, \mu) \in \mathcal{M}_+(X)^2$, $\mathrm{KL}(\xi|\mu) \stackrel{\text{def.}}{=} \int_X \log\big(\frac{d\xi}{d\mu}\big)\,d\xi + \int_X (d\mu - d\xi)$.
$\mathrm{WF}_c(\mu,\nu) \stackrel{\text{def.}}{=} \min_\pi\ \langle c, \pi\rangle + \mathrm{KL}(P_{1\sharp}\pi|\mu) + \mathrm{KL}(P_{2\sharp}\pi|\nu)$ [Liero, Mielke, Savaré 2015]

Slide 97

Slide 97 text

Unbalanced Transport
$(\xi, \mu) \in \mathcal{M}_+(X)^2$, $\mathrm{KL}(\xi|\mu) \stackrel{\text{def.}}{=} \int_X \log\big(\frac{d\xi}{d\mu}\big)\,d\xi + \int_X (d\mu - d\xi)$.
$\mathrm{WF}_c(\mu,\nu) \stackrel{\text{def.}}{=} \min_\pi\ \langle c, \pi\rangle + \mathrm{KL}(P_{1\sharp}\pi|\mu) + \mathrm{KL}(P_{2\sharp}\pi|\nu)$ [Liero, Mielke, Savaré 2015]
Proposition [Liero, Mielke, Savaré 2015] [Chizat, Schmitzer, Peyré, Vialard 2015]: if $c(x,y) = -\log\big(\cos^2(\min(d(x,y), \frac{\pi}{2}))\big)$ then $\mathrm{WF}_c^{1/2}$ is a distance on $\mathcal{M}_+(X)$.

Slide 98

Slide 98 text

Unbalanced Transport
$(\xi, \mu) \in \mathcal{M}_+(X)^2$, $\mathrm{KL}(\xi|\mu) \stackrel{\text{def.}}{=} \int_X \log\big(\frac{d\xi}{d\mu}\big)\,d\xi + \int_X (d\mu - d\xi)$.
$\mathrm{WF}_c(\mu,\nu) \stackrel{\text{def.}}{=} \min_\pi\ \langle c, \pi\rangle + \mathrm{KL}(P_{1\sharp}\pi|\mu) + \mathrm{KL}(P_{2\sharp}\pi|\nu)$ [Liero, Mielke, Savaré 2015]
Proposition [Liero, Mielke, Savaré 2015] [Chizat, Schmitzer, Peyré, Vialard 2015]: if $c(x,y) = -\log\big(\cos^2(\min(d(x,y), \frac{\pi}{2}))\big)$ then $\mathrm{WF}_c^{1/2}$ is a distance on $\mathcal{M}_+(X)$.
→ "Dynamic" Benamou-Brenier formulation [Liero, Mielke, Savaré 2015][Kondratyev, Monsaingeon, Vorotnikov, 2015][Chizat, Schmitzer, Peyré, Vialard 2015].
[Figure: balanced OT vs. unbalanced OT interpolations.]

Slide 99

Slide 99 text

Implicit Euler Stepping
Metric space $(X, d)$; minimize $F(x)$ over $X$.
Implicit Euler step: $x_{k+1} \stackrel{\text{def.}}{=} \operatorname{argmin}_{x\in X}\ d(x_k, x)^2 + \tau F(x)$.
[Figure: iterates $x_k, x_{k+1}, x_{k+2}, x_{k+3}$ within balls $\{x \;;\; d(x_k, x) \sim \tau\}$, for $F(x) = \|x\|^2$ on $(X = \mathbb{R}^2, \|\cdot\|_p)$ with $p = 1, 2, +\infty$.]

Slide 100

Slide 100 text

Implicit vs. Explicit Stepping
Metric space $(X, d)$; minimize $F(x)$ over $X$.
Implicit: $x_{k+1} = \operatorname{argmin}_{x\in X}\ d(x_k, x)^2 + \tau F(x)$.
Explicit: $x_{k+1} = \operatorname{argmin}_{x\in X}\ d(x_k, x)^2 + \tau\langle\nabla F(x_k), x\rangle$.
[Figure: trajectories for $F(x) = \|x\|^2$ on $(X = \mathbb{R}^2, \|\cdot\|_p)$, $p = 1, 2, +\infty$.]

Slide 101

Slide 101 text

Wasserstein Gradient Flows
Implicit Euler step [Jordan, Kinderlehrer, Otto 1998]:
$\mu_{t+1} = \mathrm{Prox}^W_{\tau f}(\mu_t) \stackrel{\text{def.}}{=} \operatorname{argmin}_{\mu\in\mathcal{M}_+(X)}\ W_2^2(\mu_t, \mu) + \tau f(\mu)$

Slide 102

Slide 102 text

Wasserstein Gradient Flows
Implicit Euler step [Jordan, Kinderlehrer, Otto 1998]:
$\mu_{t+1} = \mathrm{Prox}^W_{\tau f}(\mu_t) \stackrel{\text{def.}}{=} \operatorname{argmin}_{\mu\in\mathcal{M}_+(X)}\ W_2^2(\mu_t, \mu) + \tau f(\mu)$
Formal limit $\tau \to 0$: $\partial_t\mu = \mathrm{div}\big(\mu\,\nabla(f'(\mu))\big)$.

Slide 103

Slide 103 text

Wasserstein Gradient Flows
Implicit Euler step [Jordan, Kinderlehrer, Otto 1998]:
$\mu_{t+1} = \mathrm{Prox}^W_{\tau f}(\mu_t) \stackrel{\text{def.}}{=} \operatorname{argmin}_{\mu\in\mathcal{M}_+(X)}\ W_2^2(\mu_t, \mu) + \tau f(\mu)$
Formal limit $\tau \to 0$: $\partial_t\mu = \mathrm{div}\big(\mu\,\nabla(f'(\mu))\big)$.
$f(\mu) = \int \log(\frac{d\mu}{dx})\,d\mu \;\Rightarrow\; \partial_t\mu = \Delta\mu$ (heat diffusion).

Slide 104

Slide 104 text

Wasserstein Gradient Flows
Implicit Euler step [Jordan, Kinderlehrer, Otto 1998]:
$\mu_{t+1} = \mathrm{Prox}^W_{\tau f}(\mu_t) \stackrel{\text{def.}}{=} \operatorname{argmin}_{\mu\in\mathcal{M}_+(X)}\ W_2^2(\mu_t, \mu) + \tau f(\mu)$
Formal limit $\tau \to 0$: $\partial_t\mu = \mathrm{div}\big(\mu\,\nabla(f'(\mu))\big)$.
$f(\mu) = \int \log(\frac{d\mu}{dx})\,d\mu \;\Rightarrow\; \partial_t\mu = \Delta\mu$ (heat diffusion).
$f(\mu) = \int w\,d\mu \;\Rightarrow\; \partial_t\mu = \mathrm{div}(\mu\nabla w)$ (advection). [Figure: evolutions $\mu_t$ and the field $\nabla w$.]

Slide 105

Slide 105 text

Wasserstein Gradient Flows
Implicit Euler step [Jordan, Kinderlehrer, Otto 1998]:
$\mu_{t+1} = \mathrm{Prox}^W_{\tau f}(\mu_t) \stackrel{\text{def.}}{=} \operatorname{argmin}_{\mu\in\mathcal{M}_+(X)}\ W_2^2(\mu_t, \mu) + \tau f(\mu)$
Formal limit $\tau \to 0$: $\partial_t\mu = \mathrm{div}\big(\mu\,\nabla(f'(\mu))\big)$.
$f(\mu) = \int \log(\frac{d\mu}{dx})\,d\mu \;\Rightarrow\; \partial_t\mu = \Delta\mu$ (heat diffusion).
$f(\mu) = \int w\,d\mu \;\Rightarrow\; \partial_t\mu = \mathrm{div}(\mu\nabla w)$ (advection). [Figure: evolutions $\mu_t$ and the field $\nabla w$.]
$f(\mu) = \frac{1}{m-1}\int (\frac{d\mu}{dx})^{m-1}\,d\mu \;\Rightarrow\; \partial_t\mu = \Delta\mu^m$ (non-linear diffusion).

Slide 106

Slide 106 text

Eulerian vs. Lagrangian Discretization

Slide 107

Slide 107 text

Lagrangian Discretization of Entropy
$H(\mu) \stackrel{\text{def.}}{=} \int \log\big(\frac{d\mu}{dx}(x)\big)\,d\mu(x)$; for the empirical measure $\hat\mu_n = \frac{1}{n}\sum_{i=1}^n \delta_{x_i}$ (with $\hat\mu_n \to \mu$ as $n\to+\infty$), the Lagrangian surrogate is $\hat H(\hat\mu_n) \stackrel{\text{def.}}{=} -\sum_i \log\big(\min_{j\neq i}\|x_i - x_j\|\big)$.
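A tiny numpy illustration of the nearest-neighbor quantity on this slide (sign convention, names, and the omission of dimensional constants are mine):

```python
import numpy as np

def nn_log_dist_sum(x):
    """Sum over points of the log nearest-neighbor distance, the
    Lagrangian surrogate for entropy used on the slide (up to sign
    and constants): denser clouds give smaller values."""
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)
    return np.log(d.min(axis=1)).sum()

rng = np.random.default_rng(0)
spread = rng.normal(0, 1.0, (200, 2))   # diffuse cloud
tight = rng.normal(0, 0.1, (200, 2))    # concentrated cloud
print(nn_log_dist_sum(spread), nn_log_dist_sum(tight))
```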

Slide 108

Slide 108 text

Lagrangian Discretization of Gradient Flows
$\min_\rho\ E(\rho) \stackrel{\text{def.}}{=} \int V(x)\rho(x)\,dx + \int \rho(x)\log(\rho(x))\,dx$, with $V(x) = \|x\|^2$.
Wasserstein flow of $E$: $\frac{d\rho_t}{dt} = \Delta\rho_t + \nabla\cdot(\nabla V\,\rho_t)$. [Figure: evolution along time.]

Slide 109

Slide 109 text

Generalized Entropic Regularization
Primal: $\min_\pi\ \langle d^p, \pi\rangle + f_1(P_{1\sharp}\pi) + f_2(P_{2\sharp}\pi) + \varepsilon\,\mathrm{KL}(\pi|\pi_0)$

Slide 110

Slide 110 text

Generalized Entropic Regularization
Primal: $\min_\pi\ \langle d^p, \pi\rangle + f_1(P_{1\sharp}\pi) + f_2(P_{2\sharp}\pi) + \varepsilon\,\mathrm{KL}(\pi|\pi_0)$
Dual: with $(a, b) \stackrel{\text{def.}}{=} (e^{u/\varepsilon}, e^{v/\varepsilon})$,
$\max_{u,v}\ -f_1^*(-u) - f_2^*(-v) - \varepsilon\langle e^{u/\varepsilon}, K e^{v/\varepsilon}\rangle$, and $\pi(x,y) = a(x)K(x,y)b(y)$.

Slide 111

Slide 111 text

Generalized Entropic Regularization
Primal: $\min_\pi\ \langle d^p, \pi\rangle + f_1(P_{1\sharp}\pi) + f_2(P_{2\sharp}\pi) + \varepsilon\,\mathrm{KL}(\pi|\pi_0)$
Dual: with $(a, b) \stackrel{\text{def.}}{=} (e^{u/\varepsilon}, e^{v/\varepsilon})$,
$\max_{u,v}\ -f_1^*(-u) - f_2^*(-v) - \varepsilon\langle e^{u/\varepsilon}, K e^{v/\varepsilon}\rangle$, and $\pi(x,y) = a(x)K(x,y)b(y)$.
Block coordinates relaxation: $(I_u)$: $\max_u -f_1^*(-u) - \varepsilon\langle e^{u/\varepsilon}, K e^{v/\varepsilon}\rangle$; $(I_v)$: $\max_v -f_2^*(-v) - \varepsilon\langle e^{v/\varepsilon}, K^* e^{u/\varepsilon}\rangle$.

Slide 112

Slide 112 text

Generalized Entropic Regularization
Primal: $\min_\pi\ \langle d^p, \pi\rangle + f_1(P_{1\sharp}\pi) + f_2(P_{2\sharp}\pi) + \varepsilon\,\mathrm{KL}(\pi|\pi_0)$
Dual: with $(a, b) \stackrel{\text{def.}}{=} (e^{u/\varepsilon}, e^{v/\varepsilon})$,
$\max_{u,v}\ -f_1^*(-u) - f_2^*(-v) - \varepsilon\langle e^{u/\varepsilon}, K e^{v/\varepsilon}\rangle$, and $\pi(x,y) = a(x)K(x,y)b(y)$.
Block coordinates relaxation: $(I_u)$: $\max_u -f_1^*(-u) - \varepsilon\langle e^{u/\varepsilon}, K e^{v/\varepsilon}\rangle$; $(I_v)$: $\max_v -f_2^*(-v) - \varepsilon\langle e^{v/\varepsilon}, K^* e^{u/\varepsilon}\rangle$.
Proposition: with $\mathrm{Prox}^{\mathrm{KL}}_{f_1/\varepsilon}(\mu) \stackrel{\text{def.}}{=} \operatorname{argmin}_\nu f_1(\nu) + \varepsilon\,\mathrm{KL}(\nu|\mu)$, the solutions of $(I_u)$ and $(I_v)$ read
$a = \frac{\mathrm{Prox}^{\mathrm{KL}}_{f_1/\varepsilon}(Kb)}{Kb}$ and $b = \frac{\mathrm{Prox}^{\mathrm{KL}}_{f_2/\varepsilon}(K^*a)}{K^*a}$.

Slide 113

Slide 113 text

Generalized Entropic Regularization
Primal: $\min_\pi\ \langle d^p, \pi\rangle + f_1(P_{1\sharp}\pi) + f_2(P_{2\sharp}\pi) + \varepsilon\,\mathrm{KL}(\pi|\pi_0)$
Dual: with $(a, b) \stackrel{\text{def.}}{=} (e^{u/\varepsilon}, e^{v/\varepsilon})$,
$\max_{u,v}\ -f_1^*(-u) - f_2^*(-v) - \varepsilon\langle e^{u/\varepsilon}, K e^{v/\varepsilon}\rangle$, and $\pi(x,y) = a(x)K(x,y)b(y)$.
Block coordinates relaxation: $(I_u)$: $\max_u -f_1^*(-u) - \varepsilon\langle e^{u/\varepsilon}, K e^{v/\varepsilon}\rangle$; $(I_v)$: $\max_v -f_2^*(-v) - \varepsilon\langle e^{v/\varepsilon}, K^* e^{u/\varepsilon}\rangle$.
Proposition: with $\mathrm{Prox}^{\mathrm{KL}}_{f_1/\varepsilon}(\mu) \stackrel{\text{def.}}{=} \operatorname{argmin}_\nu f_1(\nu) + \varepsilon\,\mathrm{KL}(\nu|\mu)$, the solutions of $(I_u)$ and $(I_v)$ read
$a = \frac{\mathrm{Prox}^{\mathrm{KL}}_{f_1/\varepsilon}(Kb)}{Kb}$ and $b = \frac{\mathrm{Prox}^{\mathrm{KL}}_{f_2/\varepsilon}(K^*a)}{K^*a}$.
→ Only matrix-vector multiplications. → Highly parallelizable. → On regular grids: only convolutions! Linear time iterations.
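A sketch of the resulting scaling algorithm, parameterized by the two KL proximal maps (names are mine; the unbalanced prox quoted in the comments is the standard closed form for a marginal penalty $\rho\,\mathrm{KL}(\cdot|\mu)$):

```python
import numpy as np

def generalized_sinkhorn(prox1, prox2, K, n_iter=500):
    """Scaling iterations for min <d^p,pi> + f1(P1#pi) + f2(P2#pi)
    + eps*KL(pi|pi0), with kernel K = exp(-d^p/eps) * pi0.
    prox_i(s) evaluates Prox^KL_{f_i/eps}(s)."""
    b = np.ones(K.shape[1])
    for _ in range(n_iter):
        Kb = K @ b
        a = prox1(Kb) / Kb
        Ka = K.T @ a
        b = prox2(Ka) / Ka
    return a[:, None] * K * b[None, :]

# Balanced OT: the prox of the indicator of {mu} just returns mu.
# Unbalanced OT with marginal penalty rho*KL(.|mu) has the closed form
#   prox(s) = mu**(rho/(rho+eps)) * s**(eps/(rho+eps)).
n = 50
t = np.linspace(0, 1, n)
K = np.exp(-((t[:, None] - t[None, :]) ** 2) / 1e-2)
mu = np.exp(-((t - 0.3) ** 2) / 0.01); mu /= mu.sum()
nu = np.exp(-((t - 0.7) ** 2) / 0.02); nu /= nu.sum()
pi = generalized_sinkhorn(lambda s: mu, lambda s: nu, K)
print(np.abs(pi.sum(axis=1) - mu).max())
```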

Slide 114

Slide 114 text

Gradient Flows: Crowd Motion
$\mu_{t+1} \stackrel{\text{def.}}{=} \operatorname{argmin}_\mu\ W_\alpha^\alpha(\mu_t, \mu) + \tau f(\mu)$
Congestion-inducing function: $f(\mu) = \iota_{[0,\kappa]}(\mu) + \langle w, \mu\rangle$ [Maury, Roudneff-Chupin, Santambrogio 2010]

Slide 115

Slide 115 text

Gradient Flows: Crowd Motion
$\mu_{t+1} \stackrel{\text{def.}}{=} \operatorname{argmin}_\mu\ W_\alpha^\alpha(\mu_t, \mu) + \tau f(\mu)$
Congestion-inducing function: $f(\mu) = \iota_{[0,\kappa]}(\mu) + \langle w, \mu\rangle$ [Maury, Roudneff-Chupin, Santambrogio 2010]
Proposition: $\mathrm{Prox}^{\mathrm{KL}}_{\frac{1}{\varepsilon}f}(\mu) = \min(e^{-w/\varepsilon}\mu,\ \kappa)$.

Slide 116

Slide 116 text

Gradient Flows: Crowd Motion
$\mu_{t+1} \stackrel{\text{def.}}{=} \operatorname{argmin}_\mu\ W_\alpha^\alpha(\mu_t, \mu) + \tau f(\mu)$
Congestion-inducing function: $f(\mu) = \iota_{[0,\kappa]}(\mu) + \langle w, \mu\rangle$ [Maury, Roudneff-Chupin, Santambrogio 2010]
Proposition: $\mathrm{Prox}^{\mathrm{KL}}_{\frac{1}{\varepsilon}f}(\mu) = \min(e^{-w/\varepsilon}\mu,\ \kappa)$.
[Figure: evolutions under $\nabla w$ for $\kappa = \|\mu_{t=0}\|_\infty$, $2\|\mu_{t=0}\|_\infty$, $4\|\mu_{t=0}\|_\infty$.]

Slide 117

Slide 117 text

Crowd Motion on a Surface
$X$ = triangulated mesh. [Figure: potential $w$ (displayed via $\cos(w)$); evolutions for $\kappa = \|\mu_{t=0}\|_\infty$ and $\kappa = 6\|\mu_{t=0}\|_\infty$.]

Slide 118

Slide 118 text

Crowd Motion on a Surface
$X$ = triangulated mesh. [Figure: potential $w$ (displayed via $\cos(w)$); evolutions for $\kappa = \|\mu_{t=0}\|_\infty$ and $\kappa = 6\|\mu_{t=0}\|_\infty$.]

Slide 119

Slide 119 text

Gradient Flows: Crowd Motion with Obstacles
$X$ = sub-domain of $\mathbb{R}^2$. [Figure: potential $w$ (displayed via $\cos(w)$); evolutions for $\kappa = \|\mu_{t=0}\|_\infty$, $2\|\mu_{t=0}\|_\infty$, $4\|\mu_{t=0}\|_\infty$, $6\|\mu_{t=0}\|_\infty$.]

Slide 120

Slide 120 text

Gradient Flows: Crowd Motion with Obstacles
$X$ = sub-domain of $\mathbb{R}^2$. [Figure: potential $w$ (displayed via $\cos(w)$); evolutions for $\kappa = \|\mu_{t=0}\|_\infty$, $2\|\mu_{t=0}\|_\infty$, $4\|\mu_{t=0}\|_\infty$, $6\|\mu_{t=0}\|_\infty$.]

Slide 121

Slide 121 text

Multiple-Density Gradient Flows
$(\mu_{1,t+1}, \mu_{2,t+1}) \stackrel{\text{def.}}{=} \operatorname{argmin}_{(\mu_1,\mu_2)}\ W_\alpha^\alpha(\mu_{1,t}, \mu_1) + W_\alpha^\alpha(\mu_{2,t}, \mu_2) + \tau f(\mu_1, \mu_2)$

Slide 122

Slide 122 text

Multiple-Density Gradient Flows
$(\mu_{1,t+1}, \mu_{2,t+1}) \stackrel{\text{def.}}{=} \operatorname{argmin}_{(\mu_1,\mu_2)}\ W_\alpha^\alpha(\mu_{1,t}, \mu_1) + W_\alpha^\alpha(\mu_{2,t}, \mu_2) + \tau f(\mu_1, \mu_2)$
Wasserstein attraction: $f(\mu_1,\mu_2) = W_\alpha^\alpha(\mu_1,\mu_2) + h_1(\mu_1) + h_2(\mu_2)$, with $h_i(\mu) = \langle w, \mu\rangle$. [Figure: evolutions under $\nabla w$.]

Slide 123

Slide 123 text

Multiple-Density Gradient Flows
$(\mu_{1,t+1}, \mu_{2,t+1}) \stackrel{\text{def.}}{=} \operatorname{argmin}_{(\mu_1,\mu_2)}\ W_\alpha^\alpha(\mu_{1,t}, \mu_1) + W_\alpha^\alpha(\mu_{2,t}, \mu_2) + \tau f(\mu_1, \mu_2)$
Wasserstein attraction: $f(\mu_1,\mu_2) = W_\alpha^\alpha(\mu_1,\mu_2) + h_1(\mu_1) + h_2(\mu_2)$, with $h_i(\mu) = \langle w, \mu\rangle$. [Figure: evolutions under $\nabla w$.]
Also: $h_i(\mu) = \iota_{[0,\kappa]}(\mu)$.

Slide 124

Slide 124 text

Overview • Measures and Histograms • From Monge to Kantorovitch Formulations • Entropic Regularization and Sinkhorn • Barycenters • Unbalanced OT and Gradient Flows • Minimum Kantorovitch Estimators • Gromov-Wasserstein

Slide 125

Slide 125 text

Discriminative vs Generative Models
[Figure: generative map $g_\theta : Z \to X$, from low to high dimension; discriminative map $d_\xi : X \to Z$.]

Slide 126

Slide 126 text

Discriminative vs Generative Models
[Figure: generative map $g_\theta : Z \to X$, from low to high dimension; discriminative map $d_\xi : X \to Z$.]
Supervised (classification, $z$ = class probability): → learn $d_\xi$ from labeled data $(x_i, z_i)_i$.

Slide 127

Slide 127 text

Discriminative vs Generative Models
[Figure: generative map $g_\theta : Z \to X$, from low to high dimension; discriminative map $d_\xi : X \to Z$.]
Supervised (classification, $z$ = class probability): → learn $d_\xi$ from labeled data $(x_i, z_i)_i$.
Un-supervised: → learn $(g_\theta, d_\xi)$ from data $(x_i)_i$. Compression: $z = d_\xi(x)$ is a representation. Generation: $x = g_\theta(z)$ is a synthesis.

Slide 128

Slide 128 text

Discriminative vs Generative Models
[Figure: generative map $g_\theta : Z \to X$, from low to high dimension; discriminative map $d_\xi : X \to Z$.]
Supervised (classification, $z$ = class probability): → learn $d_\xi$ from labeled data $(x_i, z_i)_i$.
Un-supervised: → learn $(g_\theta, d_\xi)$ from data $(x_i)_i$. Compression: $z = d_\xi(x)$ is a representation. Generation: $x = g_\theta(z)$ is a synthesis.
Density fitting: $g_\theta(\{z_i\}_i) \approx \{x_i\}_i$. Auto-encoders: $g_\theta(d_\xi(x_i)) \approx x_i$.

Slide 129

Slide 129 text

Discriminative vs Generative Models
[Figure: generative map $g_\theta : Z \to X$, from low to high dimension; discriminative map $d_\xi : X \to Z$.]
Supervised (classification, $z$ = class probability): → learn $d_\xi$ from labeled data $(x_i, z_i)_i$.
Un-supervised: → learn $(g_\theta, d_\xi)$ from data $(x_i)_i$. Compression: $z = d_\xi(x)$ is a representation. Generation: $x = g_\theta(z)$ is a synthesis.
Density fitting: $g_\theta(\{z_i\}_i) \approx \{x_i\}_i$. Auto-encoders: $g_\theta(d_\xi(x_i)) \approx x_i$.
Optimal transport map as $d_\xi$.

Slide 130

Slide 130 text

Density Fitting and Generative Models
Parametric model: $\theta \mapsto \mu_\theta$. Observations: $\nu = \frac{1}{n}\sum_{i=1}^n \delta_{x_i}$.

Slide 131

Slide 131 text

Density Fitting and Generative Models
Parametric model: $\theta \mapsto \mu_\theta$. Observations: $\nu = \frac{1}{n}\sum_{i=1}^n \delta_{x_i}$.
Density fitting, $d\mu_\theta(y) = f_\theta(y)dy$: maximum likelihood (MLE),
$\min_\theta\ \widehat{\mathrm{KL}}(\nu|\mu_\theta) \stackrel{\text{def.}}{=} -\sum_i \log(f_\theta(x_i))$.

Slide 132

Slide 132 text

Density Fitting and Generative Models
Parametric model: $\theta \mapsto \mu_\theta$. Observations: $\nu = \frac{1}{n}\sum_{i=1}^n \delta_{x_i}$.
Density fitting, $d\mu_\theta(y) = f_\theta(y)dy$: maximum likelihood (MLE),
$\min_\theta\ \widehat{\mathrm{KL}}(\nu|\mu_\theta) \stackrel{\text{def.}}{=} -\sum_i \log(f_\theta(x_i))$.
Generative model fit: $\mu_\theta = g_{\theta\sharp}\zeta$ with $g_\theta : Z \to X$. Then $\widehat{\mathrm{KL}}(\mu_\theta|\nu) = +\infty$ → MLE undefined. → Need a weaker metric.

Slide 133

Slide 133 text

Loss Functions for Measures
Density fitting: $\min_\theta D(\mu_\theta, \nu)$.
Optimal transport distances: for $\mu = \frac{1}{N}\sum_i \delta_{x_i}$, $\nu = \frac{1}{P}\sum_j \delta_{y_j}$,
$W(\mu,\nu)^p \stackrel{\text{def.}}{=} \min_{T\in\mathcal{C}_{\mu,\nu}} \sum_{i,j} T_{i,j}\,\|x_i - y_j\|^p$.

Slide 134

Slide 134 text

Loss Functions for Measures
Density fitting: $\min_\theta D(\mu_\theta, \nu)$.
Optimal transport distances: for $\mu = \frac{1}{N}\sum_i \delta_{x_i}$, $\nu = \frac{1}{P}\sum_j \delta_{y_j}$,
$W(\mu,\nu)^p \stackrel{\text{def.}}{=} \min_{T\in\mathcal{C}_{\mu,\nu}} \sum_{i,j} T_{i,j}\,\|x_i - y_j\|^p$.
Maximum Mean Discrepancy (MMD):
$\|\mu - \nu\|_k^2 \stackrel{\text{def.}}{=} \frac{1}{N^2}\sum_{i,i'} k(x_i, x_{i'}) + \frac{1}{P^2}\sum_{j,j'} k(y_j, y_{j'}) - \frac{2}{NP}\sum_{i,j} k(x_i, y_j)$.
Gaussian: $k(x,y) = e^{-\|x-y\|^2/(2\sigma^2)}$. Energy distance: $k(x,y) = -\|x-y\|_2$.
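MMD is straightforward to estimate from samples. A minimal sketch (biased V-statistic including diagonal terms; function names are mine):

```python
import numpy as np

def mmd_sq(x, y, kernel):
    """Squared MMD ||mu - nu||_k^2 between the empirical measures of
    samples x (N, d) and y (P, d)."""
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

def gaussian_kernel(sigma):
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k

def energy_kernel(a, b):
    return -np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1, (500, 2))
y = rng.normal(0.5, 1, (500, 2))
print(mmd_sq(x, y, gaussian_kernel(1.0)), mmd_sq(x, y, energy_kernel))
```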

Slide 135

Slide 135 text

Loss Functions for Measures
Density fitting: $\min_\theta D(\mu_\theta, \nu)$.
Optimal transport distances: $W(\mu,\nu)^p \stackrel{\text{def.}}{=} \min_{T\in\mathcal{C}_{\mu,\nu}} \sum_{i,j} T_{i,j}\,\|x_i - y_j\|^p$.
Maximum Mean Discrepancy (MMD): $\|\mu - \nu\|_k^2$ as above; Gaussian or energy-distance kernel.
Sinkhorn divergences [Cuturi 13]: $W_\varepsilon(\mu,\nu)^p \stackrel{\text{def.}}{=} \sum_{i,j} T^\varepsilon_{i,j}\,\|x_i - y_j\|^p$,
$\bar W_\varepsilon(\mu,\nu)^p \stackrel{\text{def.}}{=} W_\varepsilon(\mu,\nu)^p - \frac{1}{2}W_\varepsilon(\mu,\mu)^p - \frac{1}{2}W_\varepsilon(\nu,\nu)^p$.

Slide 136

Slide 136 text

Loss Functions for Measures
Density fitting: $\min_\theta D(\mu_\theta, \nu)$.
Optimal transport distances: $W(\mu,\nu)^p \stackrel{\text{def.}}{=} \min_{T\in\mathcal{C}_{\mu,\nu}} \sum_{i,j} T_{i,j}\,\|x_i - y_j\|^p$.
Maximum Mean Discrepancy (MMD): $\|\mu - \nu\|_k^2$ as above; Gaussian or energy-distance kernel.
Sinkhorn divergences [Cuturi 13]: $W_\varepsilon(\mu,\nu)^p \stackrel{\text{def.}}{=} \sum_{i,j} T^\varepsilon_{i,j}\,\|x_i - y_j\|^p$,
$\bar W_\varepsilon(\mu,\nu)^p \stackrel{\text{def.}}{=} W_\varepsilon(\mu,\nu)^p - \frac{1}{2}W_\varepsilon(\mu,\mu)^p - \frac{1}{2}W_\varepsilon(\nu,\nu)^p$.
Theorem [Ramdas, G.Trillos, Cuturi 17]: $\bar W_\varepsilon(\mu,\nu)^p \xrightarrow{\varepsilon\to 0} W(\mu,\nu)^p$ and $\bar W_\varepsilon(\mu,\nu)^p \xrightarrow{\varepsilon\to+\infty} \|\mu - \nu\|_k^2$ for $k(x,y) = -\|x-y\|^p$.

Slide 137

Slide 137 text

Loss Functions for Measures
Density fitting: $\min_\theta D(\mu_\theta, \nu)$.
Optimal transport distances: $W(\mu,\nu)^p \stackrel{\text{def.}}{=} \min_{T\in\mathcal{C}_{\mu,\nu}} \sum_{i,j} T_{i,j}\,\|x_i - y_j\|^p$.
Maximum Mean Discrepancy (MMD): $\|\mu - \nu\|_k^2$ as above; Gaussian or energy-distance kernel.
Sinkhorn divergences [Cuturi 13]: $W_\varepsilon(\mu,\nu)^p \stackrel{\text{def.}}{=} \sum_{i,j} T^\varepsilon_{i,j}\,\|x_i - y_j\|^p$,
$\bar W_\varepsilon(\mu,\nu)^p \stackrel{\text{def.}}{=} W_\varepsilon(\mu,\nu)^p - \frac{1}{2}W_\varepsilon(\mu,\mu)^p - \frac{1}{2}W_\varepsilon(\nu,\nu)^p$.
Theorem [Ramdas, G.Trillos, Cuturi 17]: $\bar W_\varepsilon(\mu,\nu)^p \xrightarrow{\varepsilon\to 0} W(\mu,\nu)^p$ and $\bar W_\varepsilon(\mu,\nu)^p \xrightarrow{\varepsilon\to+\infty} \|\mu - \nu\|_k^2$ for $k(x,y) = -\|x-y\|^p$.
Best of both worlds → cross-validate $\varepsilon$:
– Scale free (no $\sigma$, no heavy-tail kernel).
– Non-Euclidean, arbitrary ground distance.
– Less biased gradient.
– No curse of dimension (low sample complexity).
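A sketch of the Sinkhorn divergence estimated from samples, combining a basic Sinkhorn loop with the debiasing terms above ($\varepsilon$, iteration counts, and names are mine; small $\varepsilon$ calls for log-domain stabilization):

```python
import numpy as np

def sinkhorn_cost(x, y, p=2, eps=0.1, n_iter=200):
    """W_eps(mu,nu)^p = <T_eps, C> for uniform empirical measures,
    with C_ij = ||x_i - y_j||^p."""
    C = np.sqrt(((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)) ** p
    K = np.exp(-C / eps)
    mu, nu = np.full(len(x), 1 / len(x)), np.full(len(y), 1 / len(y))
    b = np.ones_like(nu)
    for _ in range(n_iter):
        a = mu / (K @ b)
        b = nu / (K.T @ a)
    return ((a[:, None] * K * b[None, :]) * C).sum()

def sinkhorn_divergence(x, y, p=2, eps=0.1):
    """Debiased divergence W_eps(x,y) - W_eps(x,x)/2 - W_eps(y,y)/2."""
    return (sinkhorn_cost(x, y, p, eps)
            - 0.5 * sinkhorn_cost(x, x, p, eps)
            - 0.5 * sinkhorn_cost(y, y, p, eps))

rng = np.random.default_rng(0)
x = rng.normal(0, 1, (200, 2))
y = rng.normal(1, 1, (200, 2))
print(sinkhorn_divergence(x, y))
```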

Slide 138

Slide 138 text

Deep Discriminative vs Generative Models
Deep networks: generative $g_\theta(z) = \rho(\theta_K(\dots\rho(\theta_2(\rho(\theta_1(z))))\dots))$; discriminative $d_\xi(x) = \rho(\xi_K(\dots\rho(\xi_2(\rho(\xi_1(x))))\dots))$.

Slide 139

Slide 139 text

Deep Discriminative vs Generative Models
Deep networks: generative $g_\theta(z) = \rho(\theta_K(\dots\rho(\theta_2(\rho(\theta_1(z))))\dots))$; discriminative $d_\xi(x) = \rho(\xi_K(\dots\rho(\xi_2(\rho(\xi_1(x))))\dots))$.
[Figure: layer diagrams $\theta_1, \theta_2, \dots$ and $\xi_1, \xi_2, \dots$, with $g_\theta : Z \to X$ and $d_\xi : X \to Z$.]

Slide 140

Slide 140 text

Examples of Image Generation
$g_\theta : Z \to X$ [Credit: arXiv:1511.06434]

Slide 141

Slide 141 text

Overview • Measures and Histograms • From Monge to Kantorovitch Formulations • Entropic Regularization and Sinkhorn • Barycenters • Unbalanced OT and Gradient Flows • Minimum Kantorovitch Estimators • Gromov-Wasserstein

Slide 142

Slide 142 text

Gromov-Wasserstein
Unregistered spaces $X$, $Y$. Inputs: {(similarity/kernel matrix, histogram)}.

Slide 143

Slide 143 text

Gromov-Wasserstein
Unregistered spaces $X$, $Y$. Inputs: {(similarity/kernel matrix, histogram)}.
Def. Gromov-Wasserstein distance [Memoli 2011] [Sturm 2012]: $\min_{T\in\mathcal{C}_{p,q}} \sum_{i,j,k,\ell} |C_{i,k} - \bar C_{j,\ell}|^2\, T_{i,j}\,T_{k,\ell}$.

Slide 144

Slide 144 text

Gromov-Wasserstein
Unregistered spaces $X$, $Y$. Inputs: {(similarity/kernel matrix, histogram)}.
Def. Gromov-Wasserstein distance [Memoli 2011] [Sturm 2012]: $\min_{T\in\mathcal{C}_{p,q}} \sum_{i,j,k,\ell} |C_{i,k} - \bar C_{j,\ell}|^2\, T_{i,j}\,T_{k,\ell}$.
→ NP-hard in general. → Need for a fast approximate solver.

Slide 145

Slide 145 text

Gromov-Wasserstein as a Metric
Def.: compare pairwise distances through a coupling $T$ of points $(x_i)_i \subset X$ and $(y_j)_j \subset Y$: $\min_T \sum_{i,i',j,j'} |d_X(x_i, x_{i'}) - d_Y(y_j, y_{j'})|^2\, T_{i,j}\,T_{i',j'}$.

Slide 146

Slide 146 text

Gromov-Wasserstein as a Metric
Def.: compare pairwise distances through a coupling $T$ of points $(x_i)_i \subset X$ and $(y_j)_j \subset Y$: $\min_T \sum_{i,i',j,j'} |d_X(x_i, x_{i'}) - d_Y(y_j, y_{j'})|^2\, T_{i,j}\,T_{i',j'}$.
Def. Isometries on $\mathcal{M}$: maps $f : X \to Y$ with $d_Y(f(x), f(x')) = d_X(x, x')$. [Figure: isometric pairs $(X, Y)$.]

Slide 147

Slide 147 text

Gromov-Wasserstein as a Metric
Def.: compare pairwise distances through a coupling $T$ of points $(x_i)_i \subset X$ and $(y_j)_j \subset Y$: $\min_T \sum_{i,i',j,j'} |d_X(x_i, x_{i'}) - d_Y(y_j, y_{j'})|^2\, T_{i,j}\,T_{i',j'}$.
Def. Isometries on $\mathcal{M}$: maps $f : X \to Y$ with $d_Y(f(x), f(x')) = d_X(x, x')$. [Figure: isometric pairs $(X, Y)$.]
Prop. [Memoli 2011]: GW is a distance up to isometries. → "bending-invariant" objects recognition.

Slide 148

Slide 148 text

For Arbitrary Spaces
Metric-measure spaces $(X, Y)$: $(d_X, \mu)$, $(d_Y, \nu)$.

Slide 149

Slide 149 text

For Arbitrary Spaces
Metric-measure spaces $(X, Y)$: $(d_X, \mu)$, $(d_Y, \nu)$.
Def. Gromov-Wasserstein distance [Memoli 2011] [Sturm 2012]:
$\mathrm{GW}_2^2(d_X, \mu, d_Y, \nu) \stackrel{\text{def.}}{=} \min_{\pi\in\Pi(\mu,\nu)} \int_{X^2\times Y^2} |d_X(x,x') - d_Y(y,y')|^2\, d\pi(x,y)\,d\pi(x',y')$
[Figure: pairs $(x, x')$ in $X$ and $(y, y')$ in $Y$ with distances $d_X(x,x')$, $d_Y(y,y')$.]

Slide 150

Slide 150 text

For Arbitrary Spaces
Metric-measure spaces $(X, Y)$: $(d_X, \mu)$, $(d_Y, \nu)$.
Def. Gromov-Wasserstein distance [Memoli 2011] [Sturm 2012]:
$\mathrm{GW}_2^2(d_X, \mu, d_Y, \nu) \stackrel{\text{def.}}{=} \min_{\pi\in\Pi(\mu,\nu)} \int_{X^2\times Y^2} |d_X(x,x') - d_Y(y,y')|^2\, d\pi(x,y)\,d\pi(x',y')$
Prop. GW is a distance on mm-spaces up to isometries. → "bending-invariant" objects recognition.

Slide 151

Slide 151 text

For Arbitrary Spaces
Metric-measure spaces $(X, Y)$: $(d_X, \mu)$, $(d_Y, \nu)$.
Def. Gromov-Wasserstein distance [Memoli 2011] [Sturm 2012]:
$\mathrm{GW}_2^2(d_X, \mu, d_Y, \nu) \stackrel{\text{def.}}{=} \min_{\pi\in\Pi(\mu,\nu)} \int_{X^2\times Y^2} |d_X(x,x') - d_Y(y,y')|^2\, d\pi(x,y)\,d\pi(x',y')$
Prop. GW is a distance on mm-spaces up to isometries. → "bending-invariant" objects recognition.
→ QAP: NP-hard in general. → Need for a fast approximate solver.

Slide 152

Slide 152 text

Entropic Gromov-Wasserstein
Def. Entropic Gromov-Wasserstein: $\min_{T\in\mathcal{C}_{p,q}} \sum_{i,j,k,\ell} L(C_{i,k}, \bar C_{j,\ell})\,T_{i,j}T_{k,\ell} - \varepsilon H(T)$, minimized by projected mirror descent.

Slide 153

Slide 153 text

Entropic Gromov-Wasserstein
Def. Entropic Gromov-Wasserstein: $\min_{T\in\mathcal{C}_{p,q}} \sum_{i,j,k,\ell} L(C_{i,k}, \bar C_{j,\ell})\,T_{i,j}T_{k,\ell} - \varepsilon H(T)$, minimized by projected mirror descent.
Prop.: for $\tau = 1/\varepsilon$, each mirror-descent step is a KL projection, computed with Sinkhorn:
func T = GW(C, C̄, p, q):
  initialization: $T \leftarrow p\,q^\top$
  repeat: $T \leftarrow \mathrm{Proj}^{\mathrm{KL}}_{\mathcal{C}_{p,q}}\big(e^{-\mathcal{L}(C,\bar C)\otimes T/\varepsilon}\big)$
  until convergence; return $T$
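A compact numpy sketch of these iterations for the square loss $L(a,b) = (a-b)^2$: each outer step linearizes the quartic objective at the current $T$ and performs a Sinkhorn (KL) projection ($\varepsilon$, sizes, and the isometry test are my choices):

```python
import numpy as np

def sinkhorn_proj(K, p, q, n_iter=200):
    """KL projection of a positive kernel K onto the couplings C_{p,q}."""
    b = np.ones_like(q)
    for _ in range(n_iter):
        a = p / (K @ b)
        b = q / (K.T @ a)
    return a[:, None] * K * b[None, :]

def entropic_gw(C1, C2, p, q, eps, n_outer=50):
    """Projected mirror-descent sketch for entropic GW, square loss."""
    T = p[:, None] * q[None, :]          # independent coupling to start
    for _ in range(n_outer):
        # M_ij = sum_{k,l} (C1_ik - C2_jl)^2 T_kl, expanded in three terms:
        M = ((C1 ** 2) @ p)[:, None] + ((C2 ** 2) @ q)[None, :] \
            - 2 * C1 @ T @ C2.T
        T = sinkhorn_proj(np.exp(-M / eps), p, q)
    return T

# Example: a point cloud vs. a rotated+translated copy, compared only
# through their internal distance matrices (an isometry for GW).
rng = np.random.default_rng(0)
x = rng.random((30, 2))
R = np.array([[0.0, -1.0], [1.0, 0.0]])
y = x @ R.T + 5.0
D = lambda z: np.sqrt(((z[:, None] - z[None, :]) ** 2).sum(-1))
T = entropic_gw(D(x), D(y), np.full(30, 1 / 30), np.full(30, 1 / 30),
                eps=5e-2)
print(np.argmax(T, axis=1)[:10])  # mostly the identity matching
```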

Slide 154

Slide 154 text

Applications of GW: Shapes Analysis
Use $T$ to define a registration between: shape ↔ shape, shape ↔ colors distribution.
[Figures from Solomon, Peyré, Kim, Sra, "Entropic Metric Alignment for Correspondence Problems": entropic GW finds fuzzy correspondences between a source surface and targets with similar structure, shared semantic structure, a noisy 3D point cloud, an icon, and a hand drawing, all with the same code; the sharpness of the correspondence is controlled by the weight of the entropic regularizer; supervised matching via a sparsity stencil and distance-weighted variants; mapping a set of 185 images onto two shapes while preserving color similarity.]

Slide 155

Slide 155 text

Applications of GW: Shapes Analysis
Pipeline: shapes $(X_s)_s$ → geodesic distances → GW distances → MDS visualization (2-D and 3-D).
[Figure: MDS embedding of four classes from the SHREC dataset (Teddies, Humans, Four-legged, Armadillo); recovery of the circular structure of a galloping horse animation sequence, without ground-truth maps between shapes.]

Slide 156

Slide 156 text

Applications of GW: Quantum Chemistry
Regression problem: predict $f$; obtaining it by solving the DFT approximation is too costly.

Slide 157

Slide 157 text

Applications of GW: Quantum Chemistry
Regression problem: predict $f$; obtaining it by solving the DFT approximation is too costly.
[Rupp et al 2012] [Figure: matrix representation of molecules.]

Slide 158

Slide 158 text

Applications of GW: Quantum Chemistry
Regression problem: predict $f$; obtaining it by solving the DFT approximation is too costly.
[Rupp et al 2012] [Figure: matrix representation of molecules.]
GW-interpolation: [Figure.]

Slide 159

Slide 159 text

Gromov-Wasserstein Geodesics Def. Gromov-Wasserstein Geodesic

Slide 160

Slide 160 text

Gromov-Wasserstein Geodesics Def. Gromov-Wasserstein Geodesic Prop. [Sturm 2012]

Slide 161

Slide 161 text

Gromov-Wasserstein Geodesics
Def. Gromov-Wasserstein Geodesic. Prop. [Sturm 2012]
→ $X \times Y$ is not practical for most applications (need to fix the size of the geodesic embedding space).
→ Extension to more than 2 input spaces?

Slide 162

Slide 162 text

Gromov-Wasserstein Barycenters
Input: measured similarity matrices $(C_s, p_s)$, $s = 1, 2, 3$.
Def. GW Barycenters.

Slide 163

Slide 163 text

Gromov-Wasserstein Barycenters
Input: measured similarity matrices $(C_s, p_s)$, $s = 1, 2, 3$.
Def. GW Barycenters.
Alternating minimization:
  initialization: $C \leftarrow C_0$
  repeat:
    for $s = 1$ to $S$: update $T_s$ (entropic GW coupling between $C$ and $C_s$)
    update $C$ from the couplings $(T_s)_s$
  until convergence
  return $C$

Slide 164

Slide 164 text

Conclusion: Toward High-dimensional OT
Monge, Kantorovich, Dantzig, Brenier, Otto, McCann, Villani.
[Figure: the optimal transport framework; sliced Wasserstein projection; application to color transfer (source image X, style image Y, projection onto the style's color statistics, result); J. Rabin, Wasserstein regularization.]