$30 off During Our Annual Pro Sale. View Details »

Numerical Optimal Transport and Applications

Numerical Optimal Transport and Applications

Talk at Orsay Numerical Analysis Seminar, Feb. 4th 2016.
Update for talk a MFO, Feb. 1st 2017.

Gabriel Peyré

February 04, 2016
Tweet

More Decks by Gabriel Peyré

Other Decks in Research

Transcript

  1. Computational
    Optimal Transport
    and Applications
    Gabriel Peyré
    www.numerical-tours.com
    Joint works with:
    Jean-David Benamou, Guillaume Carlier,
    Lenaïc Chizat, Marco Cuturi
    Luca Nenna, Justin Solomon,
    François-Xavier Vialard
    É C O L E N O R M A L E
    S U P É R I E U R E

    View Slide

  2. Comparing Measures and Spaces
    Source image (X)
    Style image (Y)
    Sliced Wasserstein projection of X to style
    image color statistics Y
    Source image after color transfer
    J. Rabin Wasserstein Regularization
    !
    images, vision, graphics and machine learning, . . .

    Probability distributions and histograms

    View Slide

  3. Comparing Measures and Spaces
    Source image (X)
    Style image (Y)
    Sliced Wasserstein projection of X to style
    image color statistics Y
    Source image after color transfer
    J. Rabin Wasserstein Regularization
    !
    images, vision, graphics and machine learning, . . .

    Probability distributions and histograms
    Optimal transport mean
    L2 mean
    • Optimal transport

    View Slide

  4. Comparing Measures and Spaces
    Source image (X)
    Style image (Y)
    Sliced Wasserstein projection of X to style
    image color statistics Y
    Source image after color transfer
    J. Rabin Wasserstein Regularization
    !
    images, vision, graphics and machine learning, . . .

    Probability distributions and histograms
    Optimal transport mean
    L2 mean
    • Optimal transport
    for Correspondence Problems
    auphine
    Vladimir G. Kim
    Adobe Research
    Suvrit Sra
    MIT
    Source Targets
    nce Problems
    G. Kim
    esearch
    Suvrit Sra
    MIT
    Targets

    View Slide

  5. Overview
    • Transportation-like Problems
    • Regularized Transport
    • Barycenters
    • Gromov-Wasserstein

    View Slide

  6. Radon Measures and Couplings
    On a metric space (X, d).
    Probability measures.
    Positive Radon measures
    M+(X)
    :
    X = R
    d
    µ
    (
    x
    ) =
    f
    (
    x
    )d
    x
    µ =
    P
    i
    µ
    i xi
    µ(X) = 1
    R
    f = 1
    P
    i
    µi = 1

    View Slide

  7. Radon Measures and Couplings
    On a metric space (X, d).
    Probability measures.
    Couplings:
    Positive Radon measures
    M+(X)
    :
    P1]⇡(S) def.
    = ⇡(S, X) P2]⇡(S) def.
    = ⇡(X, S)
    Marginals:
    X = R
    d
    µ
    (
    x
    ) =
    f
    (
    x
    )d
    x
    µ =
    P
    i
    µ
    i xi

    ⇧(µ, ⌫) def.
    = {⇡ 2 M+(X ⇥ X) ; P1]⇡ = µ, P2]⇡ = ⌫}
    µ(X) = 1
    R
    f = 1
    P
    i
    µi = 1

    View Slide

  8. Optimal Transport
    Optimal transport: [Kantorovitch 1942]
    W
    p
    p
    (
    µ, ⌫
    ) def.
    = min


    h
    d
    p
    , ⇡
    i =
    Z
    X⇥X
    d
    (
    x, y
    )pd

    (
    x, y
    ) ;

    2 ⇧(
    µ, ⌫
    )

    View Slide

  9. Optimal Transport
    Optimal transport: [Kantorovitch 1942]
    W
    p
    p
    (
    µ, ⌫
    ) def.
    = min


    h
    d
    p
    , ⇡
    i =
    Z
    X⇥X
    d
    (
    x, y
    )pd

    (
    x, y
    ) ;

    2 ⇧(
    µ, ⌫
    )
    Linear programming:
    µ =
    P
    i
    µ
    i xi
    , ⌫ =
    P
    j

    j yj

    View Slide

  10. Optimal Transport
    Optimal transport: [Kantorovitch 1942]
    W
    p
    p
    (
    µ, ⌫
    ) def.
    = min


    h
    d
    p
    , ⇡
    i =
    Z
    X⇥X
    d
    (
    x, y
    )pd

    (
    x, y
    ) ;

    2 ⇧(
    µ, ⌫
    )
    ⇠ O(N3)
    Hungarian/Auction:
    µ = 1
    N
    P
    N
    i
    =1 xi
    , ⌫ = 1
    N
    P
    N
    j
    =1 yj
    Linear programming:
    µ =
    P
    i
    µ
    i xi
    , ⌫ =
    P
    j

    j yj

    View Slide

  11. Optimal Transport
    Optimal transport: [Kantorovitch 1942]
    W
    p
    p
    (
    µ, ⌫
    ) def.
    = min


    h
    d
    p
    , ⇡
    i =
    Z
    X⇥X
    d
    (
    x, y
    )pd

    (
    x, y
    ) ;

    2 ⇧(
    µ, ⌫
    )
    ⇠ O(N3)
    Hungarian/Auction:
    µ = 1
    N
    P
    N
    i
    =1 xi
    , ⌫ = 1
    N
    P
    N
    j
    =1 yj
    Linear programming:
    µ =
    P
    i
    µ
    i xi
    , ⌫ =
    P
    j

    j yj
    µ

    View Slide

  12. Optimal Transport
    Optimal transport: [Kantorovitch 1942]
    Monge-Amp`
    ere/Benamou-Brenier,
    d
    =
    || · ||2
    2.
    also reveal that the network simplex behaves in O
    (
    n2
    )
    in our con-
    text, which is a major gain at the scale at which we typical work,
    i.e. thousands of particles. This finding is also useful for applica-
    tions that use EMD, where using the network simplex instead of
    the transport simplex can bring a significant performance increase.
    Our experiments also show that fixed-point precision further speeds
    up the computation. We observed that the value of the final trans-
    port cost is less accurate because of the limited precision, but that
    the particle pairing that produces the actual interpolation scheme
    remains unchanged. We used the fixed point method to generate
    the results presented in this paper. The results of the performance
    study are also of broader interest, as current EMD image retrieval
    or color transfer techniques rely on slower solvers [Rubner et al.
    2000; Kanters et al. 2003; Morovic and Sun 2003].
    10−4
    10−2
    100
    102
    Time in seconds
    Network Simplex (fixed point)
    Network Simplex (double prec.)
    Transport Simplex
    y = α x2
    y = β x3
    Figure 7:
    Synthetic 2D examples on a Euclidean domain. Th
    left and right columns show the input distributions, while the cent
    columns show interpolations for ↵
    = 1
    /
    4
    , ↵
    = 1
    /
    2
    , and ↵
    = 3
    /
    positive. A negative side effect of this choice is that interpolatin
    W
    p
    p
    (
    µ, ⌫
    ) def.
    = min


    h
    d
    p
    , ⇡
    i =
    Z
    X⇥X
    d
    (
    x, y
    )pd

    (
    x, y
    ) ;

    2 ⇧(
    µ, ⌫
    )
    ⇠ O(N3)
    Hungarian/Auction:
    µ = 1
    N
    P
    N
    i
    =1 xi
    , ⌫ = 1
    N
    P
    N
    j
    =1 yj
    Linear programming:
    µ =
    P
    i
    µ
    i xi
    , ⌫ =
    P
    j

    j yj
    µ

    View Slide

  13. Optimal Transport
    Optimal transport: [Kantorovitch 1942]
    Monge-Amp`
    ere/Benamou-Brenier,
    d
    =
    || · ||2
    2.
    Semi-discrete: Laguerre cells, d = || · ||2
    2
    .
    also reveal that the network simplex behaves in O
    (
    n2
    )
    in our con-
    text, which is a major gain at the scale at which we typical work,
    i.e. thousands of particles. This finding is also useful for applica-
    tions that use EMD, where using the network simplex instead of
    the transport simplex can bring a significant performance increase.
    Our experiments also show that fixed-point precision further speeds
    up the computation. We observed that the value of the final trans-
    port cost is less accurate because of the limited precision, but that
    the particle pairing that produces the actual interpolation scheme
    remains unchanged. We used the fixed point method to generate
    the results presented in this paper. The results of the performance
    study are also of broader interest, as current EMD image retrieval
    or color transfer techniques rely on slower solvers [Rubner et al.
    2000; Kanters et al. 2003; Morovic and Sun 2003].
    10−4
    10−2
    100
    102
    Time in seconds
    Network Simplex (fixed point)
    Network Simplex (double prec.)
    Transport Simplex
    y = α x2
    y = β x3
    Figure 7:
    Synthetic 2D examples on a Euclidean domain. Th
    left and right columns show the input distributions, while the cent
    columns show interpolations for ↵
    = 1
    /
    4
    , ↵
    = 1
    /
    2
    , and ↵
    = 3
    /
    positive. A negative side effect of this choice is that interpolatin
    W
    p
    p
    (
    µ, ⌫
    ) def.
    = min


    h
    d
    p
    , ⇡
    i =
    Z
    X⇥X
    d
    (
    x, y
    )pd

    (
    x, y
    ) ;

    2 ⇧(
    µ, ⌫
    )
    ⇠ O(N3)
    Hungarian/Auction:
    µ = 1
    N
    P
    N
    i
    =1 xi
    , ⌫ = 1
    N
    P
    N
    j
    =1 yj
    Linear programming:
    µ =
    P
    i
    µ
    i xi
    , ⌫ =
    P
    j

    j yj
    µ

    A NUMERICAL ALGORITHM FOR
    L2
    SEMI-DISCRETE OPTIMAL TRAN
    Figure 9. More examples of semi-discrete optimal tra
    Note how the solids deform and merge to form the sphere
    first row, and how the branches of the star split and merge
    second row.
    A NUMERICAL ALGORITHM FOR
    L2
    SEMI-DISCRETE OPTIMAL TRANSPORT IN 3D 21
    Figure 9. More examples of semi-discrete optimal transport.
    Note how the solids deform and merge to form the sphere on the
    first row, and how the branches of the star split and merge on the
    second row.
    A NUMERICAL ALGORITHM FOR
    L2
    SEMI-DISCRETE OPTIMAL TRANSPORT IN 3D 21
    Figure 9. More examples of semi-discrete optimal transport.
    Note how the solids deform and merge to form the sphere on the
    first row, and how the branches of the star split and merge on the
    second row.
    [Levy,’15]
    [M´
    erigot 11]

    View Slide

  14. Optimal Transport
    Optimal transport: [Kantorovitch 1942]
    Need for fast approximate algorithms for generic
    c
    .
    Monge-Amp`
    ere/Benamou-Brenier,
    d
    =
    || · ||2
    2.
    Semi-discrete: Laguerre cells, d = || · ||2
    2
    .
    also reveal that the network simplex behaves in O
    (
    n2
    )
    in our con-
    text, which is a major gain at the scale at which we typical work,
    i.e. thousands of particles. This finding is also useful for applica-
    tions that use EMD, where using the network simplex instead of
    the transport simplex can bring a significant performance increase.
    Our experiments also show that fixed-point precision further speeds
    up the computation. We observed that the value of the final trans-
    port cost is less accurate because of the limited precision, but that
    the particle pairing that produces the actual interpolation scheme
    remains unchanged. We used the fixed point method to generate
    the results presented in this paper. The results of the performance
    study are also of broader interest, as current EMD image retrieval
    or color transfer techniques rely on slower solvers [Rubner et al.
    2000; Kanters et al. 2003; Morovic and Sun 2003].
    10−4
    10−2
    100
    102
    Time in seconds
    Network Simplex (fixed point)
    Network Simplex (double prec.)
    Transport Simplex
    y = α x2
    y = β x3
    Figure 7:
    Synthetic 2D examples on a Euclidean domain. Th
    left and right columns show the input distributions, while the cent
    columns show interpolations for ↵
    = 1
    /
    4
    , ↵
    = 1
    /
    2
    , and ↵
    = 3
    /
    positive. A negative side effect of this choice is that interpolatin
    W
    p
    p
    (
    µ, ⌫
    ) def.
    = min


    h
    d
    p
    , ⇡
    i =
    Z
    X⇥X
    d
    (
    x, y
    )pd

    (
    x, y
    ) ;

    2 ⇧(
    µ, ⌫
    )
    ⇠ O(N3)
    Hungarian/Auction:
    µ = 1
    N
    P
    N
    i
    =1 xi
    , ⌫ = 1
    N
    P
    N
    j
    =1 yj
    Linear programming:
    µ =
    P
    i
    µ
    i xi
    , ⌫ =
    P
    j

    j yj
    µ

    A NUMERICAL ALGORITHM FOR
    L2
    SEMI-DISCRETE OPTIMAL TRAN
    Figure 9. More examples of semi-discrete optimal tra
    Note how the solids deform and merge to form the sphere
    first row, and how the branches of the star split and merge
    second row.
    A NUMERICAL ALGORITHM FOR
    L2
    SEMI-DISCRETE OPTIMAL TRANSPORT IN 3D 21
    Figure 9. More examples of semi-discrete optimal transport.
    Note how the solids deform and merge to form the sphere on the
    first row, and how the branches of the star split and merge on the
    second row.
    A NUMERICAL ALGORITHM FOR
    L2
    SEMI-DISCRETE OPTIMAL TRANSPORT IN 3D 21
    Figure 9. More examples of semi-discrete optimal transport.
    Note how the solids deform and merge to form the sphere on the
    first row, and how the branches of the star split and merge on the
    second row.
    [Levy,’15]
    [M´
    erigot 11]

    View Slide

  15. Unbalanced Transport
    (
    ⇠, µ
    )
    2 M+(
    X
    )
    2
    , KL(
    ⇠|µ
    )
    def.
    =
    R
    X log

    d⇠


    d
    µ
    +
    R
    X (d
    µ
    d

    )
    WFc(µ, ⌫) def.
    = min

    hc, ⇡i + KL(P1]⇡|µ) + KL(P2]⇡|⌫)
    [Liereo, Mielke, Savar´
    e 2015]

    View Slide

  16. Unbalanced Transport
    [Liereo, Mielke, Savar´
    e 2015]
    [Chizat, Schmitzer, Peyr´
    e, Vialard 2015]
    Proposition:
    then
    WF1/2
    c is a distance on
    M+(
    X
    ).
    (
    ⇠, µ
    )
    2 M+(
    X
    )
    2
    , KL(
    ⇠|µ
    )
    def.
    =
    R
    X log

    d⇠


    d
    µ
    +
    R
    X (d
    µ
    d

    )
    If c(x, y) = log(cos(min(d(x, y),

    2 ))
    WFc(µ, ⌫) def.
    = min

    hc, ⇡i + KL(P1]⇡|µ) + KL(P2]⇡|⌫)
    [Liereo, Mielke, Savar´
    e 2015]

    View Slide

  17. Unbalanced Transport
    !
    “Dynamic” Benamou-Brenier formulation.
    [Liereo, Mielke, Savar´
    e 2015][Kondratyev, Monsaingeon, Vorotnikov, 2015]
    [Liereo, Mielke, Savar´
    e 2015]
    [Chizat, Schmitzer, Peyr´
    e, Vialard 2015]
    Proposition:
    then
    WF1/2
    c is a distance on
    M+(
    X
    ).
    [Chizat, Schmitzer, P, Vialard 2015]
    (
    ⇠, µ
    )
    2 M+(
    X
    )
    2
    , KL(
    ⇠|µ
    )
    def.
    =
    R
    X log

    d⇠


    d
    µ
    +
    R
    X (d
    µ
    d

    )
    If c(x, y) = log(cos(min(d(x, y),

    2 ))
    WFc(µ, ⌫) def.
    = min

    hc, ⇡i + KL(P1]⇡|µ) + KL(P2]⇡|⌫)
    [Liereo, Mielke, Savar´
    e 2015]
    Balanced OT Unbalanced OT

    View Slide

  18. Wasserstein Gradient Flows
    Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998]
    µt+1 = Prox
    W
    ⌧f (
    µt)
    def.
    = argmin
    µ2M+(X)
    W2
    2 (
    µt, µ
    ) +
    ⌧f
    (
    µ
    )

    View Slide

  19. Wasserstein Gradient Flows
    Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998]
    Formal limit
    ⌧ !
    0:
    @tµ = div (µr(f0(µ)))
    µt+1 = Prox
    W
    ⌧f (
    µt)
    def.
    = argmin
    µ2M+(X)
    W2
    2 (
    µt, µ
    ) +
    ⌧f
    (
    µ
    )

    View Slide

  20. Wasserstein Gradient Flows
    Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998]
    Formal limit
    ⌧ !
    0:
    @tµ = div (µr(f0(µ)))
    (heat di↵usion)
    f
    (
    µ
    ) =
    R
    log(
    d
    µ
    d
    x )d
    µ @tµ = µ
    µt+1 = Prox
    W
    ⌧f (
    µt)
    def.
    = argmin
    µ2M+(X)
    W2
    2 (
    µt, µ
    ) +
    ⌧f
    (
    µ
    )

    View Slide

  21. Wasserstein Gradient Flows
    Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998]
    Formal limit
    ⌧ !
    0:
    @tµ = div (µr(f0(µ)))
    (advection)
    f(µ) =
    R
    wdµ @tµ = div(µrw)
    (heat di↵usion)
    f
    (
    µ
    ) =
    R
    log(
    d
    µ
    d
    x )d
    µ @tµ = µ
    Evolution
    µt
    Evolution
    µt
    rw rw
    µt+1 = Prox
    W
    ⌧f (
    µt)
    def.
    = argmin
    µ2M+(X)
    W2
    2 (
    µt, µ
    ) +
    ⌧f
    (
    µ
    )

    View Slide

  22. Wasserstein Gradient Flows
    Implicit Euler step: [Jordan, Kinderlehrer, Otto 1998]
    Formal limit
    ⌧ !
    0:
    @tµ = div (µr(f0(µ)))
    (advection)
    f(µ) =
    R
    wdµ @tµ = div(µrw)
    (heat di↵usion)
    f
    (
    µ
    ) =
    R
    log(
    d
    µ
    d
    x )d
    µ @tµ = µ
    (non-linear di↵usion)
    f(µ) = 1
    m
    1
    R
    (d
    µ
    d
    x
    )m
    1dµ @tµ = µm
    Evolution
    µt
    Evolution
    µt
    rw rw
    µt+1 = Prox
    W
    ⌧f (
    µt)
    def.
    = argmin
    µ2M+(X)
    W2
    2 (
    µt, µ
    ) +
    ⌧f
    (
    µ
    )

    View Slide

  23. Overview
    • Transportation-like Problems
    • Regularized Transport
    • Barycenters
    • Gromov-Wasserstein

    View Slide

  24. Transportation-like Problems
    Unbalanced transport:
    Optimal transport:
    Gradient flows:
    min
    ⇡2M+(X⇥X)
    E(⇡) def.
    = hc, ⇡i + f1(P1]⇡) + f2(P2]⇡)
    f1 = ◆µ f2 = ◆⌫
    f1 = KL(·|µ) f2 = KL(·|⌫)
    f1 = ◆µt
    f2 = f

    View Slide

  25. Transportation-like Problems
    Regularization and positivity barrier.
    Discretization grid (prescribed support).
    Unbalanced transport:
    Optimal transport:
    Gradient flows:
    min
    ⇡2M+(X⇥X)
    E(⇡) def.
    = hc, ⇡i + f1(P1]⇡) + f2(P2]⇡)
    f1 = ◆µ f2 = ◆⌫
    f1 = KL(·|µ) f2 = KL(·|⌫)
    f1 = ◆µt
    f2 = f
    Regularization:
    min

    E(⇡) + "KL(⇡|⇡0)
    "KL(⇡|⇡0)

    View Slide

  26. Transportation-like Problems
    Regularization and positivity barrier.
    Discretization grid (prescribed support).
    ⇡0
    argmin E(⇡)
    ⇡1 = Prox1
    "
    E (
    ⇡0) ⇡2 = Prox1
    "
    E (
    ⇡1)
    Unbalanced transport:
    Optimal transport:
    Gradient flows:
    min
    ⇡2M+(X⇥X)
    E(⇡) def.
    = hc, ⇡i + f1(P1]⇡) + f2(P2]⇡)
    f1 = ◆µ f2 = ◆⌫
    f1 = KL(·|µ) f2 = KL(·|⌫)
    f1 = ◆µt
    f2 = f
    Regularization:
    min

    E(⇡) + "KL(⇡|⇡0)
    Prox
    KL
    E/"(
    ⇡0)
    def.
    = arg
    "KL(⇡|⇡0)
    Implicit KL stepping.

    View Slide

  27. Entropy Regularized Transport
    Schr¨
    odinger’s problem:
    Landmark computational paper: [Cuturi 2013].
    ⇡"
    def.
    = argmin

    {hdp, ⇡i + "KL(⇡|⇡0) ; ⇡ 2 ⇧(µ, ⌫)}
    ⇡" = argmin
    ⇡2⇧(µ,⌫)
    KL(⇡|K)
    K
    (
    x, y
    ) def.
    =
    e
    dp(
    x,y
    )
    " ⇡0(
    x, y
    )

    View Slide

  28. Entropy Regularized Transport
    Schr¨
    odinger’s problem:
    Landmark computational paper: [Cuturi 2013].
    ⇡"
    def.
    = argmin

    {hdp, ⇡i + "KL(⇡|⇡0) ; ⇡ 2 ⇧(µ, ⌫)}
    ⇡" = argmin
    ⇡2⇧(µ,⌫)
    KL(⇡|K)
    K
    (
    x, y
    ) def.
    =
    e
    dp(
    x,y
    )
    " ⇡0(
    x, y
    )
    Proposition:
    [Carlier, Duval, Peyr´
    e, Schmitzer 2015]
    ⇡"
    "!0
    ! argmin
    ⇡2⇧(µ,⌫)
    hdp, ⇡i
    ⇡"
    "!+1
    !
    µ
    (
    x
    )

    (
    y
    )
    "
    ⇡" "

    View Slide

  29. Dykstra-like Iterations
    Primal: min

    hdp, ⇡i + f1(P1]⇡) + f2(P2]⇡) + "KL(⇡|⇡0)

    View Slide

  30. Dykstra-like Iterations
    Primal: min

    hdp, ⇡i + f1(P1]⇡) + f2(P2]⇡) + "KL(⇡|⇡0)
    Dual:
    (a, b) def.
    = (e u
    " , e v
    " )
    max
    u,v
    f⇤
    1 (
    u
    )
    f⇤
    2 (
    v
    )
    "he u
    " , Ke v
    " i

    (
    x, y
    ) =
    a
    (
    x
    )
    K
    (
    x, y
    )
    b
    (
    y
    )

    View Slide

  31. Dykstra-like Iterations
    Primal: min

    hdp, ⇡i + f1(P1]⇡) + f2(P2]⇡) + "KL(⇡|⇡0)
    Block coordinates
    relaxation:
    max
    u
    f⇤
    1 (
    u
    )
    "he u
    " , Ke v
    " i
    max
    v
    f⇤
    2 (
    v
    )
    "he v
    " , K⇤e u
    " i (Iv)
    (Iu)
    Dual:
    (a, b) def.
    = (e u
    " , e v
    " )
    max
    u,v
    f⇤
    1 (
    u
    )
    f⇤
    2 (
    v
    )
    "he u
    " , Ke v
    " i

    (
    x, y
    ) =
    a
    (
    x
    )
    K
    (
    x, y
    )
    b
    (
    y
    )

    View Slide

  32. Dykstra-like Iterations
    Primal: min

    hdp, ⇡i + f1(P1]⇡) + f2(P2]⇡) + "KL(⇡|⇡0)
    Block coordinates
    relaxation:
    max
    u
    f⇤
    1 (
    u
    )
    "he u
    " , Ke v
    " i
    max
    v
    f⇤
    2 (
    v
    )
    "he v
    " , K⇤e u
    " i (Iv)
    (Iu)
    Proposition:
    Prox
    KL
    f1/"(
    µ
    )
    def.
    = argmin ⌫f1(

    ) +
    "
    KL(
    ⌫|µ
    )
    the solutions of (
    Iu) and (
    Iv) read:
    a
    =
    Prox
    KL
    f1/"(
    Kb
    )
    Kb
    b
    =
    Prox
    KL
    f2/"(
    K⇤a
    )
    K⇤a
    Dual:
    (a, b) def.
    = (e u
    " , e v
    " )
    max
    u,v
    f⇤
    1 (
    u
    )
    f⇤
    2 (
    v
    )
    "he u
    " , Ke v
    " i

    (
    x, y
    ) =
    a
    (
    x
    )
    K
    (
    x, y
    )
    b
    (
    y
    )

    View Slide

  33. Dykstra-like Iterations
    Primal:
    ! Only matrix-vector multiplications. ! Highly parallelizable.
    ! On regular grids: only convolutions! Linear time iterations.
    min

    hdp, ⇡i + f1(P1]⇡) + f2(P2]⇡) + "KL(⇡|⇡0)
    Block coordinates
    relaxation:
    max
    u
    f⇤
    1 (
    u
    )
    "he u
    " , Ke v
    " i
    max
    v
    f⇤
    2 (
    v
    )
    "he v
    " , K⇤e u
    " i (Iv)
    (Iu)
    Proposition:
    Prox
    KL
    f1/"(
    µ
    )
    def.
    = argmin ⌫f1(

    ) +
    "
    KL(
    ⌫|µ
    )
    the solutions of (
    Iu) and (
    Iv) read:
    a
    =
    Prox
    KL
    f1/"(
    Kb
    )
    Kb
    b
    =
    Prox
    KL
    f2/"(
    K⇤a
    )
    K⇤a
    Dual:
    (a, b) def.
    = (e u
    " , e v
    " )
    max
    u,v
    f⇤
    1 (
    u
    )
    f⇤
    2 (
    v
    )
    "he u
    " , Ke v
    " i

    (
    x, y
    ) =
    a
    (
    x
    )
    K
    (
    x, y
    )
    b
    (
    y
    )

    View Slide

  34. Sinkhorn’s Algorithm
    Prox
    KL
    f1/"(˜
    µ
    ) =
    µ
    Prox
    KL
    f2/"(˜

    ) =

    f1 = ◆µ
    f2 = ◆⌫
    Optimal transport problem:

    View Slide

  35. Sinkhorn’s Algorithm
    Prox
    KL
    f1/"(˜
    µ
    ) =
    µ
    Prox
    KL
    f2/"(˜

    ) =

    `
    ⇡(`)
    "
    [Sinkhorn 1967]
    [Deming,Stephan 1940]
    Sinkhorn/IPFP algorithm:
    a(`+1) def.
    =
    µ
    Kb(`)
    and b(`+1) def.
    =

    K⇤a(`+1)
    f1 = ◆µ
    f2 = ◆⌫
    Optimal transport problem:

    View Slide

  36. Sinkhorn’s Algorithm
    Prox
    KL
    f1/"(˜
    µ
    ) =
    µ
    Prox
    KL
    f2/"(˜

    ) =

    ⇡"
    "
    `
    ⇡(`)
    "
    [Sinkhorn 1967]
    [Deming,Stephan 1940]
    Sinkhorn/IPFP algorithm:
    a(`+1) def.
    =
    µ
    Kb(`)
    and b(`+1) def.
    =

    K⇤a(`+1)
    f1 = ◆µ
    f2 = ◆⌫
    Optimal transport problem:
    Proposition:
    [Franklin,Lorenz 1989]
    ⇡(`) def.
    = diag(a(`))K diag(b(`))
    ||
    log(
    ⇡(`)
    ) log(
    ⇡?
    )
    ||1 =
    O
    (1 )
    `, ⇠  1/"
    c
    Local rate: [Knight 2008]
    1000 2000 3000 4000 5000
    -2
    -1.5
    -1
    -0.5
    0
    `
    log(
    ||
    log(
    ⇡(`)
    ) log(
    ⇡?
    )
    ||1)

    View Slide

  37. Gradient Flows: Crowd Motion
    µt+1
    def.
    = argminµ
    W↵

    (µt, µ) + ⌧f(µ)
    Congestion-inducing function:
    f(µ) = ◆[0,]
    (µ) + hw, µi
    [Maury, Roudne↵-Chupin, Santambrogio 2010]

    View Slide

  38. Gradient Flows: Crowd Motion
    µt+1
    def.
    = argminµ
    W↵

    (µt, µ) + ⌧f(µ)
    Proposition: Prox1
    "
    f (
    µ
    ) = min(
    e "wµ, 
    )
    Congestion-inducing function:
    f(µ) = ◆[0,]
    (µ) + hw, µi
    [Maury, Roudne↵-Chupin, Santambrogio 2010]

    View Slide

  39. Gradient Flows: Crowd Motion
    µt+1
    def.
    = argminµ
    W↵

    (µt, µ) + ⌧f(µ)
    Proposition: Prox1
    "
    f (
    µ
    ) = min(
    e "wµ, 
    )
     = ||µt=0
    ||1  = 2||µt=0
    ||1
     = 4||µt=0
    ||1
    Congestion-inducing function:
    f(µ) = ◆[0,]
    (µ) + hw, µi
    [Maury, Roudne↵-Chupin, Santambrogio 2010]
    rw

    View Slide

  40. Crowd Motion on a Surface
    Potential cos(
    w
    )
     = ||µt=0
    ||1  = 6||µt=0
    ||1
    X = triangulated mesh.

    View Slide

  41. Crowd Motion on a Surface
    Potential cos(
    w
    )
     = ||µt=0
    ||1  = 6||µt=0
    ||1
    X = triangulated mesh.

    View Slide

  42. Gradient Flows: Crowd Motion with Obstacles
    Potential cos(
    w
    )
     = ||µt=0
    ||1  = 2||µt=0
    ||1  = 4||µt=0
    ||1  = 6||µt=0
    ||1
    X
    = sub-domain of
    R2
    .

    View Slide

  43. Gradient Flows: Crowd Motion with Obstacles
    Potential cos(
    w
    )
     = ||µt=0
    ||1  = 2||µt=0
    ||1  = 4||µt=0
    ||1  = 6||µt=0
    ||1
    X
    = sub-domain of
    R2
    .

    View Slide

  44. Multiple-Density Gradient Flows
    (µ1,t+1, µ2,t+1) def.
    = argmin
    (µ1,µ2)
    W↵

    (µ1,t, µ1) + W↵

    (µ2,t, µ2) + ⌧f(µ1, µ2)

    View Slide

  45. Multiple-Density Gradient Flows
    Wasserstein attraction:
    rw
    (µ1,t+1, µ2,t+1) def.
    = argmin
    (µ1,µ2)
    W↵

    (µ1,t, µ1) + W↵

    (µ2,t, µ2) + ⌧f(µ1, µ2)
    f(µ1, µ2) = W↵

    (µ1, µ2) + h1(µ1) + h2(µ2)
    hi(µ) = hw, µi

    View Slide

  46. Multiple-Density Gradient Flows
    Wasserstein attraction:
    rw
    (µ1,t+1, µ2,t+1) def.
    = argmin
    (µ1,µ2)
    W↵

    (µ1,t, µ1) + W↵

    (µ2,t, µ2) + ⌧f(µ1, µ2)
    f(µ1, µ2) = W↵

    (µ1, µ2) + h1(µ1) + h2(µ2)
    hi(µ) = ◆[0,]
    (µ).
    hi(µ) = hw, µi

    View Slide

  47. Overview
    • Transportation-like Problems
    • Regularized Transport
    • Barycenters
    • Gromov-Wasserstein

    View Slide

  48. Wasserstein Barycenters
    Barycenters of measures (
    µk)k:
    P
    k k = 1
    µ? 2 argmin
    µ
    P
    k kW2
    2
    (µk, µ)

    View Slide

  49. Wasserstein Barycenters
    µ
    µ1
    µ3
    W2
    (µ1
    , µ )
    W
    2 (µ
    2 ,µ )
    µ2
    W2
    (µ3

    )
    Barycenters of measures (
    µk)k:
    P
    k k = 1
    If µ
    k
    =
    xk
    then µ? = P
    k kxk
    Generalizes Euclidean barycenter:
    µ? 2 argmin
    µ
    P
    k kW2
    2
    (µk, µ)

    View Slide

  50. Wasserstein Barycenters
    µ
    µ1
    µ3
    W2
    (µ1
    , µ )
    W
    2 (µ
    2 ,µ )
    µ2
    W2
    (µ3

    )
    Barycenters of measures (
    µk)k:
    P
    k k = 1
    If µ
    k
    =
    xk
    then µ? = P
    k kxk
    Generalizes Euclidean barycenter:
    µ? 2 argmin
    µ
    P
    k kW2
    2
    (µk, µ)

    View Slide

  51. Wasserstein Barycenters
    µ
    µ1
    µ3
    W2
    (µ1
    , µ )
    W
    2 (µ
    2 ,µ )
    µ2
    W2
    (µ3

    )
    Barycenters of measures (
    µk)k:
    P
    k k = 1
    If µ
    k
    =
    xk
    then µ? = P
    k kxk
    Generalizes Euclidean barycenter:
    µ2
    µ1
    Mc Cann’s displacement interpolation.
    µ? 2 argmin
    µ
    P
    k kW2
    2
    (µk, µ)

    View Slide

  52. Wasserstein Barycenters
    µ
    µ1
    µ3
    W2
    (µ1
    , µ )
    W
    2 (µ
    2 ,µ )
    µ2
    W2
    (µ3

    )
    Barycenters of measures (
    µk)k:
    P
    k k = 1
    If µ
    k
    =
    xk
    then µ? = P
    k kxk
    Generalizes Euclidean barycenter:
    µ2
    µ1
    µ3
    µ2
    µ1
    Mc Cann’s displacement interpolation.
    µ? 2 argmin
    µ
    P
    k kW2
    2
    (µk, µ)

    View Slide

  53. Wasserstein Barycenters
    µ
    µ1
    µ3
    W2
    (µ1
    , µ )
    W
    2 (µ
    2 ,µ )
    µ2
    W2
    (µ3

    )
    Barycenters of measures (
    µk)k:
    P
    k k = 1
    If µ
    k
    =
    xk
    then µ? = P
    k kxk
    Generalizes Euclidean barycenter:
    µ2
    µ1
    µ3
    µ exists and is unique.
    Theorem:
    if
    µ1 does not vanish on small sets,
    [Agueh, Carlier, 2010]
    (for c(x, y) =
    ||
    x y
    ||2
    )
    µ2
    µ1
    Mc Cann’s displacement interpolation.
    µ? 2 argmin
    µ
    P
    k kW2
    2
    (µk, µ)

    View Slide

  54. Regularized Barycenters
    min
    (⇡k)k,µ
    (
    X
    k
    k (hc, ⇡k
    i + "KL(⇡k
    |⇡0,k)) ; 8k, ⇡k
    2 ⇧(µk, µ)
    )

    View Slide

  55. Regularized Barycenters
    !
    Need to fix a discretization grid for
    µ
    , i.e. choose (
    ⇡0,k)k
    min
    (⇡k)k,µ
    (
    X
    k
    k (hc, ⇡k
    i + "KL(⇡k
    |⇡0,k)) ; 8k, ⇡k
    2 ⇧(µk, µ)
    )

    View Slide

  56. Regularized Barycenters
    !
    Need to fix a discretization grid for
    µ
    , i.e. choose (
    ⇡0,k)k
    !
    Sinkhorn-like algorithm [Benamou, Carlier, Cuturi, Nenna, Peyr´
    e, 2015].
    min
    (⇡k)k,µ
    (
    X
    k
    k (hc, ⇡k
    i + "KL(⇡k
    |⇡0,k)) ; 8k, ⇡k
    2 ⇧(µk, µ)
    )

    View Slide

  57. Regularized Barycenters
    !
    Need to fix a discretization grid for
    µ
    , i.e. choose (
    ⇡0,k)k
    !
    Sinkhorn-like algorithm [Benamou, Carlier, Cuturi, Nenna, Peyr´
    e, 2015].
    [Solomon et al, SIGGRAPH 2015]
    min
    (⇡k)k,µ
    (
    X
    k
    k (hc, ⇡k
    i + "KL(⇡k
    |⇡0,k)) ; 8k, ⇡k
    2 ⇧(µk, µ)
    )

    View Slide

  58. Barycenter on a Surface
    1
    µ1
    µ2
    0 1

    View Slide

  59. Barycenter on a Surface
    1
    µ1
    µ2
    µ3
    µ4
    µ5 µ6
    µ1
    µ2
    0 1

    View Slide

  60. Barycenter on a Surface
    1
    = (1, . . . , 1)/6
    µ1
    µ2
    µ3
    µ4
    µ5 µ6
    µ1
    µ2
    0 1

    View Slide

  61. Color Transfer
    µ

    Input images:
    (
    f, g
    ) (chrominance components)
    Input measures:
    f
    g
    µ(A) = U(f 1(A)), ⌫(A) = U(g 1(A))

    View Slide

  62. Color Transfer
    µ

    Input images:
    (
    f, g
    ) (chrominance components)
    Input measures:
    f
    g
    µ(A) = U(f 1(A)), ⌫(A) = U(g 1(A))

    View Slide

  63. Color Transfer
    µ

    Input images:
    (
    f, g
    ) (chrominance components)
    Input measures:
    f T
    T f
    g
    ˜
    T g
    µ(A) = U(f 1(A)), ⌫(A) = U(g 1(A))

    View Slide

  64. Overview
    • Transportation-like Problems
    • Regularized Transport
    • Barycenters
    • Gromov-Wasserstein

    View Slide

  65. X
    Y
    Gromov-Wasserstein
    Metric-measure spaces (X, Y ): (dX , µ), (dY , ⌫)

    View Slide

  66. X
    Y
    Gromov-Wasserstein
    dX (
    x,x
    0
    )
    dY (y, y0)
    Def.
    Gromov-Wasserstein distance:
    [Memoli 2011]
    [Sturm 2012]
    GW2
    2
    (dX , µ, dY , ⌫) def.
    =
    min
    ⇡2⇧(µ,⌫)
    Z
    X2⇥Y 2
    |
    dX(
    x, x
    0)
    dY (
    y, y
    0)|2d

    (
    x, y
    )d

    (
    x
    0
    , y
    0)
    Metric-measure spaces (X, Y ): (dX , µ), (dY , ⌫)

    View Slide

  67. X
    Y
    Gromov-Wasserstein
    f
    X Y
    Prop.
    GW is a distance on
    mm-spaces/isometries.
    !
    “bending-invariant” objects recognition.
    dX (
    x,x
    0
    )
    dY (y, y0)
    Def.
    Gromov-Wasserstein distance:
    [Memoli 2011]
    [Sturm 2012]
    GW2
    2
    (dX , µ, dY , ⌫) def.
    =
    min
    ⇡2⇧(µ,⌫)
    Z
    X2⇥Y 2
    |
    dX(
    x, x
    0)
    dY (
    y, y
    0)|2d

    (
    x, y
    )d

    (
    x
    0
    , y
    0)
    Metric-measure spaces (X, Y ): (dX , µ), (dY , ⌫)

    View Slide

  68. X
    Y
    Gromov-Wasserstein
    f
    X Y
    Prop.
    GW is a distance on
    mm-spaces/isometries.
    !
    need for a fast approximate solver.
    !
    “bending-invariant” objects recognition.
    ! QAP: NP-hard in general.
    dX (
    x,x
    0
    )
    dY (y, y0)
    Def.
    Gromov-Wasserstein distance:
    [Memoli 2011]
    [Sturm 2012]
    GW2
    2
    (dX , µ, dY , ⌫) def.
    =
    min
    ⇡2⇧(µ,⌫)
    Z
    X2⇥Y 2
    |
    dX(
    x, x
    0)
    dY (
    y, y
    0)|2d

    (
    x, y
    )d

    (
    x
    0
    , y
    0)
    Metric-measure spaces (X, Y ): (dX , µ), (dY , ⌫)

    View Slide

  69. Entropic Gromov-Wasserstein
    min
    ⇡2⇧(µ,⌫)
    Z
    X2⇥Y 2
    |
    dX(
    x, x
    0)
    dY (
    y, y
    0)|2d

    (
    x, y
    )d

    (
    x
    0
    , y
    0) +
    "
    KL(

    |
    ⇡0)
    Entropic regularization:
    (?)

    View Slide

  70. Entropic Gromov-Wasserstein
    min
    ⇡2⇧(µ,⌫)
    Z
    X2⇥Y 2
    |
    dX(
    x, x
    0)
    dY (
    y, y
    0)|2d

    (
    x, y
    )d

    (
    x
    0
    , y
    0) +
    "
    KL(

    |
    ⇡0)
    DX
    def.
    = (
    dX (
    xi, x
    0
    i
    ))i,i0
    DY
    def.
    = (dY (yj, y0
    j
    ))j,j0
    Entropic regularization:
    Discrete spaces:
    min
    ⇡2⇧(µ,⌫)
    KL(⇡|K⇡)
    (?) ,
    (?)
    K⇡
    def.
    = e2
    "
    DX
    ⇥⇡⇥DY

    View Slide

  71. Entropic Gromov-Wasserstein
    min
    ⇡2⇧(µ,⌫)
    Z
    X2⇥Y 2
    |
    dX(
    x, x
    0)
    dY (
    y, y
    0)|2d

    (
    x, y
    )d

    (
    x
    0
    , y
    0) +
    "
    KL(

    |
    ⇡0)
    DX
    def.
    = (
    dX (
    xi, x
    0
    i
    ))i,i0
    DY
    def.
    = (dY (yj, y0
    j
    ))j,j0
    Fixed point iteration:
    ⇡`+1
    def.
    = min
    ⇡2⇧(µ,⌫) KL(
    ⇡|K⇡` ) = Sinkhorn(
    K⇡` )
    Entropic regularization:
    Discrete spaces:
    min
    ⇡2⇧(µ,⌫)
    KL(⇡|K⇡)
    (?) ,
    (?)
    Prop.
    ⇡` converges to a stationary point of (
    ?
    ).
    K⇡
    def.
    = e2
    "
    DX
    ⇥⇡⇥DY

    View Slide

  72. Applications of GW: Shapes Analysis
    phine
    Vladimir G. Kim
    Adobe Research
    Suvrit Sra
    MIT
    Source Targets
    Figure 1: Entropic GW can find correspondences between a source
    surface (left) and a surface with similar structure, a surface with
    shared semantic structure, a noisy 3D point cloud, an icon, and a
    hand drawing. Each fuzzy map was computed using the same code.
    In this paper, we propose a new correspondence algorithm that
    minimizes distortion of long- and short-range distances alike. We
    study an entropically-regularized version of the Gromov-Wasserstein
    (GW) mapping objective function from [M´
    emoli 2011] measuring
    the distortion of geodesic distances. The optimizer is a probabilistic
    matching expressed as a “fuzzy” correspondence matrix in the style
    of [Kim et al. 2012; Solomon et al. 2012]; we control sharpness of
    the correspondence via the weight of the entropic regularizer.
    0 0.02 0.04
    0
    0.02
    0
    0.02
    Teddies
    Humans
    Four-legged
    Armadillo
    Figure 15: MDS embedding of four classes from SHREC dataset.
    0 0.5 1
    0
    0.5
    1
    1
    5
    10
    15
    20
    25
    30
    35
    40
    45
    PCA 1
    PCA 2
    Figure 16: Recovery of galloping horse sequence.
    0 is the base shape) as a feature vector for shape i. We reproduce
    the result presented in the work of Rustamov et al., recovering
    the circular structure of meshes from a galloping horse animation
    sequence (Figure 16). Unlike Rustamov et al., however, our method
    does not require ground truth maps between shapes as input.
    5.2 Supervised Matching
    An important feature of a matching tool is the ability to incorporate
    user input, e.g. ground truth matches of points or regions. In the
    GWα framework, one way to enforce these constraints is to provide
    a stencil S specifying a sparsity pattern for the map Γ. Incorporating
    Figure 18: Mapping a set of 185 images onto a two shapes while
    preserving color similarity. (Images from Flickr public domain collection.)
    Rn0×n0
    +
    and D ∈ Rn×n
    +
    we are given symmetric weight matrices
    W0 ∈ Rn0×n0
    +
    and W ∈ Rn×n
    +
    . We could solve a weighted
    version of the GWα matching problem (3) that prioritizes maps
    preserving distances corresponding to large W values:
    min
    Γ∈M
    ijkℓ
    (D0ij −Dkℓ
    )2ΓikΓjℓW0ijWjℓµ0iµ0jµkµℓ
    . (8)
    For instance, (W0, W) might contain confidence values expressing
    the quality of the entries of (D0, D). Or, W0, W could take values
    in {ε, 1} reducing the weight of distances that are unknown or do
    not need to be preserved by Γ.
    Following the same simplifications as §3.1, we can optimize this
    objective by minimizing ⟨Γ, ΛW
    (Γ)⟩, where
    ΛW
    (Γ) :=
    1
    2
    [D∧2
    0
    ⊗ W0][[µ0
    ]]Γ[[µ]]W
    − [D0 ⊗ W0][[µ0
    ]]Γ[[µ]][D ⊗ W]
    +
    1
    2
    W0[[µ0
    ]]Γ[[µ]][D∧2 ⊗ W]
    Use
    T
    to define registration between:
    Colors
    distribution
    Shape
    Shape
    Shape

    View Slide

  73. Applications of GW: Shapes Analysis
    phine
    Vladimir G. Kim
    Adobe Research
    Suvrit Sra
    MIT
    Source Targets
    Figure 1: Entropic GW can find correspondences between a source
    surface (left) and a surface with similar structure, a surface with
    shared semantic structure, a noisy 3D point cloud, an icon, and a
    hand drawing. Each fuzzy map was computed using the same code.
    In this paper, we propose a new correspondence algorithm that
    minimizes distortion of long- and short-range distances alike. We
    study an entropically-regularized version of the Gromov-Wasserstein
    (GW) mapping objective function from [M´
    emoli 2011] measuring
    the distortion of geodesic distances. The optimizer is a probabilistic
    matching expressed as a “fuzzy” correspondence matrix in the style
    of [Kim et al. 2012; Solomon et al. 2012]; we control sharpness of
    the correspondence via the weight of the entropic regularizer.
    0 0.02 0.04
    0
    0.02
    0
    0.02
    Teddies
    Humans
    Four-legged
    Armadillo
    Figure 15: MDS embedding of four classes from SHREC dataset.
    0 0.5 1
    0
    0.5
    1
    1
    5
    10
    15
    20
    25
    30
    35
    40
    45
    PCA 1
    PCA 2
    Figure 16: Recovery of galloping horse sequence.
    0 is the base shape) as a feature vector for shape i. We reproduce
    the result presented in the work of Rustamov et al., recovering
    the circular structure of meshes from a galloping horse animation
    sequence (Figure 16). Unlike Rustamov et al., however, our method
    does not require ground truth maps between shapes as input.
    5.2 Supervised Matching
    An important feature of a matching tool is the ability to incorporate
    user input, e.g. ground truth matches of points or regions. In the
    GWα framework, one way to enforce these constraints is to provide
    a stencil S specifying a sparsity pattern for the map Γ. Incorporating
    Figure 18: Mapping a set of 185 images onto a two shapes while
    preserving color similarity. (Images from Flickr public domain collection.)
    Rn0×n0
    +
    and D ∈ Rn×n
    +
    we are given symmetric weight matrices
    W0 ∈ Rn0×n0
    +
    and W ∈ Rn×n
    +
    . We could solve a weighted
    version of the GWα matching problem (3) that prioritizes maps
    preserving distances corresponding to large W values:
    min
    Γ∈M
    ijkℓ
    (D0ij −Dkℓ
    )2ΓikΓjℓW0ijWjℓµ0iµ0jµkµℓ
    . (8)
    For instance, (W0, W) might contain confidence values expressing
    the quality of the entries of (D0, D). Or, W0, W could take values
    in {ε, 1} reducing the weight of distances that are unknown or do
    not need to be preserved by Γ.
    Following the same simplifications as §3.1, we can optimize this
    objective by minimizing ⟨Γ, ΛW
    (Γ)⟩, where
    ΛW
    (Γ) :=
    1
    2
    [D∧2
    0
    ⊗ W0][[µ0
    ]]Γ[[µ]]W
    − [D0 ⊗ W0][[µ0
    ]]Γ[[µ]][D ⊗ W]
    +
    1
    2
    W0[[µ0
    ]]Γ[[µ]][D∧2 ⊗ W]
    0 0.02 0.04
    0
    0.02
    0
    0.02
    Te
    Hu
    Fo
    Ar
    Figure 1: The database that has been used, divide
    MDS in 3-D
    Use
    T
    to define registration between:
    Colors
    distribution
    Shape
    Shape
    Shape
    Geodesic distances
    GW distances MDS
    Vizualization
    Shapes
    (Xs)s 0 0.02 0.04
    0
    0.02
    0
    0.02
    Teddies
    Humans
    Four-legged
    Armadillo
    Figure 15: MDS embedding of four classes from SHREC dataset.
    0
    0.5
    1
    1
    5
    10
    15
    20
    25
    30
    35
    40
    45
    PCA 2
    Figure 18: Mapping a set of 185 images onto a two shapes while
    preserving color similarity. (Images from Flickr public domain collection.)
    Rn0×n0
    +
    and D ∈ Rn×n
    +
    we are given symmetric weight matrices
    W0 ∈ Rn0×n0
    +
    and W ∈ Rn×n
    +
    . We could solve a weighted
    version of the GWα matching problem (3) that prioritizes maps
    preserving distances corresponding to large W values:
    MDS in 2-D

    View Slide

  74. Applications of GW: Quantum Chemistry
    Regression problem:
    ! f
    by solving DFT approximation is too costly.

    View Slide

  75. Applications of GW: Quantum Chemistry
    Regression problem:
    ! f
    by solving DFT approximation is too costly.
    [Rupp et al 2012]
    2
    6
    6
    6
    6
    4
    3
    7
    7
    7
    7
    5

    View Slide

  76. Applications of GW: Quantum Chemistry
    Regression problem:
    ! f
    by solving DFT approximation is too costly.
    GW-interpolation:
    [Rupp et al 2012]
    2
    6
    6
    6
    6
    4
    3
    7
    7
    7
    7
    5

    View Slide

  77. xi
    yj
    Optimal transport
    Registered
    spaces
    Un-registered
    spaces
    Gromov-Wasserstein
    Applications:
    – quantum chemistry
    – shapes
    Entropic Metric Alignment for Corresponde
    Justin Solomon∗
    MIT
    Gabriel Peyr´
    e
    CNRS & Univ. Paris-Dauphine
    Vladimi
    Adobe R
    Abstract
    Many shape and image processing tools rely on computation of cor-
    respondences between geometric domains. Efficient methods that
    stably extract “soft” matches in the presence of diverse geometric
    structures have proven to be valuable for shape retrieval and transfer
    of labels or semantic information. With these applications in mind,
    we present an algorithm for probabilistic correspondence that opti-
    mizes an entropy-regularized Gromov-Wasserstein (GW) objective.
    Built upon recent developments in numerical optimal transportation,
    our algorithm is compact, provably convergent, and applicable to
    any geometric domain expressible as a metric measure matrix. We
    Source
    Figure 1: Entropic G
    surface (left) and a s
    shared semantic stru
    ic Metric Alignment for Correspondence Problems
    n∗ Gabriel Peyr´
    e
    CNRS & Univ. Paris-Dauphine
    Vladimir G. Kim
    Adobe Research
    Suvrit Sra
    MIT
    ng tools rely on computation of cor-
    c domains. Efficient methods that
    the presence of diverse geometric
    able for shape retrieval and transfer
    n. With these applications in mind,
    babilistic correspondence that opti-
    omov-Wasserstein (GW) objective.
    n numerical optimal transportation,
    ably convergent, and applicable to
    le as a metric measure matrix. We
    Source Targets
    Figure 1: Entropic GW can find correspondences between
    surface (left) and a surface with similar structure, a surf
    shared semantic structure, a noisy 3D point cloud, an ico
    Conclusion

    View Slide

  78. xi
    yj
    Optimal transport
    Registered
    spaces
    Un-registered
    spaces
    Gromov-Wasserstein
    Applications:
    – quantum chemistry
    – shapes
    Entropic Metric Alignment for Corresponde
    Justin Solomon∗
    MIT
    Gabriel Peyr´
    e
    CNRS & Univ. Paris-Dauphine
    Vladimi
    Adobe R
    Abstract
    Many shape and image processing tools rely on computation of cor-
    respondences between geometric domains. Efficient methods that
    stably extract “soft” matches in the presence of diverse geometric
    structures have proven to be valuable for shape retrieval and transfer
    of labels or semantic information. With these applications in mind,
    we present an algorithm for probabilistic correspondence that opti-
    mizes an entropy-regularized Gromov-Wasserstein (GW) objective.
    Built upon recent developments in numerical optimal transportation,
    our algorithm is compact, provably convergent, and applicable to
    any geometric domain expressible as a metric measure matrix. We
    Source
    Figure 1: Entropic G
    surface (left) and a s
    shared semantic stru
    ic Metric Alignment for Correspondence Problems
    n∗ Gabriel Peyr´
    e
    CNRS & Univ. Paris-Dauphine
    Vladimir G. Kim
    Adobe Research
    Suvrit Sra
    MIT
    ng tools rely on computation of cor-
    c domains. Efficient methods that
    the presence of diverse geometric
    able for shape retrieval and transfer
    n. With these applications in mind,
    babilistic correspondence that opti-
    omov-Wasserstein (GW) objective.
    n numerical optimal transportation,
    ably convergent, and applicable to
    le as a metric measure matrix. We
    Source Targets
    Figure 1: Entropic GW can find correspondences between
    surface (left) and a surface with similar structure, a surf
    shared semantic structure, a noisy 3D point cloud, an ico
    Conclusion

    View Slide

  79. xi
    yj
    Optimal transport
    Registered
    spaces
    Un-registered
    spaces
    Gromov-Wasserstein
    Entropy:
    makes the
    problem tractable
    e↵ective!
    Entropy:
    surprisingly
    (GW highly non-convex)
    Applications:
    – quantum chemistry
    – shapes
    Entropic Metric Alignment for Corresponde
    Justin Solomon∗
    MIT
    Gabriel Peyr´
    e
    CNRS & Univ. Paris-Dauphine
    Vladimi
    Adobe R
    Abstract
    Many shape and image processing tools rely on computation of cor-
    respondences between geometric domains. Efficient methods that
    stably extract “soft” matches in the presence of diverse geometric
    structures have proven to be valuable for shape retrieval and transfer
    of labels or semantic information. With these applications in mind,
    we present an algorithm for probabilistic correspondence that opti-
    mizes an entropy-regularized Gromov-Wasserstein (GW) objective.
    Built upon recent developments in numerical optimal transportation,
    our algorithm is compact, provably convergent, and applicable to
    any geometric domain expressible as a metric measure matrix. We
    Source
    Figure 1: Entropic G
    surface (left) and a s
    shared semantic stru
    ic Metric Alignment for Correspondence Problems
    n∗ Gabriel Peyr´
    e
    CNRS & Univ. Paris-Dauphine
    Vladimir G. Kim
    Adobe Research
    Suvrit Sra
    MIT
    ng tools rely on computation of cor-
    c domains. Efficient methods that
    the presence of diverse geometric
    able for shape retrieval and transfer
    n. With these applications in mind,
    babilistic correspondence that opti-
    omov-Wasserstein (GW) objective.
    n numerical optimal transportation,
    ably convergent, and applicable to
    le as a metric measure matrix. We
    Source Targets
    Figure 1: Entropic GW can find correspondences between
    surface (left) and a surface with similar structure, a surf
    shared semantic structure, a noisy 3D point cloud, an ico
    Conclusion

    View Slide

  80. xi
    yj
    Optimal transport
    Registered
    spaces
    Un-registered
    spaces
    Gromov-Wasserstein
    Entropy:
    makes the
    problem tractable
    e↵ective!
    Entropy:
    surprisingly
    (GW highly non-convex)
    Applications:
    – quantum chemistry
    – shapes
    Entropic Metric Alignment for Corresponde
    Justin Solomon∗
    MIT
    Gabriel Peyr´
    e
    CNRS & Univ. Paris-Dauphine
    Vladimi
    Adobe R
    Abstract
    Many shape and image processing tools rely on computation of cor-
    respondences between geometric domains. Efficient methods that
    stably extract “soft” matches in the presence of diverse geometric
    structures have proven to be valuable for shape retrieval and transfer
    of labels or semantic information. With these applications in mind,
    we present an algorithm for probabilistic correspondence that opti-
    mizes an entropy-regularized Gromov-Wasserstein (GW) objective.
    Built upon recent developments in numerical optimal transportation,
    our algorithm is compact, provably convergent, and applicable to
    any geometric domain expressible as a metric measure matrix. We
    Source
    Figure 1: Entropic G
    surface (left) and a s
    shared semantic stru
    ic Metric Alignment for Correspondence Problems
    n∗ Gabriel Peyr´
    e
    CNRS & Univ. Paris-Dauphine
    Vladimir G. Kim
    Adobe Research
    Suvrit Sra
    MIT
    ng tools rely on computation of cor-
    c domains. Efficient methods that
    the presence of diverse geometric
    able for shape retrieval and transfer
    n. With these applications in mind,
    babilistic correspondence that opti-
    omov-Wasserstein (GW) objective.
    n numerical optimal transportation,
    ably convergent, and applicable to
    le as a metric measure matrix. We
    Source Targets
    Figure 1: Entropic GW can find correspondences between
    surface (left) and a surface with similar structure, a surf
    shared semantic structure, a noisy 3D point cloud, an ico
    Future works:
    Conclusion

    View Slide