Entropic Regularization of Wasserstein Barycenters

Numerical Optimal Transport and Applications Gabriel Peyré www.numerical-tours.com Joint works
with: Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, Justin Solomon

Imaging: Statistical Image Models Source image (X) Style image (Y)
Source image after color transfer J. Rabin Wasserstein Regularization Colors distribution: each pixel point in R3

Imaging: Statistical Image Models Input image Modiﬁed image Source image
(X) Style image (Y) Source image after color transfer J. Rabin Wasserstein Regularization Optimal transport framework Sliced Wasserstein projection Applications Application to Color Transfer Source image (X) Sliced Wasserstein projection of X to style image color statistics Y Optimal transport framework Sliced Wasserstein projection Applications Application to Color Transfer Source image (X) Style image (Y) Sliced Wasserstein projection of X to style image color statistics Y Source image after color transfer J. Rabin Wasserstein Regularization Colors distribution: each pixel point in R3

Other applications: Imaging: Statistical Image Models Input image Modified image
Source image (X) Style image (Y) Source image after color transfer J. Rabin Wasserstein Regularization Optimal transport framework Sliced Wasserstein projection Applications Application to Color Transfer Source image (X) Sliced Wasserstein projection of X to style image color statistics Y Optimal transport framework Sliced Wasserstein projection Applications Application to Color Transfer Source image (X) Style image (Y) Sliced Wasserstein projection of X to style image color statistics Y Source image after color transfer J. Rabin Wasserstein Regularization Colors distribution: each pixel point in R3 Texture synthesis, segmentation, . . . Classification, clustering . . . Surface processing, reflectance modeling . . .

Machine Learning: Bag of Features Image descriptors: Gradient distribution Histogram
of features

Machine Learning: Bag of Features Image descriptors: Text descriptors: Gradient
distribution Histogram of features

Overview • Optimal Transport • Regularized Transport • Wasserstein Barycenters
• Heat Kernel Approximation

Optimal Transport Discrete densities: Histograms: µ = P i p
i xi xi 2 Rd ⌃N def. = p 2 RN + ; P i pi = 1

Optimal Transport Discrete densities: Histograms: Couplings: µ = P i
p i xi xi 2 Rd q p ⇡ C(p, q) def. = ⇡ 2 (R +)N⇥N ; ⇡1 = p, ⇡T 1 = q ⌃N def. = p 2 RN + ; P i pi = 1

p i xi xi 2 Rd Ground cost: c 2 (R +)N⇥N . p-Wasserstein transport: ci,j = || xi xj ||p q p ⇡ C(p, q) def. = ⇡ 2 (R +)N⇥N ; ⇡1 = p, ⇡T 1 = q ⌃N def. = p 2 RN + ; P i pi = 1

p i xi xi 2 Rd Ground cost: c 2 (R +)N⇥N . p-Wasserstein transport: ci,j = || xi xj ||p q p ⇡ C(p, q) def. = ⇡ 2 (R +)N⇥N ; ⇡1 = p, ⇡T 1 = q W(p, q) def. = min nP i,j ⇡i,jci,j ; ⇡ 2 C(p, q)o Optimal transport distance: ⌃N def. = p 2 RN + ; P i pi = 1

Algorithms Arbitrary discrete measures: ⇡? 2 argmin ⇡2C(p,q) hc, ⇡i
µ = P N i =1 p i xi , ⌫ = P P j =1 q j yj

µ = P N i =1 p i xi , ⌫ = P P j =1 q j yj ! Linear program interior points (polynomial) transportation simplex

Point clouds: N = P, pi = qj = 1/N. W(p, q) = min 2 PermN P i ci, (i) ! Hungarian/auction algorithms, complexity O(N3). µ = P N i =1 p i xi , ⌫ = P P j =1 q j yj ! Linear program interior points (polynomial) transportation simplex µ ⌫

Point clouds: N = P, pi = qj = 1/N. W(p, q) = min 2 PermN P i ci, (i) 1-D and convex cost: ci,j = | xi xj |p , p > 1. ! Hungarian/auction algorithms, complexity O(N3). µ = P N i =1 p i xi , ⌫ = P P j =1 q j yj ! Linear program interior points (polynomial) transportation simplex µ ⌫ sorting the values, O(N log(N)) operations. µ ⌫

Point clouds: N = P, pi = qj = 1/N. W(p, q) = min 2 PermN P i ci, (i) 1-D and convex cost: ci,j = | xi xj |p , p > 1. ! Hungarian/auction algorithms, complexity O(N3). µ = P N i =1 p i xi , ⌫ = P P j =1 q j yj ! Linear program interior points (polynomial) transportation simplex µ ⌫ sorting the values, O(N log(N)) operations. µ ⌫ Need for fast approximate algorithms for generic c .

• Heat Kernel Approximation • Wasserstein Gradient Flows

Entropy Regularized Transport (minus) Entropy: E ( ⇡ ) def.
= X i,j ⇡i,j(log( ⇡i,j) 1) + ◆R+ ( ⇡i,j)

Entropy Regularized Transport (minus) Entropy: Regularized distance: E ( ⇡
) def. = X i,j ⇡i,j(log( ⇡i,j) 1) + ◆R+ ( ⇡i,j) W (p, q) def. = min {h⇡, ci + E(⇡) ; ⇡ 2 C(p, q)} ⇡ def. = argmin {h⇡, ci + E(⇡) ; ⇡ 2 C(p, q)} [Schrodinger 1931] Used in economy [Galichon Salani´ e 2008] and machine learning [Cuturi 2013]

Entropy Regularized Transport (minus) Entropy: Regularized distance: ⇡ c E
( ⇡ ) def. = X i,j ⇡i,j(log( ⇡i,j) 1) + ◆R+ ( ⇡i,j) W (p, q) def. = min {h⇡, ci + E(⇡) ; ⇡ 2 C(p, q)} ⇡ def. = argmin {h⇡, ci + E(⇡) ; ⇡ 2 C(p, q)} [Schrodinger 1931] Used in economy [Galichon Salani´ e 2008] and machine learning [Cuturi 2013]

The Impact of Regularization Proposition: ⇡ !0 ! argmin ⇡2S
E(⇡) W (p, q) !0 ! W(p, q) S def. = argmin {h⇡, ci ; ⇡ 2 C(p, q)}

The Impact of Regularization Proposition: ⇡ !+1 ! pqT ⇡
!0 ! argmin ⇡2S E(⇡) W (p, q) !0 ! W(p, q) 1 W (p, q) !+1 ! E(p) + E(q) S def. = argmin {h⇡, ci ; ⇡ 2 C(p, q)}

The Impact of Regularization Proposition: ⇡ !+1 ! pqT ⇡
!0 ! argmin ⇡2S E(⇡) W (p, q) !0 ! W(p, q) 1 W (p, q) !+1 ! E(p) + E(q) S def. = argmin {h⇡, ci ; ⇡ 2 C(p, q)} EMD Entrop ⇡ p q

Kullback-Leibler Projections KL( ⇡|⇠ ) def. = P i,j ⇡i,j
log ⇣ ⇡i,j ⇠i,j ⌘ + ⇠i,j ⇡i,j KL divergence:

Kullback-Leibler Projections KL( ⇡|⇠ ) def. = P i,j ⇡i,j
log ⇣ ⇡i,j ⇠i,j ⌘ + ⇠i,j ⇡i,j KL divergence: where ⇠ = e c One has: h⇡, ci + E(⇡) = KL(⇡|⇠) + C

Kullback-Leibler Projections W (p, q) = min {KL(⇡|⇠) ; ⇡
2 C(p, q)} ⇡ = ProjC(p,q)( ⇠ ) def. = argmin { KL( ⇡|⇠ ) ; ⇡ 2 C ( p, q ) } Proposition: KL( ⇡|⇠ ) def. = P i,j ⇡i,j log ⇣ ⇡i,j ⇠i,j ⌘ + ⇠i,j ⇡i,j KL divergence: where ⇠ = e c One has: h⇡, ci + E(⇡) = KL(⇡|⇠) + C

Kullback-Leibler Projections W (p, q) = min {KL(⇡|⇠) ; ⇡
2 C(p, q)} Constraint splitting: q p ⇡ C(p, q) = C1 \ C2 ⇢ C1 = ⇡ 2 (R +)N⇥N ; ⇡1 = p , C2 = ⇡ 2 (R +)N⇥N ; ⇡T 1 = q . ⇡ = ProjC(p,q)( ⇠ ) def. = argmin { KL( ⇡|⇠ ) ; ⇡ 2 C ( p, q ) } Proposition: KL( ⇡|⇠ ) def. = P i,j ⇡i,j log ⇣ ⇡i,j ⇠i,j ⌘ + ⇠i,j ⇡i,j KL divergence: where ⇠ = e c One has: h⇡, ci + E(⇡) = KL(⇡|⇠) + C

Sinkhorn / IPFP Algorithm Iterative Bregman projections: ⇡(0) = ⇠
⇠ ⇡(1) ⇡(2) ⇡(3) ⇡(4) ⇡(5) ⇡ ⇡(`+1) = ProjC`%K ( ⇡(`) ) [Bregman 1957]

⇠ ⇡(1) ⇡(2) ⇡(3) ⇡(4) ⇡(5) ⇡ ⇡(`+1) = ProjC`%K ( ⇡(`) ) Theorem: ⇡(`) ! ProjC1 \...\CK ( ⇠ ) [Bregman 1957] If {Ci }i are a ne sets,

⇠ ⇡(1) ⇡(2) ⇡(3) ⇡(4) ⇡(5) ⇡ ⇡(`+1) = ProjC`%K ( ⇡(`) ) Theorem: ⇡(`) ! ProjC1 \...\CK ( ⇠ ) Fixed marginals: Proposition: ProjC1 ( ⇡ ) = diag ⇣ p ⇡1 ⌘ ⇡ ProjC2 ( ⇡ ) = ⇡ diag ⇣ q ⇡T 1 ⌘ ( C1 def. = {⇡ ; ⇡1 = p} , C2 def. = ⇡ ; ⇡T 1 = q . [Bregman 1957] If {Ci }i are a ne sets,

Diagonal Scaling, Fast Implementation Sinkhorn algorithm: ⇡(0) = ⇠ [Sinkhorn
1967] [Deming,Stephan 1940] ⇡(2`+1) = diag(p/⇡(2`)1)⇡(2`) ⇡(2`+2) = ⇡(2`+1) diag(q/⇡(2`+1),T 1)

1967] [Deming,Stephan 1940] Proposition: ⇡ = diag(u )⇠ diag(v ) where ⇠ = e c . ⇡(2`+1) = diag(p/⇡(2`)1)⇡(2`) ⇡(2`+2) = ⇡(2`+1) diag(q/⇡(2`+1),T 1)

1967] [Deming,Stephan 1940] Proposition: ⇡ = diag(u )⇠ diag(v ) where ⇠ = e c . ⇡(`) = diag(u(`))⇠ diag(v(`)) ⇡(2`+1) = diag(p/⇡(2`)1)⇡(2`) ⇡(2`+2) = ⇡(2`+1) diag(q/⇡(2`+1),T 1)

1967] [Deming,Stephan 1940] v(0) = 1 Sinkhorn, revisited: u(`) = p ⇠v(`) v(`+1) = q ⇠T u(`) Proposition: ⇡ = diag(u )⇠ diag(v ) where ⇠ = e c . ⇡(`) = diag(u(`))⇠ diag(v(`)) ⇡(2`+1) = diag(p/⇡(2`)1)⇡(2`) ⇡(2`+2) = ⇡(2`+1) diag(q/⇡(2`+1),T 1)

Diagonal Scaling, Fast Implementation Sinkhorn algorithm: ! Only matrix-vector multiplications.
⇡(0) = ⇠ [Sinkhorn 1967] [Deming,Stephan 1940] v(0) = 1 Sinkhorn, revisited: u(`) = p ⇠v(`) v(`+1) = q ⇠T u(`) Proposition: ⇡ = diag(u )⇠ diag(v ) where ⇠ = e c . ⇡(`) = diag(u(`))⇠ diag(v(`)) ⇡(2`+1) = diag(p/⇡(2`)1)⇡(2`) ⇡(2`+2) = ⇡(2`+1) diag(q/⇡(2`+1),T 1)

Diagonal Scaling, Fast Implementation Sinkhorn algorithm: ! Only matrix-vector multiplications.
! Highly parallelizable. ⇡(0) = ⇠ [Sinkhorn 1967] [Deming,Stephan 1940] v(0) = 1 Sinkhorn, revisited: u(`) = p ⇠v(`) v(`+1) = q ⇠T u(`) Proposition: ⇡ = diag(u )⇠ diag(v ) where ⇠ = e c . ⇡(`) = diag(u(`))⇠ diag(v(`)) ⇡(2`+1) = diag(p/⇡(2`)1)⇡(2`) ⇡(2`+2) = ⇡(2`+1) diag(q/⇡(2`+1),T 1)

Translation-invariant Ground Metrics Assuming ci,j = 'i j on a
discrete grid (e.g. periodic b.c.). ⇠v =  ? v where  def. = e '/

discrete grid (e.g. periodic b.c.). Example: ci,j = || xi xj ||2,  = Gaussian ﬁlter. ⇠v =  ? v where  def. = e '/

discrete grid (e.g. periodic b.c.). Example: ci,j = || xi xj ||2,  = Gaussian ﬁlter. v(`+1) = q ⇣  ? ⇣ p  ? v(`) 1 ⌘⌘ 1 Convolutive Sinkhorn: ⇠v =  ? v where  def. = e '/ a b def. = ( aibi)i, ? def. = convolution ! ⇠v computed in O ( N log( N )) operations (FFT, IIR approximation)

discrete grid (e.g. periodic b.c.). Example: ci,j = || xi xj ||2,  = Gaussian ﬁlter. v(`+1) = q ⇣  ? ⇣ p  ? v(`) 1 ⌘⌘ 1 Convolutive Sinkhorn: ⇠v =  ? v where  def. = e '/ a b def. = ( aibi)i, ? def. = convolution p q ` ⇡(`) ! ⇠v computed in O ( N log( N )) operations (FFT, IIR approximation)

• Heat Kernel Approximation

Wasserstein Barycenters For µ = P i p i xi
, ⌫ = P j q j yj , W2(µ, ⌫) = W(p, q) for ci,j = || xi yj ||2 W2 def. = Wasserstein distance for measures.

Wasserstein Barycenters µ µ1 µ3 W2 (µ1 , µ )
W 2 (µ 2 ,µ ) W2 (µ3 ,µ ) µ2 µ? 2 argmin µ P k k W2(µk, µ) Barycenters of measures ( µk)k: P k k = 1 For µ = P i p i xi , ⌫ = P j q j yj , W2(µ, ⌫) = W(p, q) for ci,j = || xi yj ||2 W2 def. = Wasserstein distance for measures.

Wasserstein Barycenters µ µ1 µ3 W2 (µ1 , µ )
W 2 (µ 2 ,µ ) W2 (µ3 ,µ ) µ2 µ? 2 argmin µ P k k W2(µk, µ) Barycenters of measures ( µk)k: P k k = 1 If µ k = xk then µ? = P k kxk Generalizes Euclidean barycenter: For µ = P i p i xi , ⌫ = P j q j yj , W2(µ, ⌫) = W(p, q) for ci,j = || xi yj ||2 W2 def. = Wasserstein distance for measures.

µ exists and is unique. Theorem: if µ1 does not
vanish on small sets, Wasserstein Barycenters [Agueh, Carlier, 2010] µ µ1 µ3 W2 (µ1 , µ ) W 2 (µ 2 ,µ ) W2 (µ3 ,µ ) µ2 µ? 2 argmin µ P k k W2(µk, µ) Barycenters of measures ( µk)k: P k k = 1 If µ k = xk then µ? = P k kxk Generalizes Euclidean barycenter: For µ = P i p i xi , ⌫ = P j q j yj , W2(µ, ⌫) = W(p, q) for ci,j = || xi yj ||2 W2 def. = Wasserstein distance for measures.

Entropic Wasserstein Barycenters Barycenter: min p2⌃N P k kW (pk,
p) p p1 p2 p3

Entropic Wasserstein Barycenters In term of couplings: 8 k, p
= ⇡k 1 where min { P k kKL(⇡k |⇠) ; (⇡k)k 2 C1 \ C2 } ⇠ = e c Barycenter: min p2⌃N P k kW (pk, p) p p1 p2 p3 C1 def. = (⇡k)k ; 8 k, ⇡T k 1 = pk C2 def. = {(⇡k)k ; 9p, 8 k, ⇡k 1 = p}

Entropic Wasserstein Barycenters In term of couplings: Proposition: p =
Q k (⇡k 1) k 8 k, p = ⇡k 1 where min { P k kKL(⇡k |⇠) ; (⇡k)k 2 C1 \ C2 } ⇠ = e c Barycenter: min p2⌃N P k kW (pk, p) p p1 p2 p3 C1 def. = (⇡k)k ; 8 k, ⇡T k 1 = pk C2 def. = {(⇡k)k ; 9p, 8 k, ⇡k 1 = p} ProjC1 ( ⇡k)k = ✓ ⇡k diag ✓ pk ⇡T k 1 ◆◆ k ProjC2 ( ⇡k)k = ✓ diag ✓ p ⇡k 1 ◆ ⇡k ◆ k

Entropic Wasserstein Barycenters In term of couplings: Proposition: p =
Q k (⇡k 1) k 8 k, p = ⇡k 1 where min { P k kKL(⇡k |⇠) ; (⇡k)k 2 C1 \ C2 } Sinkhorn-like algorithm: ⇠ = e c ( ⇡(2`+1) k )k = ProjC1 ( ⇡(2`) k )k ( ⇡(2`+2) k )k = ProjC2 ( ⇡(2`+1) k )k 8 k, ⇡(0) k = ⇠ Barycenter: min p2⌃N P k kW (pk, p) p p1 p2 p3 C1 def. = (⇡k)k ; 8 k, ⇡T k 1 = pk C2 def. = {(⇡k)k ; 9p, 8 k, ⇡k 1 = p} ProjC1 ( ⇡k)k = ✓ ⇡k diag ✓ pk ⇡T k 1 ◆◆ k ProjC2 ( ⇡k)k = ✓ diag ✓ p ⇡k 1 ◆ ⇡k ◆ k

Barycenter of 3 Shapes p1 p2 p3

Barycenter of 3-D Volumes

Color Transfer µ ⌫ Input images: ( f, g )
(chrominance components) Input measures: f g µ(A) = U(f 1(A)), ⌫(A) = U(g 1(A))

Color Transfer µ ⌫ Input images: ( f, g )
(chrominance components) Input measures: f T T f g ˜ T g µ(A) = U(f 1(A)), ⌫(A) = U(g 1(A))

Raw image sequence Color Harmonization . Step 1: compute Sliced-Wasserstein
Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; Raw image sequence J. Rabin – GREYC, University of Caen Approximate Wasserstein Metric for Texture Synthesis and Mixing . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; Raw image sequence J. Rabin – GREYC, University of Caen Approximate Wasserstein Metric for Texture Synthesis and Mixing . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; Raw image sequence J. Rabin – GREYC, University of Caen Approximate Wasserstein Metric for Texture Synthesis and Mixing

Raw image sequence Compute Wasserstein barycenter Project on the barycenter
Color Harmonization . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; Raw image sequence J. Rabin – GREYC, University of Caen Approximate Wasserstein Metric for Texture Synthesis and Mixing . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; Raw image sequence J. Rabin – GREYC, University of Caen Approximate Wasserstein Metric for Texture Synthesis and Mixing . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; Raw image sequence J. Rabin – GREYC, University of Caen Approximate Wasserstein Metric for Texture Synthesis and Mixing erstein Barycenter Sliced Wasserstein Barycenter Experimental results Applications Conclusion olor transfer Color harmonization of several images . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; stein Barycenter Sliced Wasserstein Barycenter Experimental results Applications Conclusion lor transfer Color harmonization of several images . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; in Barycenter Sliced Wasserstein Barycenter Experimental results Applications Conclusion or transfer Color harmonization of several images . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter;

• Heat Kernel Approximation • Wasserstein Gradient Flows

Optimal Transport on Surfaces Triangulated mesh: M. Geodesic distance: dM.

Optimal Transport on Surfaces Ground cost: ci,j = dM(xi, xj)
2 . Triangulated mesh: M. Geodesic distance: dM. Level sets xi d ( xi, ·)

Optimal Transport on Surfaces Ground cost: ci,j = dM(xi, xj)
2 . Triangulated mesh: M. Geodesic distance: dM. Level sets xi d ( xi, ·) Computing c (Fast-Marching): N2 log( N ) ! too costly.

Entropic Transport on Surfaces Heat equation on M: @ u
( x, ·) = Mu ( x, ·) , u0( x, ·) = x

Entropic Transport on Surfaces Heat equation on M: Sinkhorn kernel:
Theorem: [Varadhan] log( u ) !0 ! d2 M @ u ( x, ·) = Mu ( x, ·) , u0( x, ·) = x ⇠ = e d2 M ⇡ ut ⇡ Id L 1 M L

Barycenter on a Surface 1 p0 p1

Barycenter on a Surface 1 p0 p1 p0 p2 p3
p4 p6 p1 = (1, . . . , 1)/6

MRI Data Procesing [with A. Gramfort] ariational Wasserstein Problems Labels
L2 barycenter W barycenter Ground cost ci,j = dM(xi, xj): geodesic on cortical surface.

Conclusion Source image (X) Style image (Y) Source ima J.
Rabin Wasserstein Regu Histogram features in imaging and machine learning. ! histograms are now trendy!

Conclusion EMD Entropy Discrete analog: Cuturi, NIPS 2013 Entropic regularization
for optimal transport. Source image (X) Style image (Y) Source ima J. Rabin Wasserstein Regu Histogram features in imaging and machine learning. ! histograms are now trendy!

Conclusion EMD Entropy Discrete analog: Cuturi, NIPS 2013 Entropic regularization
for optimal transport. Source image (X) Style image (Y) Source ima J. Rabin Wasserstein Regu Histogram features in imaging and machine learning. ! histograms are now trendy! Barycenters in Wasserstein space: Figure 4: Simulation results with focal random signals generated in are

Entropic Regularization of Wasserstein Barycenters

Entropic Regularization of Wasserstein Barycenters

More Decks by Gabriel Peyré

Other Decks in Research

Featured

Transcript