Entropic Regularization of Wasserstein Barycenters
Talks given at the "Optimal Transport Workshop" in Toulouse, updated for a talk at "Workshop on Computational Information Geometry for Image and Signal Processing" in Edinburg.
Numerical Optimal Transport and Applications Gabriel Peyré www.numerical-tours.com Joint works with: Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, Justin Solomon
Imaging: Statistical Image Models Source image (X) Style image (Y) Source image after color transfer J. Rabin Wasserstein Regularization Colors distribution: each pixel point in R3
Imaging: Statistical Image Models Input image Modified image Source image (X) Style image (Y) Source image after color transfer J. Rabin Wasserstein Regularization Optimal transport framework Sliced Wasserstein projection Applications Application to Color Transfer Source image (X) Sliced Wasserstein projection of X to style image color statistics Y Optimal transport framework Sliced Wasserstein projection Applications Application to Color Transfer Source image (X) Style image (Y) Sliced Wasserstein projection of X to style image color statistics Y Source image after color transfer J. Rabin Wasserstein Regularization Colors distribution: each pixel point in R3
Other applications: Imaging: Statistical Image Models Input image Modified image Source image (X) Style image (Y) Source image after color transfer J. Rabin Wasserstein Regularization Optimal transport framework Sliced Wasserstein projection Applications Application to Color Transfer Source image (X) Sliced Wasserstein projection of X to style image color statistics Y Optimal transport framework Sliced Wasserstein projection Applications Application to Color Transfer Source image (X) Style image (Y) Sliced Wasserstein projection of X to style image color statistics Y Source image after color transfer J. Rabin Wasserstein Regularization Colors distribution: each pixel point in R3 Texture synthesis, segmentation, . . . Classification, clustering . . . Surface processing, reflectance modeling . . .
Optimal Transport Discrete densities: Histograms: Couplings: µ = P i p i xi xi 2 Rd q p ⇡ C(p, q) def. = ⇡ 2 (R +)N⇥N ; ⇡1 = p, ⇡T 1 = q ⌃N def. = p 2 RN + ; P i pi = 1
Optimal Transport Discrete densities: Histograms: Couplings: µ = P i p i xi xi 2 Rd Ground cost: c 2 (R +)N⇥N . p-Wasserstein transport: ci,j = || xi xj ||p q p ⇡ C(p, q) def. = ⇡ 2 (R +)N⇥N ; ⇡1 = p, ⇡T 1 = q ⌃N def. = p 2 RN + ; P i pi = 1
Optimal Transport Discrete densities: Histograms: Couplings: µ = P i p i xi xi 2 Rd Ground cost: c 2 (R +)N⇥N . p-Wasserstein transport: ci,j = || xi xj ||p q p ⇡ C(p, q) def. = ⇡ 2 (R +)N⇥N ; ⇡1 = p, ⇡T 1 = q W(p, q) def. = min nP i,j ⇡i,jci,j ; ⇡ 2 C(p, q)o Optimal transport distance: ⌃N def. = p 2 RN + ; P i pi = 1
Algorithms Arbitrary discrete measures: ⇡? 2 argmin ⇡2C(p,q) hc, ⇡i µ = P N i =1 p i xi , ⌫ = P P j =1 q j yj ! Linear program interior points (polynomial) transportation simplex
Algorithms Arbitrary discrete measures: ⇡? 2 argmin ⇡2C(p,q) hc, ⇡i Point clouds: N = P, pi = qj = 1/N. W(p, q) = min 2 PermN P i ci, (i) ! Hungarian/auction algorithms, complexity O(N3). µ = P N i =1 p i xi , ⌫ = P P j =1 q j yj ! Linear program interior points (polynomial) transportation simplex µ ⌫
Algorithms Arbitrary discrete measures: ⇡? 2 argmin ⇡2C(p,q) hc, ⇡i Point clouds: N = P, pi = qj = 1/N. W(p, q) = min 2 PermN P i ci, (i) 1-D and convex cost: ci,j = | xi xj |p , p > 1. ! Hungarian/auction algorithms, complexity O(N3). µ = P N i =1 p i xi , ⌫ = P P j =1 q j yj ! Linear program interior points (polynomial) transportation simplex µ ⌫ sorting the values, O(N log(N)) operations. µ ⌫
Algorithms Arbitrary discrete measures: ⇡? 2 argmin ⇡2C(p,q) hc, ⇡i Point clouds: N = P, pi = qj = 1/N. W(p, q) = min 2 PermN P i ci, (i) 1-D and convex cost: ci,j = | xi xj |p , p > 1. ! Hungarian/auction algorithms, complexity O(N3). µ = P N i =1 p i xi , ⌫ = P P j =1 q j yj ! Linear program interior points (polynomial) transportation simplex µ ⌫ sorting the values, O(N log(N)) operations. µ ⌫ Need for fast approximate algorithms for generic c .
Kullback-Leibler Projections KL( ⇡|⇠ ) def. = P i,j ⇡i,j log ⇣ ⇡i,j ⇠i,j ⌘ + ⇠i,j ⇡i,j KL divergence: where ⇠ = e c One has: h⇡, ci + E(⇡) = KL(⇡|⇠) + C
Wasserstein Barycenters µ µ1 µ3 W2 (µ1 , µ ) W 2 (µ 2 ,µ ) W2 (µ3 ,µ ) µ2 µ? 2 argmin µ P k k W2(µk, µ) Barycenters of measures ( µk)k: P k k = 1 For µ = P i p i xi , ⌫ = P j q j yj , W2(µ, ⌫) = W(p, q) for ci,j = || xi yj ||2 W2 def. = Wasserstein distance for measures.
Wasserstein Barycenters µ µ1 µ3 W2 (µ1 , µ ) W 2 (µ 2 ,µ ) W2 (µ3 ,µ ) µ2 µ? 2 argmin µ P k k W2(µk, µ) Barycenters of measures ( µk)k: P k k = 1 If µ k = xk then µ? = P k kxk Generalizes Euclidean barycenter: For µ = P i p i xi , ⌫ = P j q j yj , W2(µ, ⌫) = W(p, q) for ci,j = || xi yj ||2 W2 def. = Wasserstein distance for measures.
µ exists and is unique. Theorem: if µ1 does not vanish on small sets, Wasserstein Barycenters [Agueh, Carlier, 2010] µ µ1 µ3 W2 (µ1 , µ ) W 2 (µ 2 ,µ ) W2 (µ3 ,µ ) µ2 µ? 2 argmin µ P k k W2(µk, µ) Barycenters of measures ( µk)k: P k k = 1 If µ k = xk then µ? = P k kxk Generalizes Euclidean barycenter: For µ = P i p i xi , ⌫ = P j q j yj , W2(µ, ⌫) = W(p, q) for ci,j = || xi yj ||2 W2 def. = Wasserstein distance for measures.
Entropic Wasserstein Barycenters In term of couplings: 8 k, p = ⇡k 1 where min { P k kKL(⇡k |⇠) ; (⇡k)k 2 C1 \ C2 } ⇠ = e c Barycenter: min p2⌃N P k kW (pk, p) p p1 p2 p3 C1 def. = (⇡k)k ; 8 k, ⇡T k 1 = pk C2 def. = {(⇡k)k ; 9p, 8 k, ⇡k 1 = p}
Entropic Wasserstein Barycenters In term of couplings: Proposition: p = Q k (⇡k 1) k 8 k, p = ⇡k 1 where min { P k kKL(⇡k |⇠) ; (⇡k)k 2 C1 \ C2 } ⇠ = e c Barycenter: min p2⌃N P k kW (pk, p) p p1 p2 p3 C1 def. = (⇡k)k ; 8 k, ⇡T k 1 = pk C2 def. = {(⇡k)k ; 9p, 8 k, ⇡k 1 = p} ProjC1 ( ⇡k)k = ✓ ⇡k diag ✓ pk ⇡T k 1 ◆◆ k ProjC2 ( ⇡k)k = ✓ diag ✓ p ⇡k 1 ◆ ⇡k ◆ k
Entropic Wasserstein Barycenters In term of couplings: Proposition: p = Q k (⇡k 1) k 8 k, p = ⇡k 1 where min { P k kKL(⇡k |⇠) ; (⇡k)k 2 C1 \ C2 } Sinkhorn-like algorithm: ⇠ = e c ( ⇡(2`+1) k )k = ProjC1 ( ⇡(2`) k )k ( ⇡(2`+2) k )k = ProjC2 ( ⇡(2`+1) k )k 8 k, ⇡(0) k = ⇠ Barycenter: min p2⌃N P k kW (pk, p) p p1 p2 p3 C1 def. = (⇡k)k ; 8 k, ⇡T k 1 = pk C2 def. = {(⇡k)k ; 9p, 8 k, ⇡k 1 = p} ProjC1 ( ⇡k)k = ✓ ⇡k diag ✓ pk ⇡T k 1 ◆◆ k ProjC2 ( ⇡k)k = ✓ diag ✓ p ⇡k 1 ◆ ⇡k ◆ k
Raw image sequence Color Harmonization . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; Raw image sequence J. Rabin – GREYC, University of Caen Approximate Wasserstein Metric for Texture Synthesis and Mixing . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; Raw image sequence J. Rabin – GREYC, University of Caen Approximate Wasserstein Metric for Texture Synthesis and Mixing . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; Raw image sequence J. Rabin – GREYC, University of Caen Approximate Wasserstein Metric for Texture Synthesis and Mixing
Raw image sequence Compute Wasserstein barycenter Project on the barycenter Color Harmonization . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; Raw image sequence J. Rabin – GREYC, University of Caen Approximate Wasserstein Metric for Texture Synthesis and Mixing . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; Raw image sequence J. Rabin – GREYC, University of Caen Approximate Wasserstein Metric for Texture Synthesis and Mixing . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; Raw image sequence J. Rabin – GREYC, University of Caen Approximate Wasserstein Metric for Texture Synthesis and Mixing erstein Barycenter Sliced Wasserstein Barycenter Experimental results Applications Conclusion olor transfer Color harmonization of several images . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; stein Barycenter Sliced Wasserstein Barycenter Experimental results Applications Conclusion lor transfer Color harmonization of several images . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter; in Barycenter Sliced Wasserstein Barycenter Experimental results Applications Conclusion or transfer Color harmonization of several images . Step 1: compute Sliced-Wasserstein Barycenter of color statistics; . Step 2: compute Sliced-Wasserstein projection of each image onto the Barycenter;
Optimal Transport on Surfaces Ground cost: ci,j = dM(xi, xj) 2 . Triangulated mesh: M. Geodesic distance: dM. Level sets xi d ( xi, ·) Computing c (Fast-Marching): N2 log( N ) ! too costly.
Entropic Transport on Surfaces Heat equation on M: Sinkhorn kernel: Theorem: [Varadhan] log( u ) !0 ! d2 M @ u ( x, ·) = Mu ( x, ·) , u0( x, ·) = x ⇠ = e d2 M ⇡ ut ⇡ Id L 1 M L
Conclusion Source image (X) Style image (Y) Source ima J. Rabin Wasserstein Regu Histogram features in imaging and machine learning. ! histograms are now trendy!
Conclusion EMD Entropy Discrete analog: Cuturi, NIPS 2013 Entropic regularization for optimal transport. Source image (X) Style image (Y) Source ima J. Rabin Wasserstein Regu Histogram features in imaging and machine learning. ! histograms are now trendy!
Conclusion EMD Entropy Discrete analog: Cuturi, NIPS 2013 Entropic regularization for optimal transport. Source image (X) Style image (Y) Source ima J. Rabin Wasserstein Regu Histogram features in imaging and machine learning. ! histograms are now trendy! Barycenters in Wasserstein space: Figure 4: Simulation results with focal random signals generated in are