Discrete Entropic Wasserstein Flows

Gabriel Peyré, www.numerical-tours.com

Overview
• Regularized Transport
• Regularized JKO Flows
• Dykstra’s Algorithm
• Heat Kernel Approximation

Entropy Regularized Transport
(Minus) entropy: $E(\pi) \overset{\text{def.}}{=} \sum_{i,j} \pi_{i,j}(\log(\pi_{i,j}) - 1) + \iota_{\mathbb{R}^+}(\pi_{i,j})$
Regularized distance, for a regularization strength $\gamma > 0$:
$W_\gamma(p,q) \overset{\text{def.}}{=} \min \{ \langle \pi, c \rangle + \gamma E(\pi) \;;\; \pi \in C(p,q) \}$
$\pi_\gamma \overset{\text{def.}}{=} \operatorname{argmin} \{ \langle \pi, c \rangle + \gamma E(\pi) \;;\; \pi \in C(p,q) \}$
Introduced in [Schrödinger 1931]. Used in economics [Galichon, Salanié 2008] and machine learning [Cuturi 2013].
Figure: cost $c$ and regularized coupling $\pi_\gamma$.

The Impact of Regularization
Proposition: let $S \overset{\text{def.}}{=} \operatorname{argmin} \{ \langle \pi, c \rangle \;;\; \pi \in C(p,q) \}$ be the set of unregularized optimal couplings. Then
$\pi_\gamma \to \operatorname{argmin}_{\pi \in S} E(\pi)$ and $W_\gamma(p,q) \to W(p,q)$ as $\gamma \to 0$,
$\pi_\gamma \to p q^T$ and $\frac{1}{\gamma} W_\gamma(p,q) \to E(p) + E(q)$ (up to an additive constant) as $\gamma \to +\infty$.
Figure: marginals $p$, $q$ and coupling $\pi_\gamma$.

Kullback-Leibler Projections
KL divergence: $\mathrm{KL}(\pi|\xi) \overset{\text{def.}}{=} \sum_{i,j} \pi_{i,j} \log\left(\frac{\pi_{i,j}}{\xi_{i,j}}\right) + \xi_{i,j} - \pi_{i,j}$
One has: $\langle \pi, c \rangle + \gamma E(\pi) = \gamma\,\mathrm{KL}(\pi|\xi) + C$, where $\xi = e^{-c/\gamma}$ (entrywise) and $C$ does not depend on $\pi$.
Proposition: $W_\gamma(p,q) = \gamma \min \{ \mathrm{KL}(\pi|\xi) \;;\; \pi \in C(p,q) \} + C$ and
$\pi_\gamma = \operatorname{Proj}_{C(p,q)}(\xi) \overset{\text{def.}}{=} \operatorname{argmin} \{ \mathrm{KL}(\pi|\xi) \;;\; \pi \in C(p,q) \}$
Constraint splitting: $C(p,q) = C_1 \cap C_2$ with
$C_1 = \{ \pi \in (\mathbb{R}^+)^{N \times N} \;;\; \pi \mathbb{1} = p \}$, $C_2 = \{ \pi \in (\mathbb{R}^+)^{N \times N} \;;\; \pi^T \mathbb{1} = q \}$.
Figure: coupling $\pi$ with its marginals $p$ and $q$.
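
The identity linking the regularized cost to the KL divergence follows by expanding the definition. A short derivation, assuming $\gamma > 0$ is the regularization strength, $\xi = e^{-c/\gamma}$ entrywise, and $\pi \geq 0$ so that the indicator term in $E$ vanishes:

```latex
\begin{align*}
\gamma\,\mathrm{KL}(\pi|\xi)
  &= \gamma \sum_{i,j} \Big( \pi_{i,j}\log\pi_{i,j} - \pi_{i,j}\log\xi_{i,j} + \xi_{i,j} - \pi_{i,j} \Big) \\
  &= \gamma \sum_{i,j} \pi_{i,j}\big(\log\pi_{i,j} - 1\big)
     + \sum_{i,j} \pi_{i,j}\, c_{i,j}
     + \gamma \sum_{i,j} \xi_{i,j} \\
  &= \gamma E(\pi) + \langle \pi, c \rangle - C,
  \qquad C = -\gamma \sum_{i,j} e^{-c_{i,j}/\gamma}.
\end{align*}
```

The second line uses $\log \xi_{i,j} = -c_{i,j}/\gamma$. Since $C$ is independent of $\pi$, minimizing the regularized cost over $C(p,q)$ and KL-projecting $\xi$ onto $C(p,q)$ select the same coupling.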

Sinkhorn / IPFP Algorithm
Iterative Bregman projections [Bregman 1957]: $\pi^{(0)} = \xi$, $\pi^{(\ell+1)} = \operatorname{Proj}_{C_{\ell \% K}}(\pi^{(\ell)})$
Theorem: if the $\{C_i\}_i$ are affine sets, $\pi^{(\ell)} \to \operatorname{Proj}_{C_1 \cap \ldots \cap C_K}(\xi)$.
Fixed marginals: $C_1 \overset{\text{def.}}{=} \{ \pi \;;\; \pi \mathbb{1} = p \}$, $C_2 \overset{\text{def.}}{=} \{ \pi \;;\; \pi^T \mathbb{1} = q \}$
Proposition: $\operatorname{Proj}_{C_1}(\pi) = \operatorname{diag}\left(\frac{p}{\pi \mathbb{1}}\right) \pi$, $\operatorname{Proj}_{C_2}(\pi) = \pi \operatorname{diag}\left(\frac{q}{\pi^T \mathbb{1}}\right)$ (entrywise divisions).
Figure: iterates $\xi, \pi^{(1)}, \ldots, \pi^{(5)}$ and the limit $\pi_\gamma$.

Diagonal Scaling, Fast Implementation
Sinkhorn algorithm [Sinkhorn 1967] [Deming, Stephan 1940]: $\pi^{(0)} = \xi$,
$\pi^{(2\ell+1)} = \operatorname{diag}(p / \pi^{(2\ell)} \mathbb{1})\, \pi^{(2\ell)}$, $\pi^{(2\ell+2)} = \pi^{(2\ell+1)} \operatorname{diag}(q / \pi^{(2\ell+1),T} \mathbb{1})$
Proposition: $\pi_\gamma = \operatorname{diag}(u)\, \xi \operatorname{diag}(v)$ for some scaling vectors $u$, $v$, where $\xi = e^{-c/\gamma}$. Likewise $\pi^{(\ell)} = \operatorname{diag}(u^{(\ell)})\, \xi \operatorname{diag}(v^{(\ell)})$.
Sinkhorn, revisited: $v^{(0)} = \mathbb{1}$, $u^{(\ell)} = \frac{p}{\xi v^{(\ell)}}$, $v^{(\ell+1)} = \frac{q}{\xi^T u^{(\ell)}}$
→ Only matrix-vector multiplications.
→ Highly parallelizable.
→ Extension to Riemannian manifolds [Solomon et al 2015].
→ Extension to barycenters and more [Benamou et al 2015].
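
The scaling form of the Sinkhorn iterations can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `sinkhorn`, the toy 1-D grid, the Gaussian marginals and the value of the regularization strength `gamma` are all assumptions made for the example.

```python
import numpy as np

def sinkhorn(p, q, c, gamma, n_iter=2000):
    """Sinkhorn scaling iterations; returns the coupling diag(u) xi diag(v)."""
    xi = np.exp(-c / gamma)          # Gibbs kernel xi = exp(-c / gamma), entrywise
    v = np.ones_like(q)              # v^(0) = 1
    for _ in range(n_iter):
        u = p / (xi @ v)             # u^(l)   = p / (xi v^(l))
        v = q / (xi.T @ u)           # v^(l+1) = q / (xi^T u^(l))
    return u[:, None] * xi * v[None, :]

# Toy problem (hypothetical data): two Gaussian bumps on a 1-D grid of [0, 1].
N = 40
x = np.linspace(0.0, 1.0, N)
c = (x[:, None] - x[None, :]) ** 2
p = np.exp(-((x - 0.3) ** 2) / (2 * 0.1 ** 2))
p /= p.sum()
q = np.exp(-((x - 0.7) ** 2) / (2 * 0.1 ** 2))
q /= q.sum()
pi = sinkhorn(p, q, c, gamma=0.05)
```

Each iteration costs two matrix-vector products. Because the loop ends on a `v` update, the column marginal of `pi` matches `q` up to rounding, while the row marginal matches `p` only up to the convergence error.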

Translation-invariant Ground Metrics
Assuming $c_{i,j} = \varphi_{i-j}$ on a discrete grid (e.g. periodic boundary conditions): $\xi v = k \star v$ where $k \overset{\text{def.}}{=} e^{-\varphi/\gamma}$.
Example: $c_{i,j} = \|x_i - x_j\|^2$, $k$ = Gaussian filter.
Convolutive Sinkhorn: $v^{(\ell+1)} = q \odot \left( k \star \left( p \odot (k \star v^{(\ell)})^{-1} \right) \right)^{-1}$
where $a \odot b \overset{\text{def.}}{=} (a_i b_i)_i$, $\star$ is the convolution, and inverses are entrywise.
→ $\xi v$ computed in $O(N \log(N))$ operations (FFT, IIR approximation).
Figure: marginals $p$, $q$ and iterates $\pi^{(\ell)}$ for several $\ell$.
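
For a translation-invariant cost on a periodic grid, applying the Gibbs kernel is a circular convolution, so the Sinkhorn updates never need the full $N \times N$ matrix. The sketch below assumes a 1-D periodic grid on $[0,1)$ with squared periodic distance; the helper names `make_filter`, `conv` and `convolutive_sinkhorn`, and the filter symbol `k`, are illustrative choices, not taken from the slides.

```python
import numpy as np

def make_filter(N, gamma):
    """Filter k = exp(-phi / gamma) for the squared periodic distance on N points."""
    d = np.arange(N)
    d = np.minimum(d, N - d) / N     # periodic distance for an index offset d
    return np.exp(-d ** 2 / gamma)

def conv(k, v):
    """Circular convolution k * v in O(N log N) using the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(k) * np.fft.fft(v)))

def convolutive_sinkhorn(p, q, k, n_iter=500):
    """Sinkhorn scalings (u, v) where the kernel is applied by convolution."""
    v = np.ones_like(q)
    for _ in range(n_iter):
        u = p / conv(k, v)
        v = q / conv(k, u)           # k is even, so the kernel is symmetric
    return u, v

# Toy periodic marginals (hypothetical data), strictly positive.
N = 64
x = np.arange(N) / N
p = np.exp(4.0 * np.cos(2 * np.pi * (x - 0.25)))
p /= p.sum()
q = np.exp(4.0 * np.cos(2 * np.pi * (x - 0.75)))
q /= q.sum()
k = make_filter(N, gamma=0.05)
u, v = convolutive_sinkhorn(p, q, k)
```

The dense kernel `xi[i, j] = k[(i - j) % N]` is only needed to check the result; the iterations themselves touch vectors of length `N`.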

JKO Flow - Theory
Implicit Euler step [Jordan, Kinderlehrer, Otto 1998]: $p_{t+1} = \operatorname{argmin}_{p \in \Sigma_N} W(p_t, p) + \tau f(p)$
Formal limit $\tau \to 0$: $\partial_t p = \operatorname{div}(p \nabla(f'(p)))$
$f(p) = \int p w$ (advection): $\partial_t p = \operatorname{div}(p \nabla w)$
$f(p) = \int p \log(p)$ (heat diffusion): $\partial_t p = \Delta p$
$f(p) = \frac{1}{m-1} \int p^m$ (non-linear diffusion): $\partial_t p = \Delta p^m$
Figure: evolution $p_t$; potential $\cos(w)$.
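
The three limiting equations follow from the general formula $\partial_t p = \operatorname{div}(p \nabla(f'(p)))$ by computing the first variation $f'(p)$ in each case. A worked version, under the usual formal assumption that $p$ is smooth and positive:

```latex
\begin{align*}
f(p) = \int p\,w:
  &\quad f'(p) = w,
  &\partial_t p &= \operatorname{div}(p \nabla w), \\
f(p) = \int p \log p:
  &\quad f'(p) = \log p + 1,
  &\partial_t p &= \operatorname{div}(p \nabla \log p) = \Delta p, \\
f(p) = \tfrac{1}{m-1} \int p^m:
  &\quad f'(p) = \tfrac{m}{m-1}\, p^{m-1},
  &\partial_t p &= \operatorname{div}(m\, p^{m-1} \nabla p) = \Delta p^m.
\end{align*}
```

The second line uses $p \nabla \log p = \nabla p$, and the third uses $m\, p^{m-1} \nabla p = \nabla (p^m)$.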

JKO Flow - Numerics
($\star$) $p_{t+1} = \operatorname{argmin}_p W(p_t, p) + \tau f(p)$
Pros:
→ intrinsic discretization (mass conservation).
→ deals with non-smooth energies.
→ (sometimes) exposes displacement convexity.
→ no CFL condition (implicit stepping).
Cons:
→ ($\star$) is hard to solve . . .
Existing solvers are grouped on the slide as Eulerian or Lagrangian, with the labels: moving meshes, warpings, particle systems, finite volumes, 1-D, gradient of a convex function, linearization, interior point.
References: [Kinderlehrer, Walkington 1999] [Blanchet, Calvez, Carrillo 2008] [Agueh, Bowles 2013] [Matthes and Osberger 2014] [Carrillo and Moll 2009] [Benamou, Carlier, Merigot, Oudet 2014] [Westdickenberg and Wilkening 2010] [Budd, Cullen and Walsh 2012] [Burger, Carrillo, Wolfram 2010] [Carrillo, Chertock and Huang 2014] [Burger, Franek, Schönlieb 2012]

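Entropic JKO and KL Optimization
$\min_{p \in \Sigma_N} W_\gamma(q, p) + \tau f(p) \iff \min_\pi \mathrm{KL}(\pi|\xi) + \varphi_1(\pi) + \varphi_2(\pi)$
where $\xi \overset{\text{def.}}{=} e^{-c/\gamma} \in \mathbb{R}^{N \times N}_{+,*}$ and $p = \pi \mathbb{1}$,
$\varphi_1(\pi) = \iota_{C_q}(\pi)$ with $C_q \overset{\text{def.}}{=} \{ \pi \;;\; \pi^T \mathbb{1} = q \}$,
$\varphi_2(\pi) \overset{\text{def.}}{=} \frac{\tau}{\gamma} f(\pi \mathbb{1})$.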

Dykstra’s Algorithm
($\star$) $\min_\pi \mathrm{KL}(\pi|\xi) + \varphi_1(\pi) + \varphi_2(\pi)$
Proximal operator: $\operatorname{Prox}_g(\pi) \overset{\text{def.}}{=} \operatorname{argmin}_{\tilde\pi} \mathrm{KL}(\tilde\pi|\pi) + g(\tilde\pi)$
Initialization: $\pi^{(0)} \overset{\text{def.}}{=} y$ (here $y = \xi$), $z^{(0)} = z^{(-1)} \overset{\text{def.}}{=} \mathbb{1}$
Iterations: $\pi^{(\ell)} \overset{\text{def.}}{=} \operatorname{Prox}_{\varphi_{\ell \% 2}}(\pi^{(\ell-1)} \odot z^{(\ell-2)})$, $z^{(\ell)} \overset{\text{def.}}{=} z^{(\ell-2)} \odot \frac{\pi^{(\ell-1)}}{\pi^{(\ell)}}$ (entrywise operations, with $\varphi_0 = \varphi_2$)
Theorem: $\pi^{(\ell)}$ converges to the solution of ($\star$).
Proof: Dykstra is block-coordinate minimization on the dual: $\min_{u_1, u_2} E^*(\nabla E(y) - u_1 - u_2) + \varphi_1^*(u_1) + \varphi_2^*(u_2)$
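
The Dykstra iterations in KL geometry can be written directly on full matrices, which is convenient for small checks. The sketch below is an assumption-laden illustration: `dykstra_kl`, `proj_columns` and `proj_rows` are hypothetical names, and the two proximal maps are the KL projections onto fixed column and row marginals, a case in which the scheme coincides with Sinkhorn/IPFP.

```python
import numpy as np

def dykstra_kl(xi, prox_odd, prox_even, n_iter):
    """Dykstra iterations for min KL(pi|xi) + phi_1(pi) + phi_2(pi)."""
    pi = xi.copy()                       # pi^(0) = xi
    z_old = np.ones_like(xi)             # z^(l-2), initially z^(-1) = 1
    z_cur = np.ones_like(xi)             # z^(l-1), initially z^(0) = 1
    for l in range(1, n_iter + 1):
        prox = prox_odd if l % 2 == 1 else prox_even
        pi_new = prox(pi * z_old)        # pi^(l) = Prox(pi^(l-1) * z^(l-2))
        z_new = z_old * pi / pi_new      # z^(l) = z^(l-2) * pi^(l-1) / pi^(l)
        pi, z_old, z_cur = pi_new, z_cur, z_new
    return pi

# Toy data (hypothetical): marginals on a small 1-D grid.
N = 20
x = np.linspace(0.0, 1.0, N)
xi = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.1)
p = np.exp(-((x - 0.3) ** 2) / 0.02)
p /= p.sum()
q = np.exp(-((x - 0.7) ** 2) / 0.02)
q /= q.sum()

def proj_columns(pi):
    """KL projection onto {pi ; pi^T 1 = q}: rescale the columns."""
    return pi * (q / pi.sum(axis=0))[None, :]

def proj_rows(pi):
    """KL projection onto {pi ; pi 1 = p}: rescale the rows."""
    return pi * (p / pi.sum(axis=1))[:, None]

pi = dykstra_kl(xi, proj_columns, proj_rows, n_iter=2000)
```

Storing the full matrices `pi` and `z` costs $O(N^2)$ memory per iterate, which is what a rank-one scaling form of the iterates avoids.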

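Proximal Maps For Entropic Wasserstein Flows
$\min_\pi \mathrm{KL}(\pi|\xi) + \varphi_1(\pi) + \varphi_2(\pi)$ with $\varphi_1(\pi) = \iota_{C_q}(\pi)$, $C_q \overset{\text{def.}}{=} \{ \pi \;;\; \pi^T \mathbb{1} = q \}$, $\varphi_2(\pi) \overset{\text{def.}}{=} \frac{\tau}{\gamma} f(\pi \mathbb{1})$
Proposition: $\operatorname{Prox}_{\varphi_1}(\pi) = \pi \operatorname{diag}\left(\frac{q}{\pi^T \mathbb{1}}\right)$
Proposition: $\operatorname{Prox}_{\varphi_2}(\pi) = \operatorname{diag}\left(\frac{\operatorname{Prox}_{\frac{\tau}{\gamma} f}(\pi \mathbb{1})}{\pi \mathbb{1}}\right) \pi$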

Dykstra For Entropic Wasserstein Flows
Dykstra’s iterates: $(\pi^{(\ell)}, z^{(\ell)})$
Proposition: one has $\pi^{(\ell)} = \operatorname{diag}(a^{(\ell)})\, \xi \operatorname{diag}(b^{(\ell)})$ and $z^{(\ell)} = u^{(\ell)} v^{(\ell),T}$, with $a^{(0)} = b^{(0)} = u^{(0)} = v^{(0)} = \mathbb{1}$ and
Odd $\ell$: $a^{(\ell)} = a^{(\ell-1)} \odot u^{(\ell-2)}$, $b^{(\ell)} = \frac{q}{\xi^T(a^{(\ell)})}$
Even $\ell$: $b^{(\ell)} = b^{(\ell-1)} \odot v^{(\ell-2)}$, $p^{(\ell)} \overset{\text{def.}}{=} \operatorname{Prox}^{\mathrm{KL}}_{\frac{\tau}{\gamma} f}(a^{(\ell-1)} \odot u^{(\ell-2)} \odot \xi(b^{(\ell)}))$, $a^{(\ell)} = \frac{p^{(\ell)}}{\xi(b^{(\ell)})}$
All $\ell$: $u^{(\ell)} = u^{(\ell-2)} \odot \frac{a^{(\ell-1)}}{a^{(\ell)}}$, $v^{(\ell)} = v^{(\ell-2)} \odot \frac{b^{(\ell-1)}}{b^{(\ell)}}$
→ Only matrix/vector multiplications $\xi(a)$, $\xi^T(a)$.
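
One entropic JKO step in scaling form can be sketched as follows, storing only the vectors $a$, $b$, $u$, $v$. This is a hedged illustration: `jko_step` and `prox_kl_crowd` are hypothetical names, the energy is assumed to be the crowd-motion function $f(p) = \iota_{[0,\kappa]^N}(p) + \langle w, p \rangle$ with KL proximal map $\min(e^{-\sigma w} \odot p, \kappa)$ for $\sigma = \tau/\gamma$, and the grid, potential and parameter values are made up for the example.

```python
import numpy as np

def prox_kl_crowd(x, w, kappa, sigma):
    """KL proximal map of sigma * (indicator of [0, kappa]^N + <w, .>)."""
    return np.minimum(np.exp(-sigma * w) * x, kappa)

def jko_step(q, xi, w, kappa, sigma, n_iter=4000):
    """Dykstra in scaling form; n_iter should be even so the last step is a prox of f."""
    N = q.size
    a, b = np.ones(N), np.ones(N)
    u_old, v_old = np.ones(N), np.ones(N)    # u^(l-2), v^(l-2)
    u_cur, v_cur = np.ones(N), np.ones(N)    # u^(l-1), v^(l-1)
    p = None
    for l in range(1, n_iter + 1):
        a_prev, b_prev = a, b
        if l % 2 == 1:                       # odd step: marginal constraint pi^T 1 = q
            a = a_prev * u_old
            b = q / (xi.T @ a)
        else:                                # even step: proximal map of (tau/gamma) f
            b = b_prev * v_old
            xb = xi @ b
            p = prox_kl_crowd(a_prev * u_old * xb, w, kappa, sigma)
            a = p / xb
        u_new = u_old * a_prev / a
        v_new = v_old * b_prev / b
        u_old, v_old, u_cur, v_cur = u_cur, v_cur, u_new, v_new
    return p, a, b

# Toy data (hypothetical): a Gaussian density pushed left by the potential w(x) = x.
N = 60
x = np.linspace(0.0, 1.0, N)
gamma, tau = 0.05, 0.05
xi = np.exp(-((x[:, None] - x[None, :]) ** 2) / gamma)
q = np.exp(-((x - 0.6) ** 2) / (2 * 0.08 ** 2))
q /= q.sum()
kappa = 0.025
p, a, b = jko_step(q, xi, w=x, kappa=kappa, sigma=tau / gamma)
```

The returned `p` is the row marginal of `diag(a) xi diag(b)` and satisfies the congestion bound exactly because the loop ends on an even step; the column marginal `q` is matched only up to the convergence error.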

Example: Crowd Motion
Congestion-inducing function [Maury, Roudneff-Chupin, Santambrogio 2010]: $f(p) = \iota_{[0,\kappa]^N}(p) + \langle w, p \rangle$
Proposition: $\operatorname{Prox}^{\mathrm{KL}}_{\sigma f}(p) = \min(e^{-\sigma w} \odot p, \kappa)$ for any $\sigma > 0$ (in the flow, $\sigma = \tau/\gamma$).
Figure: evolutions for $\kappa = \|p_{t=0}\|_\infty$, $\kappa = 2\|p_{t=0}\|_\infty$, $\kappa = 4\|p_{t=0}\|_\infty$; potential $\cos(w)$.

Non-Linear Diffusions
Generalized entropies: $f(p) \overset{\text{def.}}{=} \sum_i b_i e_{m_i}(p_i)$ where
$e_m(s) \overset{\text{def.}}{=} s(\log(s) - 1)$ if $m = 1$, and $e_m(s) \overset{\text{def.}}{=} s \frac{s^{m-1} - m}{m - 1}$ if $m > 1$.
Figure: the functions $e_m$ and the maps $\operatorname{Prox}_{e_m}$ for $m = 1, 2, 5, 10$ (horizontal axis from 0 to 2).
Figure: evolutions for varying $m$ and varying $b$.
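
For the generalized entropies, the KL proximal map has no closed form when $m > 1$, but it is a scalar root-finding problem: the minimizer $q$ of $\mathrm{KL}(q|p) + \sigma e_m(q)$ solves $\log(q/p) + \sigma e_m'(q) = 0$ with $e_m'(s) = m (s^{m-1} - 1)/(m-1)$. The sketch below uses Newton's method in the variable $t = \log q$; this solver choice and the names `e_m_prime` and `prox_kl_entropy` are assumptions for illustration, not the method used in the slides. A weight $b_i$ can be absorbed into $\sigma$.

```python
import numpy as np

def e_m_prime(s, m):
    """Derivative of the generalized entropy e_m."""
    if m == 1:
        return np.log(s)
    return m * (s ** (m - 1) - 1.0) / (m - 1)

def prox_kl_entropy(p, m, sigma, n_newton=50):
    """Entrywise KL proximal map of sigma * e_m, for p > 0."""
    p = np.asarray(p, dtype=float)
    if m == 1:
        # log(q/p) + sigma log(q) = 0 has the closed form q = p^(1 / (1 + sigma)).
        return p ** (1.0 / (1.0 + sigma))
    t = np.log(p)                    # Newton on g(t) = 0, with g convex and increasing
    for _ in range(n_newton):
        g = t - np.log(p) + sigma * m * (np.exp((m - 1) * t) - 1.0) / (m - 1)
        dg = 1.0 + sigma * m * np.exp((m - 1) * t)
        t = t - g / dg
    return np.exp(t)

p_values = np.linspace(0.1, 2.0, 20)
proxes = {m: prox_kl_entropy(p_values, m, sigma=0.5) for m in (1, 2, 5)}
```

Since $g$ is convex and increasing in $t$, the Newton iterates are monotone after the first step, so a fixed small number of iterations is enough in this range of inputs.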

Optimal Transport on Surfaces
Triangulated mesh: $M$. Geodesic distance: $d_M$.
Ground cost: $c_{i,j} = d_M(x_i, x_j)^2$.
Computing $c$ (Fast-Marching): $N^2 \log(N)$ operations → too costly.
Figure: level sets of $d(x_i, \cdot)$ around a point $x_i$.

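Entropic Transport on Surfaces
Heat equation on $M$ [Solomon et al 2015]: $\partial_\varepsilon u_\varepsilon(x, \cdot) = \Delta_M u_\varepsilon(x, \cdot)$, $u_0(x, \cdot) = \delta_x$
Theorem [Varadhan]: $-\varepsilon \log(u_\varepsilon) \to d_M^2$ as $\varepsilon \to 0$ (up to a constant factor fixed by the normalization of $\Delta_M$).
Sinkhorn kernel: $\xi = e^{-d_M^2/\gamma} \approx \left( \mathrm{Id} - \frac{\gamma}{L} \Delta_M \right)^{-L}$, i.e. $L$ implicit Euler steps of the heat equation (same caveat on constants).
Caveat: proved if $M$ is diffeomorphic to a disk . . .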
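
The point of the heat-kernel approximation is that the kernel is applied through a few linear solves, so the geodesic cost matrix is never formed. The sketch below illustrates this on a stand-in problem: a 1-D periodic finite-difference Laplacian replaces the mesh Laplacian $\Delta_M$, and the diffusion time is set to $t = \gamma/4$ because in flat space the solution of $\partial_t u = \Delta u$ is proportional to $e^{-|x-y|^2/(4t)}$. The names `periodic_laplacian` and `apply_heat_kernel` and all parameter values are assumptions.

```python
import numpy as np

def periodic_laplacian(N, h):
    """Second-order finite-difference Laplacian on a periodic 1-D grid of step h."""
    I = np.eye(N)
    return (np.roll(I, 1, axis=0) + np.roll(I, -1, axis=0) - 2.0 * I) / h ** 2

def apply_heat_kernel(v, lap, t, n_steps):
    """Apply (Id - (t / n_steps) * lap)^(-n_steps) to v by repeated linear solves."""
    A = np.eye(lap.shape[0]) - (t / n_steps) * lap
    for _ in range(n_steps):
        v = np.linalg.solve(A, v)
    return v

N = 100
h = 1.0 / N
gamma = 0.01
lap = periodic_laplacian(N, h)
delta = np.zeros(N)
delta[50] = 1.0                      # discrete Dirac mass at grid point 50
out = apply_heat_kernel(delta, lap, t=gamma / 4.0, n_steps=10)
```

On a mesh one would typically factor the sparse system matrix once and reuse the factorization for every kernel application; the dense `np.linalg.solve` here only keeps the example dependency-free.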

Crowd Motion with Obstacles
$M$ = sub-domain of $\mathbb{R}^2$.
Figure: evolutions for $\kappa / \|p_{t=0}\|_\infty = 1, 2, 4, 6$; potential $\cos(w)$.

Anisotropic Diffusion

Crowd Motion on a Surface
$M$ = triangulated mesh.
Figure: evolutions for $\kappa / \|p_{t=0}\|_\infty = 1$ and $\kappa / \|p_{t=0}\|_\infty = 6$; potential $\cos(w)$.

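Non-convex Functionals
Congestion-inducing function: $h(p) = \iota_{[0,\kappa]^N}(p)$ (convex) versus $h(p) = \iota_{\{0,\kappa\}^N}(p)$ (non-convex).
Figure: the maps $\operatorname{Prox}_h$ in the convex and non-convex cases, with the threshold $\kappa/e$ marked.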
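
For the non-convex constraint set $\{0, \kappa\}^N$, the KL proximal map reduces to an entrywise comparison of two candidates: $\mathrm{KL}(0|p) = p$ and $\mathrm{KL}(\kappa|p) = \kappa \log(\kappa/p) - \kappa + p$. The second is smaller exactly when $p > \kappa/e$, which gives a hard threshold. A minimal sketch, with hypothetical function names and test values:

```python
import numpy as np

def prox_kl_binary(p, kappa):
    """Entrywise KL projection onto {0, kappa}: hard threshold at kappa / e."""
    p = np.asarray(p, dtype=float)
    return np.where(p > kappa / np.e, kappa, 0.0)

def brute_force_binary(p, kappa):
    """Pick, entrywise, whichever of 0 and kappa has the smaller KL cost."""
    p = np.asarray(p, dtype=float)
    cost_zero = p                                        # KL(0 | p)
    cost_kappa = kappa * np.log(kappa / p) - kappa + p   # KL(kappa | p)
    return np.where(cost_kappa < cost_zero, kappa, 0.0)

kappa = 2.0
p_values = np.array([0.1, 0.5, 0.7, 0.8, 1.0, 3.0])    # threshold is about 0.7358
thresholded = prox_kl_binary(p_values, kappa)
```

Unlike the convex case $\min(p, \kappa)$, this map is discontinuous, so convergence guarantees for Dykstra-type iterations do not carry over automatically.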

Conclusion
JKO discrete flows:
→ Advection, diffusion, non-smooth nonlinearities.
Entropic regularization:
→ Trade Wasserstein vs. KL divergence.
Heat kernel approximation:
→ Seamless computations on manifolds.
Open problem:
→ $W_\gamma$ is not a metric, so there is no limiting flow as $\tau \to 0$.
→ Requires $\gamma \sim \tau^2 \to 0$.
