Guillaume Carlier (Université Paris Dauphine - PSL, France) Displacement Smoothness of Entropic Optimal Transport and Applications to some Evolution Equations and Systems

1 Displacement smoothness of entropic optimal transport and applications to some evolution equations
Guillaume Carlier (CEREMADE, Université Paris Dauphine and MOKAPLAN (Inria-Dauphine))
Based on joint works with Lénaïc Chizat and Maxime Laborde (2022) and Hugo Malamut (in progress).
Workshop on Optimal Transport from Theory to Applications, Interfacing Dynamical Systems, Optimization, and Machine Learning, Berlin, March 2024.

Introduction 2
Given $c \in C(\mathbb{R}^d \times \mathbb{R}^d)$, convex compact subsets $X_1$ and $X_2$ of $\mathbb{R}^d$, and probability measures $\mu_1$, $\mu_2$ supported on $X_1$ and $X_2$ respectively, the optimal transport problem of Monge and Kantorovich reads
\[ \inf_{\gamma \in \Pi(\mu_1, \mu_2)} \int_{X_1 \times X_2} c(x_1, x_2)\, \gamma(dx_1, dx_2) \tag{1} \]
where $\Pi(\mu_1, \mu_2)$ is the set of probability measures on $X := X_1 \times X_2$ having $\mu_1$ and $\mu_2$ as marginals.

Introduction 3
Its entropic regularization
\[ \inf_{\gamma \in \Pi(\mu_1, \mu_2)} \int_{X_1 \times X_2} c(x_1, x_2)\, \gamma(dx_1, dx_2) + \varepsilon H(\gamma \mid \mu_1 \otimes \mu_2), \]
where $H$ stands for relative entropy, is known to be much more tractable (uniqueness, regularity, efficient computation by the Sinkhorn algorithm...), and precise convergence results to the initial OT problem are by now quite well understood.
Success of Sinkhorn's algorithm (Cuturi, Peyré); connections with large deviations, stochastic control and Schrödinger bridges (Léonard, Mikami, Dawson and Gärtner...).

Introduction 4
It is well known that the unique optimal entropic plan is of the form
\[ \gamma_\varepsilon(dx_1, dx_2) = \exp\Big(\frac{\varphi_1(x_1) + \varphi_2(x_2) - c(x_1, x_2)}{\varepsilon}\Big)\, \mu_1(dx_1) \otimes \mu_2(dx_2) \]
where the (so-called Schrödinger) potentials $\varphi_1$ and $\varphi_2$ are implicitly determined by the marginal constraints $\gamma_\varepsilon \in \Pi(\mu_1, \mu_2)$.

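Introduction 5
The marginal constraints lead to the following system (the Schrödinger system) for the potentials:
\[ 1 = \int_{X_2} \exp\Big(\frac{\varphi_1(x_1) + \varphi_2(x_2) - c(x_1, x_2)}{\varepsilon}\Big)\, \mu_2(dx_2) \quad \text{for } \mu_1\text{-a.e. } x_1, \]
\[ 1 = \int_{X_1} \exp\Big(\frac{\varphi_1(x_1) + \varphi_2(x_2) - c(x_1, x_2)}{\varepsilon}\Big)\, \mu_1(dx_1) \quad \text{for } \mu_2\text{-a.e. } x_2. \]
Note that this system is invariant under $(\varphi_1, \varphi_2) \mapsto (\varphi_1 + \lambda, \varphi_2 - \lambda)$.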

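Introduction 6
Sinkhorn (aka iterative proportional fitting procedure or... Gauss-Seidel) iterations:
\[ \varphi_1^{t+1}(x_1) = -\varepsilon \log \int_{X_2} \exp\Big(\frac{\varphi_2^{t}(x_2) - c(x_1, x_2)}{\varepsilon}\Big)\, \mu_2(dx_2) \]
and
\[ \varphi_2^{t+1}(x_2) = -\varepsilon \log \int_{X_1} \exp\Big(\frac{\varphi_1^{t+1}(x_1) - c(x_1, x_2)}{\varepsilon}\Big)\, \mu_1(dx_1). \]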
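The iterations above can be sketched for finitely supported marginals, where the integrals become weighted sums. This is a minimal illustration, not the speaker's code: the function names `logsumexp`, `sinkhorn` and `entropic_plan`, the toy data and the fixed iteration count are all assumptions, and every weight is assumed strictly positive so that its logarithm is defined.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sinkhorn(cost, mu1, mu2, eps=1.0, n_iter=200):
    """Log-domain Sinkhorn iterations for discrete marginals (hypothetical helper).

    cost[i][j] = c(x1_i, x2_j); mu1 and mu2 are positive weights summing to 1.
    Returns the potentials (phi1, phi2) as lists.
    """
    n, m = len(mu1), len(mu2)
    log_mu1 = [math.log(w) for w in mu1]
    log_mu2 = [math.log(w) for w in mu2]
    phi1, phi2 = [0.0] * n, [0.0] * m
    for _ in range(n_iter):
        # phi1(x1) = -eps * log sum_j exp((phi2_j - c_ij) / eps) * mu2_j
        phi1 = [-eps * logsumexp([(phi2[j] - cost[i][j]) / eps + log_mu2[j]
                                  for j in range(m)]) for i in range(n)]
        # phi2(x2) = -eps * log sum_i exp((phi1_i - c_ij) / eps) * mu1_i
        phi2 = [-eps * logsumexp([(phi1[i] - cost[i][j]) / eps + log_mu1[i]
                                  for i in range(n)]) for j in range(m)]
    return phi1, phi2

def entropic_plan(cost, mu1, mu2, phi1, phi2, eps=1.0):
    """Gibbs-form plan exp((phi1 + phi2 - c) / eps) mu1 (x) mu2 as a matrix."""
    return [[math.exp((phi1[i] + phi2[j] - cost[i][j]) / eps) * mu1[i] * mu2[j]
             for j in range(len(mu2))] for i in range(len(mu1))]

# Toy data (assumed for illustration): three points against two points on the line.
x1, x2 = [0.0, 0.5, 1.0], [0.2, 0.8]
mu1, mu2 = [0.2, 0.3, 0.5], [0.4, 0.6]
cost = [[(a - b) ** 2 for b in x2] for a in x1]
phi1, phi2 = sinkhorn(cost, mu1, mu2, eps=0.5)
plan = entropic_plan(cost, mu1, mu2, phi1, phi2, eps=0.5)
```

Row and column sums of `plan` should then match `mu1` and `mu2` up to the convergence error of the iterations. Working with logarithms avoids underflow when `eps` is small.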

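Introduction 7
Taking $\varepsilon = 1$ to simplify, the Schrödinger system rewrites as $T(\varphi, \mu) = 0$, where $\varphi = (\varphi_1, \varphi_2)$, $\mu = (\mu_1, \mu_2)$ and $T = (T_1, T_2)$ has components given by
\[ x_1 \mapsto T_1(\varphi, \mu)(x_1) := \varphi_1(x_1) + \log \int_{X_2} e^{\varphi_2(x_2) - c(x_1, x_2)}\, \mu_2(dx_2) \]
and
\[ x_2 \mapsto T_2(\varphi, \mu)(x_2) := \varphi_2(x_2) + \log \int_{X_1} e^{\varphi_1(x_1) - c(x_1, x_2)}\, \mu_1(dx_1). \]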

Introduction 8
Assume $c \in C^k(X_1 \times X_2)$ (i.e. $c$ has a $C^k$ extension over $\mathbb{R}^d \times \mathbb{R}^d$). Quotient $C^k(X_1) \times C^k(X_2)$ by the equivalence relation: $\varphi \sim \psi$ if there is a constant $\lambda$ such that $\varphi_1 = \psi_1 + \lambda$ and $\varphi_2 = \psi_2 - \lambda$. Then view $T(\cdot, \mu)$ as a self-map of the Banach space $C^k := C^k(X_1) \times C^k(X_2)/\sim$.

Introduction 9
Given $\mu$, the existence and uniqueness of $\varphi \in C^k$ fitting the marginal constraints, i.e. such that $T(\varphi, \mu) = 0$, was established (by variational or fixed-point arguments, using the Hilbert metric) by Borwein, Lewis and Nussbaum (1994).
A more elementary proof, with an extension to the multi-marginal case using the Sinkhorn algorithm, is due to Gerolin and Di Marino (2019).
Local/global inversion arguments by C. and Laborde (2020) give smooth dependence for $L^\infty$ perturbations of the marginals.
Quantitative convergence of Sinkhorn is still an active area (bounded costs and multi-marginals: C.; unbounded costs: Léger, Nutz, Eckstein...).

Introduction 10
Denote by $S(\mu) = (\varphi_1, \varphi_2) \in C^k$ this solution. This defines the Schrödinger map, which maps marginals to the Schrödinger potentials (with a suitable normalization or, equivalently, after quotienting by $\sim$).

Outline 11
➀ Displacement smoothness of EOT
➁ Gradient flows
➂ Entropic semi-geostrophic equations
➃ Remarks on continuity equations
➄ Convergence as ε → 0

Displacement smoothness of EOT 12
Equip $\mathcal{P}(X_1) \times \mathcal{P}(X_2)$ with the 2-Wasserstein product distance:
\[ W_2^2(\mu, \nu) := W_2^2(\mu_1, \nu_1) + W_2^2(\mu_2, \nu_2) \quad \text{where} \quad W_2^2(\mu_i, \nu_i) := \inf_{\gamma \in \Pi(\mu_i, \nu_i)} \int |x - y|^2 \, d\gamma(x, y). \]
Let $\gamma_i \in \Pi(\mu_i, \nu_i)$ (not necessarily optimal) for $i = 1, 2$, and for $t \in [0, 1]$ define the displacement interpolation $\mu^t := (\mu_1^t, \mu_2^t)$ between $(\mu_1, \mu_2)$ and $(\nu_1, \nu_2)$ by
\[ \int_{X_i} f \, d\mu_i^t := \int_{X_i \times X_i} f((1 - t)x_i + t y_i)\, d\gamma_i(x_i, y_i) \]
for every $f \in C(X_i)$.
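For finitely supported plans, the displacement interpolation defined above amounts to moving each atom of the plan along a straight segment. The sketch below illustrates this under assumptions that are not in the talk: a plan is stored as a list of `(x, y, w)` atoms, and the names `displacement_interpolation` and `transport_cost` are hypothetical.

```python
def displacement_interpolation(plan, t):
    """Atoms of mu^t for a discrete plan.

    plan: list of (x, y, w) where x and y are coordinate tuples and the
    weights w are nonnegative and sum to 1 (assumed data layout).
    Returns a list of (point, weight) pairs; each atom sits at (1 - t) x + t y.
    """
    return [(tuple((1 - t) * xi + t * yi for xi, yi in zip(x, y)), w)
            for x, y, w in plan]

def transport_cost(plan):
    """L2 transport cost of the plan: sum of w * |x - y|^2 over its atoms."""
    return sum(w * sum((xi - yi) ** 2 for xi, yi in zip(x, y))
               for x, y, w in plan)

# Toy plan on the real line (assumed for illustration): 0 -> 1 and 1 -> 3.
plan = [((0.0,), (1.0,), 0.5), ((1.0,), (3.0,), 0.5)]
midpoint = displacement_interpolation(plan, 0.5)
cost = transport_cost(plan)
```

At `t = 0` the atoms sit at the `x` points and at `t = 1` at the `y` points, so the first and second marginals of the plan are recovered at the endpoints. The plan does not need to be optimal for the formula to make sense.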

Displacement smoothness of EOT 13
Letting $\varphi^t := S(\mu^t)$ be the Schrödinger potentials between $\mu_1^t$ and $\mu_2^t$, we then have:
Theorem 1 (C., Chizat, Laborde, 2022) For $p, k \in \mathbb{N}^*$, $p \le k$, if $c \in C^{k+p}(X_1 \times X_2)$ then the parametrized Schrödinger map $t \mapsto \varphi^t = S(\mu^t)$ belongs to $C^p([0, 1]; C^k)$. Moreover, there exists $C > 0$ that only depends on $\|c\|_{C^{k+1}}$ such that
\[ \|\varphi^t - \varphi^s\|_{C^k} \le C |t - s| \sqrt{\mathrm{cost}(\gamma_1, \gamma_2)} \]
where
\[ \mathrm{cost}(\gamma_1, \gamma_2) := \sum_{i=1}^{2} \int |x_i - y_i|^2 \, d\gamma_i(x_i, y_i) \]
is the $L^2$ transport cost associated with the plans $\gamma_1, \gamma_2$.

Displacement smoothness of EOT 14
Note the obvious corollary: if $c \in C^{k+1}$ then, for some $C > 0$, one has
\[ \|S(\mu) - S(\nu)\|_{C^k} \le C\, W_2(\mu, \nu). \]
We actually prove the previous theorem in the more general multi-marginal case (mainly at the expense of cumbersome notation).
Sketch of proof of Theorem 1: setting $G(t, \varphi) := T(\varphi, \mu^t)$, the result is essentially the Implicit Function Theorem for $G$. First step: invertibility of $\partial_\varphi G$.

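Displacement smoothness of EOT 15
Invertibility of $\partial_\varphi G$, the derivative of $G$ with respect to $\varphi$: start with $(\psi_1, \psi_2)$ in its null space, i.e. solving the linearized system
\[ 0 = \psi_1(x_1) + \frac{\int_{X_2} \psi_2(x_2)\, e^{-c(x_1, x_2) + \varphi_2(x_2)}\, d\mu_2^t(x_2)}{\int_{X_2} e^{-c(x_1, x_2) + \varphi_2(x_2)}\, d\mu_2^t(x_2)}, \]
\[ 0 = \psi_2(x_2) + \frac{\int_{X_1} \psi_1(x_1)\, e^{-c(x_1, x_2) + \varphi_1(x_1)}\, d\mu_1^t(x_1)}{\int_{X_1} e^{-c(x_1, x_2) + \varphi_1(x_1)}\, d\mu_1^t(x_1)}. \]
It is convenient to rewrite it in terms of conditional expectations with respect to the probability (with $\alpha$ a normalizing constant)
\[ Q := \alpha\, e^{-c(x_1, x_2) + \varphi_2(x_2) + \varphi_1(x_1)}\, \mu_1^t \otimes \mu_2^t. \]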

Displacement smoothness of EOT 16
The linearized system becomes
\[ \psi_1(x_1) + \int_{X_2} \psi_2(x_2)\, dQ_2(x_2 \mid x_1) = 0 \quad \text{and} \quad \psi_2(x_2) + \int_{X_1} \psi_1(x_1)\, dQ_1(x_1 \mid x_2) = 0. \]
Multiplying the first equation by $\psi_1(x_1)$ and integrating with respect to $Q_1$, we get
\[ \int_{X_1} \psi_1^2(x_1)\, dQ_1(x_1) + \int_{X_1 \times X_2} \psi_1(x_1)\psi_2(x_2)\, dQ(x_1, x_2) = 0 \]
and in a similar fashion
\[ \int_{X_2} \psi_2^2(x_2)\, dQ_2(x_2) + \int_{X_1 \times X_2} \psi_1(x_1)\psi_2(x_2)\, dQ(x_1, x_2) = 0. \]

Displacement smoothness of EOT 17
So
\[ \int_{X_1 \times X_2} (\psi_1(x_1) + \psi_2(x_2))^2 \, dQ(x_1, x_2) = 0 \]
which in the end implies $(\psi_1, \psi_2) \sim 0$. This shows that $\partial_\varphi G$ is one-to-one; invertibility then follows from the Fredholm alternative. One has to work a bit to bound the operator norm (in $C^k$) of its inverse at $\varphi^t$, and to bound the derivatives of $G$ with respect to $t$ in terms of $\mathrm{cost}(\gamma)$.
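The passage from the two integrated identities to the vanishing square can be written out as follows. This is a sketch that only uses the fact that $Q_1$ and $Q_2$ are the marginals of $Q$, so that integrals of $\psi_1^2$ and $\psi_2^2$ against the marginals can be rewritten as integrals against $Q$; adding the two identities gives

```latex
\begin{align*}
0 &= \int_{X_1} \psi_1^2 \, dQ_1 + \int_{X_2} \psi_2^2 \, dQ_2
     + 2 \int_{X_1 \times X_2} \psi_1(x_1)\,\psi_2(x_2)\, dQ(x_1, x_2) \\
  &= \int_{X_1 \times X_2} \big(\psi_1(x_1) + \psi_2(x_2)\big)^2 \, dQ(x_1, x_2).
\end{align*}
```

Hence $\psi_1(x_1) + \psi_2(x_2) = 0$ for $Q$-a.e. $(x_1, x_2)$, and therefore for $\mu_1^t \otimes \mu_2^t$-a.e. $(x_1, x_2)$ since the density of $Q$ is a positive exponential. A sum of a function of $x_1$ and a function of $x_2$ can vanish almost everywhere for a product measure only if $\psi_1 = \lambda$ and $\psi_2 = -\lambda$ for some constant $\lambda$, which is exactly $(\psi_1, \psi_2) \sim 0$.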

Application to gradient flows 18
Given a functional $V$ of probability measures, since the seminal work of Jordan, Kinderlehrer and Otto, the evolution equation
\[ \partial_t \rho = \operatorname{div}\Big(\rho \nabla \frac{\delta V}{\delta \rho}(\rho)\Big), \quad \rho(0, \cdot) = \rho_0 \]
(with no-flux boundary conditions) can be seen as the gradient flow of $V$ for the 2-Wasserstein metric.
As in Ambrosio, Gigli and Savaré's green book, the key to well-posedness is the displacement semi-convexity of $V$:
\[ V(\mu_t) \le (1 - t)V(\mu) + tV(\nu) + \frac{\lambda}{2} t(1 - t) W_2^2(\mu, \nu) \]
where $\mu_t$, $t \in [0, 1]$, is the displacement interpolation (via an optimal plan) between $\mu$ and $\nu$.

Application to gradient flows 19
EOT cost:
\[ E(\mu) = E(\mu_1, \mu_2) := \inf_{\gamma \in \Pi(\mu_1, \mu_2)} \int c \, d\gamma + H(\gamma \mid \mu_1 \otimes \mu_2). \]
If $c$ is $C^2$ and $\mu^t$ is a (not necessarily optimal) displacement interpolation between $\mu^0$ and $\mu^1$ using the plans $\gamma = (\gamma_1, \gamma_2)$, we deduce from Theorem 1 that $E$ is "displacement $C^{1,1}$", i.e. semi-convex and semi-concave:
\[ (1 - t)E(\mu^0) + tE(\mu^1) - C\,\mathrm{cost}(\gamma)\frac{t(1 - t)}{2} \le E(\mu^t) \le (1 - t)E(\mu^0) + tE(\mu^1) + C\,\mathrm{cost}(\gamma)\frac{t(1 - t)}{2} \]
for some $C > 0$ depending on $\|c\|_{C^2}$, and the Schrödinger map $S(\mu)$ is the gradient of $E$.

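Application to gradient flows 20
Examples:
• Well-posedness of the gradient flow of $\rho \mapsto E(\rho, \mu)$ (with fixed $\mu$):
\[ \partial_t \rho = \operatorname{div}(\rho \nabla S_1(\rho, \mu)). \]
• (Possibly nonlinear) diffusive version, with $m > 1$, $\alpha \ge 0$:
\[ \partial_t \rho = \operatorname{div}(\rho \nabla S_1(\rho, \mu)) + \alpha \Delta \rho^m. \]
• Sinkhorn divergence $\rho \mapsto E(\rho, \mu) - \frac{1}{2} E(\rho, \rho) - \frac{1}{2} E(\mu, \mu)$ ($\mu$ fixed):
\[ \partial_t \rho = \operatorname{div}\Big(\rho\Big(\nabla S_1(\rho, \mu) - \frac{1}{2}\big(\nabla S_1(\rho, \rho) + \nabla S_2(\rho, \rho)\big)\Big)\Big) + \alpha \Delta \rho^m. \]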

Application to gradient flows 21
Works for systems such as
\[ \partial_t \rho_i = \operatorname{div}(\rho_i \nabla S_i(\rho_1, \cdots, \rho_N)) + \alpha_i \Delta \rho_i^{m_i}. \]
Particular case:
\[ \partial_t \rho_i = \operatorname{div}(\rho_i \nabla S_i(\rho_1, \cdots, \rho_N)) + \Delta \rho_i. \]
Exponential (log-Sobolev) convergence to the marginals of $\gamma^* := e^{-c}$: if $\gamma^t$ is the optimal entropic plan between the marginals $\rho_i^t$, then $H(\gamma^t \mid \gamma^*) \le C e^{-\kappa t}$, so by Talagrand's transport inequality $W_2^2(\gamma^t, \gamma^*) \le C e^{-\kappa t}$.

Entropic semi-geostrophic equations 22
Work in progress with H. Malamut.
The semi-geostrophic equations are a simple model used in meteorology to describe large-scale atmospheric flows. They go back to the work of Eliassen in 1948 and Hoskins in 1975, and have been revived since the 1980s with the works of Mike Cullen.
Lots of interest in the last 25 years due to connections with OT and Monge-Ampère equations: Benamou and Brenier 1998, Gangbo and Cullen 2001, Loeper 2005, Ambrosio, Colombo, De Philippis and Figalli 2014.

Entropic semi-geostrophic equations 23
Given a Lipschitz bounded open subset $\Omega$ of $\mathbb{R}^3$ and a Borel measure $\alpha_0$ on $\mathbb{R}^3$ with total mass $|\Omega|$, the semi-geostrophic system reads as the coupling of
\[ \partial_t \alpha + \operatorname{div}(\alpha J(\mathrm{id} - \nabla\psi)) = 0, \quad \alpha(0, \cdot) = \alpha_0, \quad J := \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{2} \]
with the Monge-Ampère equation
\[ \det(D^2\psi_t) = \alpha_t, \quad \psi_t \text{ convex}, \tag{3} \]
which has to be understood in a suitable weak sense using optimal transport: by Brenier's theorem, $\nabla\psi_t$ is the quadratic OT map between $\alpha_t$ and the uniform measure $\mu_0$ on $\Omega$. Existence was shown by Benamou and Brenier.

Entropic semi-geostrophic equations 24
Consider the slightly more general equation
\[ \partial_t \alpha + \operatorname{div}(\alpha A(\mathrm{id} - \nabla\psi)) = 0, \quad \alpha(0, \cdot) = \alpha_0, \quad \det(D^2\psi_t)\, \mu_0(\nabla\psi_t) = \alpha_t, \tag{4} \]
with $\psi$ convex. For $d = 3$, $\mu_0$ the uniform probability measure on $\Omega$ and $A = J$, we recover the initial problem (2)-(3) after normalizing all measures by dividing them by $|\Omega|$.
Idea: view $\nabla u = \mathrm{id} - \nabla\psi$ as the conditional expectation of $x - y$ given $x$ for an optimal transport plan $\gamma$.

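Entropic semi-geostrophic equations 25
A weak solution is a curve $t \in [0, T] \mapsto \alpha_t$ such that, for every $f \in C_c^1([0, T] \times \mathbb{R}^d)$, one has
\[ \int_0^T \int_{\mathbb{R}^d} [\partial_t f + Ax \cdot \nabla f]\, \alpha_t(dx)\, dt - \int_0^T \int_{\mathbb{R}^d \times \mathbb{R}^d} Ay \cdot \nabla f(t, x)\, \gamma_t(dx, dy)\, dt = \int_{\mathbb{R}^d} f(T, x)\, \alpha_T(dx) - \int_{\mathbb{R}^d} f(0, x)\, \alpha_0(dx), \tag{5} \]
where $\gamma_t$ is an optimal plan between $\alpha_t$ and $\mu_0$, i.e.
\[ \gamma_t \in \Pi(\alpha_t, \mu_0) \quad \text{and} \quad W_2^2(\alpha_t, \mu_0) = \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^2\, \gamma_t(dx, dy), \quad \text{for a.e. } t \in [0, T]. \tag{6} \]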

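Entropic semi-geostrophic equations 26
Given $\varepsilon > 0$, the idea is to replace optimal plans $\gamma$ for $W_2$ by optimal entropic plans $\gamma^\varepsilon$. For $\alpha$ and $\mu$ compactly supported, consider
\[ OT_\varepsilon(\alpha, \mu) := \inf_{\gamma \in \Pi(\alpha, \mu)} \frac{1}{2} \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^2\, \gamma(dx, dy) + \varepsilon H(\gamma \mid \alpha \otimes \mu). \tag{7} \]
The unique optimal plan $\gamma^\varepsilon$ for $OT_\varepsilon(\alpha, \mu)$ has the Gibbs form
\[ \gamma^\varepsilon(dx, dy) = \exp\Big(-\frac{|x - y|^2}{2\varepsilon} + \frac{u^\varepsilon(y) + v^\varepsilon(x)}{\varepsilon}\Big)\, \alpha(dx)\, \mu(dy) \tag{8} \]
where the potentials $u^\varepsilon$ and $v^\varepsilon$ are such that $\gamma^\varepsilon \in \Pi(\alpha, \mu)$, i.e. satisfy the Schrödinger system.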

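Entropic semi-geostrophic equations 28
Entropic regularization $SG_\varepsilon$ of (4), with parameter $\varepsilon > 0$:
\[ \partial_t \alpha^\varepsilon + \operatorname{div}(\alpha^\varepsilon A(\nabla v^\varepsilon)) = 0 \text{ in } [0, T] \times \mathbb{R}^d, \quad \alpha^\varepsilon(0, \cdot) = \alpha_0, \tag{12} \]
where
\[ \nabla v_t^\varepsilon(x) = x - \int_{\mathbb{R}^d} y\, \gamma_t^\varepsilon(dy \mid x) \tag{13} \]
and $\gamma_t^\varepsilon$ is the solution of $OT_\varepsilon(\alpha_t^\varepsilon, \mu_0)$, i.e. $\gamma_t^\varepsilon \in \Pi(\alpha_t^\varepsilon, \mu_0)$ and
\[ OT_\varepsilon(\alpha_t^\varepsilon, \mu_0) = \frac{1}{2} \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^2\, \gamma_t^\varepsilon(dx, dy) + \varepsilon H(\gamma_t^\varepsilon \mid \alpha_t^\varepsilon \otimes \mu_0). \tag{14} \]
It follows from Theorem 1 that (the smooth map) $\nabla v^\varepsilon$ depends in a $W_2$-Lipschitz way on $\alpha^\varepsilon$.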

Remarks on continuity equations 29
Consider a map $B : \alpha \in \mathcal{P}_c(\mathbb{R}^d) \mapsto B[\alpha] \in C(\mathbb{R}^d, \mathbb{R}^d)$. The ideal situation to solve/approximate
\[ \partial_t \alpha + \operatorname{div}(\alpha B[\alpha]) = 0, \quad \alpha(0, \cdot) = \alpha_0 \tag{15} \]
...

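Remarks on continuity equations 30
... is when $B$ satisfies the following properties:
• (H1) There exists $C > 0$ such that
\[ |B[\alpha](x)| \le C(1 + |x|), \quad \forall (x, \alpha) \in \mathbb{R}^d \times \mathcal{P}_c(\mathbb{R}^d), \tag{16} \]
• (H2) For every $R > 0$,
\[ K_R := \sup\{\mathrm{Lip}(B[\alpha], B_R) : \alpha \in \mathcal{P}(B_R)\} < +\infty, \tag{17} \]
• (H3) For every $R > 0$,
\[ M_R := \sup_{\mathrm{spt}(\alpha^i) \subset B_R,\ \alpha^1 \neq \alpha^2} \frac{\|B[\alpha^1] - B[\alpha^2]\|_{L^\infty(B_R)}}{W_2(\alpha^1, \alpha^2)} < +\infty. \tag{18} \]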

Remarks on continuity equations 31
Under these assumptions, given $\alpha_0 \in \mathcal{P}_c(\mathbb{R}^d)$, solving/approximating (15) is quite straightforward: this is the standard Cauchy-Lipschitz framework. Indeed, rewrite (15) as the fixed-point problem $\alpha = \Phi_{\alpha_0}(\alpha)$ with
\[ \Phi_{\alpha_0}(\alpha)_t := (X_t^\alpha)_\# \alpha_0, \quad t \in [0, T], \]
where $X_t^\alpha$ is the (globally well-defined) flow of $B[\alpha]$:
\[ \frac{d}{dt} X_t^\alpha(x) = B[\alpha_t](X_t^\alpha(x)), \quad X_0^\alpha(x) = x, \quad (t, x) \in [0, T] \times \mathbb{R}^d. \tag{19} \]
For well-chosen $\lambda > 0$, $\Phi_{\alpha_0}$ is a contraction for the distance
\[ \mathrm{dist}(\alpha^1, \alpha^2) = \sup_{t \in [0, T]} e^{-\lambda t} W_2(\alpha_t^1, \alpha_t^2). \]
This gives existence, uniqueness and Lipschitz dependence with respect to the initial condition.

Remarks on continuity equations 32
Can we apply this to the drift $B^\varepsilon$, i.e. does it satisfy (H1)-(H2)-(H3)?
The linear growth condition (H1) holds with a constant independent of $\varepsilon$: it follows from (10) and the fact that $\operatorname{spt} \gamma^\varepsilon(\cdot \mid x)$ lies in $B_{R_0}$.
The Lipschitz (in $x$) requirement (H2) follows from (11), with a constant $K \sim R_0^2 \varepsilon^{-1}$.
We deduce from the displacement smoothness result that $B^\varepsilon$ satisfies (H3) (but with a very bad constant $M \sim e^{A\varepsilon^{-1}}$).

Remarks on continuity equations 33
Back to $SG_\varepsilon$:
\[ \partial_t \alpha^\varepsilon + \operatorname{div}(\alpha^\varepsilon B^\varepsilon[\alpha^\varepsilon]) = 0, \quad \alpha^\varepsilon(0, \cdot) = \alpha_0, \]
with $B^\varepsilon[\alpha] = J(\nabla v^\varepsilon)$, where $v^\varepsilon$ is the Schrödinger potential between $\alpha$ and $\mu_0$. By our general considerations on nice continuity equations, we deduce:
Theorem 2 For $\varepsilon > 0$, (12)-(13)-(14) admits a unique solution $\alpha^\varepsilon$.
Note that we have not used the Hamiltonian structure of $SG_\varepsilon$ ($SG_\varepsilon$ enters the Hamiltonian framework of Ambrosio and Gangbo). Conservation of energy is easy to see: $OT_\varepsilon(\alpha^\varepsilon, \mu_0)$ is constant in time (and the vertical marginal of $\alpha^\varepsilon$ is of course constant as well).

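Convergence as ε → 0 34
It is not difficult to show that cluster points of solutions of $SG_\varepsilon$ solve SG, but this is of little practical use.
Time and space discretization by an explicit Euler scheme was proposed by Benamou, Cotter and Malamut. Consider a time step $\tau_\varepsilon > 0$ with $T = N_\varepsilon \tau_\varepsilon$ and a quantized approximation of the initial condition $\alpha_0 \in \mathcal{P}(B_{R_0})$,
\[ \alpha_0^\varepsilon := \frac{1}{M_\varepsilon} \sum_{i=1}^{M_\varepsilon} \delta_{x_i^\varepsilon}, \quad x_i^\varepsilon \in B_{R_0}, \]
and assume that
\[ \tau_\varepsilon + W_2(\alpha_0, \alpha_0^\varepsilon) \to 0 \quad \text{as } \varepsilon \to 0^+. \tag{20} \]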

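Convergence as ε → 0 35
Define a piecewise constant curve of measures $t \in [0, T] \mapsto \alpha_t^\varepsilon$ by the explicit Euler scheme, i.e.
\[ \alpha_t^\varepsilon = \alpha_k^\varepsilon, \quad t \in [k\tau_\varepsilon, (k + 1)\tau_\varepsilon), \quad k = 0, \ldots, N_\varepsilon - 1, \]
where the initial step $\alpha_0^\varepsilon$ is the quantized initial condition and
\[ \alpha_{k+1}^\varepsilon = (\mathrm{id} + \tau_\varepsilon B^\varepsilon[\alpha_k^\varepsilon])_\# \alpha_k^\varepsilon, \quad k = 0, \ldots, N_\varepsilon - 1, \]
with $B^\varepsilon$ defined through $OT_\varepsilon(\alpha, \mu_0)$ as before. One can also quantize $\mu_0$; $B^\varepsilon[\alpha_k^\varepsilon]$ is then computed by Sinkhorn.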
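One step of the explicit Euler scheme above can be sketched for particles in $\mathbb{R}^3$, with $\mu_0$ replaced by a quantized measure. This is a hedged illustration, not the scheme of Benamou, Cotter and Malamut as implemented by its authors: the helper names `barycentric_map` and `euler_step`, the fixed number of Sinkhorn iterations and the data layout (lists of coordinate triples, uniform weights on the particles, positive weights `nu` on the reference points) are all assumptions.

```python
import math

def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def barycentric_map(xs, ys, nu, eps, n_iter=200):
    """Conditional means E[y | x_i] under the entropic plan (hypothetical helper).

    xs: particles carrying uniform weights; ys, nu: points and positive weights
    of the quantized reference measure; the cost is |x - y|^2 / 2.
    """
    n, m = len(xs), len(ys)
    cost = [[0.5 * sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]
    log_mu = -math.log(n)
    log_nu = [math.log(w) for w in nu]
    v = [0.0] * n  # potential on the particles
    u = [0.0] * m  # potential on the reference points
    for _ in range(n_iter):
        v = [-eps * _logsumexp([(u[j] - cost[i][j]) / eps + log_nu[j]
                                for j in range(m)]) for i in range(n)]
        u = [-eps * _logsumexp([(v[i] - cost[i][j]) / eps + log_mu
                                for i in range(n)]) for j in range(m)]
    bary = []
    for i in range(n):
        # The conditional law gamma(y_j | x_i) is proportional to
        # exp((u_j - c_ij) / eps) * nu_j; it is normalized explicitly here.
        logw = [(u[j] - cost[i][j]) / eps + log_nu[j] for j in range(m)]
        z = _logsumexp(logw)
        probs = [math.exp(l - z) for l in logw]
        bary.append(tuple(sum(p * y[d] for p, y in zip(probs, ys)) for d in range(3)))
    return bary

def euler_step(xs, ys, nu, eps, tau, n_iter=200):
    """Move each particle by tau * J (x - E[y | x]), J being the planar rotation."""
    bary = barycentric_map(xs, ys, nu, eps, n_iter)
    new_xs = []
    for x, b in zip(xs, bary):
        g = tuple(xd - bd for xd, bd in zip(x, b))   # grad v(x) = x - E[y | x]
        drift = (-g[1], g[0], 0.0)                   # J applied to grad v
        new_xs.append(tuple(xd + tau * dd for xd, dd in zip(x, drift)))
    return new_xs
```

Because the third row of $J$ vanishes, the third coordinate of every particle is left unchanged by `euler_step`, which is the discrete counterpart of the conservation of the vertical marginal. Normalizing the conditional law explicitly keeps the barycenter a convex combination of the reference points even if the Sinkhorn loop has not fully converged.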

Convergence as ε → 0 36
Observing that
\[ W_2(\alpha_t^\varepsilon, \alpha_s^\varepsilon) \le \kappa(|t - s| + \tau_\varepsilon) \]
for every $t, s$ in $[0, T]$ and a constant $\kappa$ independent of $\varepsilon$, and passing to a suitable vanishing sequence $\varepsilon_n \to 0$, we may assume that for some $\alpha = (\alpha_t)_{t \in [0, T]} \in C([0, T], (\mathcal{P}(B_{R_T}), W_2))$ (with $R_T := 2R_0 e^T$) one has
\[ \sup_{t \in [0, T)} W_2(\alpha_t^\varepsilon, \alpha_t) \to 0 \quad \text{as } \varepsilon \to 0^+. \tag{21} \]
Cluster points of the previous approximations are weak solutions of the initial semi-geostrophic equations:
Theorem 3 If $\alpha$ is obtained as a cluster point of the discretized entropic regularization $(\alpha_t^\varepsilon)_{t \in [0, T)}$, i.e. (21) holds, then $\alpha$ is a weak solution of (2)-(3).

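Convergence as ε → 0 37
Let $f \in C^1([0, T] \times \mathbb{R}^3)$ and observe that
\begin{align*}
\int_0^T \int_{\mathbb{R}^3} \partial_t f\, \alpha_t^\varepsilon &= \sum_{k=0}^{N_\varepsilon - 1} \int_{\mathbb{R}^3} \int_{k\tau_\varepsilon}^{(k+1)\tau_\varepsilon} \partial_t f\, \alpha_k^\varepsilon = \sum_{k=0}^{N_\varepsilon - 1} \int_{\mathbb{R}^3} \big(f((k + 1)\tau_\varepsilon, \cdot) - f(k\tau_\varepsilon, \cdot)\big)\, \alpha_k^\varepsilon \\
&= \sum_{k=1}^{N_\varepsilon - 1} \int_{\mathbb{R}^3} f(k\tau_\varepsilon, \cdot)(\alpha_{k-1}^\varepsilon - \alpha_k^\varepsilon) + \int_{\mathbb{R}^3} f(T, \cdot)\, \alpha_{N_\varepsilon - 1}^\varepsilon - \int_{\mathbb{R}^3} f(0, \cdot)\, \alpha_0^\varepsilon. \tag{22}
\end{align*}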

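Convergence as ε → 0 38
Setting $B_k^\varepsilon = B^\varepsilon[\alpha_k^\varepsilon]$, denoting by $\gamma_{k-1}^\varepsilon$ the solution of $OT_\varepsilon(\alpha_{k-1}^\varepsilon, \mu_0)$ and using the fact that $\alpha_k^\varepsilon = (\mathrm{id} + \tau_\varepsilon B_{k-1}^\varepsilon)_\# \alpha_{k-1}^\varepsilon$ allows us to rewrite
\begin{align*}
\int_{\mathbb{R}^3} f(k\tau_\varepsilon, \cdot)(\alpha_{k-1}^\varepsilon - \alpha_k^\varepsilon) &= \int_{\mathbb{R}^3} \big(f(k\tau_\varepsilon, \cdot) - f(k\tau_\varepsilon, \mathrm{id} + \tau_\varepsilon B_{k-1}^\varepsilon)\big)\, \alpha_{k-1}^\varepsilon \\
&= -\tau_\varepsilon \int_{\mathbb{R}^3} \nabla f(k\tau_\varepsilon, x) \cdot B_{k-1}^\varepsilon(x)\, \alpha_{k-1}^\varepsilon(dx) + o(\tau_\varepsilon) \\
&= \int_{(k-1)\tau_\varepsilon}^{k\tau_\varepsilon} \int_{\mathbb{R}^6} \nabla f(t, x) \cdot J(y - x)\, \gamma_{k-1}^\varepsilon(dx, dy)\, dt + o(\tau_\varepsilon).
\end{align*}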

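Convergence as ε → 0 39
Considering the piecewise constant curve of plans $t \mapsto \gamma_t^\varepsilon$ defined by $\gamma_t^\varepsilon = \gamma_k^\varepsilon$ for $t \in [k\tau_\varepsilon, (k + 1)\tau_\varepsilon)$, and recalling that $N_\varepsilon \tau_\varepsilon = T$, we thus have
\[ \int_0^T \int_{\mathbb{R}^3} \partial_t f\, \alpha_t^\varepsilon = \int_0^T \int_{\mathbb{R}^6} \nabla f(t, x) \cdot J(y - x)\, \gamma_t^\varepsilon(dx, dy)\, dt + \int_{\mathbb{R}^3} f(T, x)\, \alpha_T(dx) - \int_{\mathbb{R}^3} f(0, x)\, \alpha_0(dx) + o(1). \tag{23} \]
Assume (possibly after an extraction) that $\gamma_t^\varepsilon(dx, dy) \otimes dt$ converges weakly-$*$ as $\varepsilon \to 0^+$ to some measure of the form $\gamma_t(dx, dy) \otimes dt$.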

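Convergence as ε → 0 40
Obviously
\[ \int_0^T \int_{\mathbb{R}^3} \partial_t f\, \alpha_t = \int_0^T \int_{\mathbb{R}^6} \nabla f(t, x) \cdot J(y - x)\, \gamma_t(dx, dy)\, dt + \int_{\mathbb{R}^3} f(T, x)\, \alpha_T(dx) - \int_{\mathbb{R}^3} f(0, x)\, \alpha_0(dx) \]
and $\gamma_t \in \Pi(\alpha_t, \mu_0)$ for a.e. $t$. So, to show that $\alpha$ is a weak solution of SG, it remains to check that $\gamma_t$ is an optimal plan.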

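Convergence as ε → 0 41
It is known that
\[ OT_\varepsilon(\alpha_t^\varepsilon, \mu_0) \le \frac{1}{2} W_2^2(\alpha_t^\varepsilon, \mu_0) + \kappa' \varepsilon |\log(\varepsilon)| \]
for some constant $\kappa'$ that does not depend on $\varepsilon > 0$ and $t \in [0, T]$. Since the relative entropy is nonnegative, the quadratic cost of $\gamma_t^\varepsilon$ is at most $OT_\varepsilon(\alpha_t^\varepsilon, \mu_0)$, and we thus have
\begin{align*}
\int_0^T \int_{\mathbb{R}^3 \times \mathbb{R}^3} \frac{1}{2} |x - y|^2\, \gamma_t(dx, dy)\, dt &= \lim_\varepsilon \int_0^T \int_{\mathbb{R}^3 \times \mathbb{R}^3} \frac{1}{2} |x - y|^2\, \gamma_t^\varepsilon(dx, dy)\, dt \le \limsup_\varepsilon \int_0^T OT_\varepsilon(\alpha_t^\varepsilon, \mu_0)\, dt \\
&\le \limsup_\varepsilon \int_0^T \frac{1}{2} W_2^2(\alpha_t^\varepsilon, \mu_0)\, dt = \int_0^T \frac{1}{2} W_2^2(\alpha_t, \mu_0)\, dt
\end{align*}
which shows optimality of $\gamma_t$ for a.e. $t$.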
