Entropy dissipation via information Gamma calculus

Slide 1

Slide 1 text

Entropy dissipation via information Gamma calculus

Wuchen Li, University of South Carolina

Analysis Seminar, CMUC, April 9th.

This is based on joint work with Qi Feng (USC).

Slide 2

Slide 2 text

Dynamics and Lyapunov functionals

Slide 3

Slide 3 text

Stochastic differential equations

Consider a stochastic differential equation (SDE)
$$\dot X_t = b(X_t) + \sqrt{2}\,a(X_t)\,\dot B_t,$$
where $n, m \in \mathbb{N}$, $X_t \in \mathbb{R}^{n+m}$, $b \in \mathbb{R}^{n+m}$ is a drift vector function, $a(X_t) \in \mathbb{R}^{(n+m)\times n}$ is a diffusion matrix function, and $B_t \in \mathbb{R}^n$ is a standard Brownian motion.

SDEs of this form are widely used in practice:
Mathematical physics equations;
Protein folding;
Designing Markov chain Monte Carlo algorithms.

Slide 4

Slide 4 text

Example: Langevin dynamics

We review a classical example. We start with a gradient drift-diffusion process
$$\dot X_t = -\nabla V(X_t) + \sqrt{2}\,\dot B_t,$$
where $V$ is a given potential function. Let $\rho(t, x)$ be the probability density function of $X_t$, which satisfies the Fokker-Planck equation
$$\partial_t \rho(t, x) = \nabla\cdot\big(\rho(t, x)\nabla V(x)\big) + \Delta\rho(t, x).$$
Here $\pi(x) := \frac{1}{Z}e^{-V(x)}$, where $Z = \int e^{-V(x)}\,dx < \infty$, is the invariant distribution of the SDE.

The main question: how fast does $\rho(t, x)$ converge to the invariant distribution $\pi$?
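As a rough illustration of the Langevin dynamics above, the following sketch simulates the SDE with an Euler-Maruyama scheme and checks that the samples settle near the invariant distribution. The quadratic potential $V(x) = x^2/2$ (so that $\pi$ is the standard normal), the step size, and the function names are all assumptions chosen for this example; none of them come from the talk.

```python
import math
import random


def simulate_langevin(grad_v, x0, dt, n_steps, n_paths, seed=0):
    """Euler-Maruyama for the scalar SDE dX = -V'(X) dt + sqrt(2) dB.

    Returns the terminal value of each simulated path.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * dt)  # std of sqrt(2) * (B_{t+dt} - B_t)
    samples = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x += -grad_v(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def mean_and_variance(samples):
    """Sample mean and unbiased sample variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var


# Hypothetical test case: V(x) = x^2 / 2, so pi = N(0, 1).
# Paths start far from equilibrium (x0 = 3) and run up to time T = 5.
samples = simulate_langevin(grad_v=lambda x: x, x0=3.0, dt=0.01,
                            n_steps=500, n_paths=2000)
mean, var = mean_and_variance(samples)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

The sample mean and variance should land close to 0 and 1, up to Monte Carlo error and the small time-discretization bias of the scheme.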

Slide 5

Slide 5 text

Lyapunov methods

To study the dynamical behavior of $\rho$, we apply the Lyapunov functional
$$D_{KL}(\rho_t\|\pi) = \int \rho_t(x)\log\frac{\rho_t(x)}{\pi(x)}\,dx.$$
Along the Fokker-Planck equation, the first order dissipation satisfies
$$\frac{d}{dt}D_{KL}(\rho_t\|\pi) = -\int \Big\|\nabla_x\log\frac{\rho_t(x)}{\pi(x)}\Big\|^2\rho_t\,dx,$$
and the second order dissipation satisfies
$$\frac{d^2}{dt^2}D_{KL}(\rho_t\|\pi) = 2\int\Big[\Big\|\nabla^2_{xx}\log\frac{\rho_t}{\pi}\Big\|_F^2 - \nabla^2_{xx}\log\pi\Big(\nabla_x\log\frac{\rho_t}{\pi}, \nabla_x\log\frac{\rho_t}{\pi}\Big)\Big]\rho_t\,dx,$$
where $\|\cdot\|_F$ is the matrix Frobenius norm. In the literature, $D_{KL}$ is called the Kullback-Leibler divergence (relative entropy), and $I = -\frac{d}{dt}D_{KL}$ is called the relative Fisher information functional.
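The first order dissipation identity can be checked by hand in a case where everything is explicit. The sketch below assumes the Ornstein-Uhlenbeck case $V(x) = x^2/2$ in one dimension with a Gaussian initial density, so that $\rho_t$ stays Gaussian and both $D_{KL}$ and $I$ have closed forms; the function names and the initial parameters are hypothetical choices for illustration.

```python
import math


def ou_gaussian(t, m0, s0_sq):
    """Mean and variance of X_t for dX = -X dt + sqrt(2) dB, X_0 ~ N(m0, s0_sq)."""
    m = m0 * math.exp(-t)
    s_sq = 1.0 + (s0_sq - 1.0) * math.exp(-2.0 * t)
    return m, s_sq


def kl_to_standard_normal(m, s_sq):
    """D_KL( N(m, s_sq) || N(0, 1) ) in closed form."""
    return 0.5 * (s_sq + m * m - 1.0 - math.log(s_sq))


def fisher_to_standard_normal(m, s_sq):
    """Relative Fisher information I( N(m, s_sq) || N(0, 1) ) in closed form.

    grad log(rho/pi)(x) = x - (x - m)/s_sq; its second moment under rho
    is (s_sq - 1)^2 / s_sq + m^2.
    """
    return (s_sq - 1.0) ** 2 / s_sq + m * m


def check_dissipation(t, m0=2.0, s0_sq=0.25, h=1e-5):
    """Return (finite-difference d/dt D_KL, -I) at time t; they should agree."""
    kl_plus = kl_to_standard_normal(*ou_gaussian(t + h, m0, s0_sq))
    kl_minus = kl_to_standard_normal(*ou_gaussian(t - h, m0, s0_sq))
    dkl_dt = (kl_plus - kl_minus) / (2.0 * h)
    fisher = fisher_to_standard_normal(*ou_gaussian(t, m0, s0_sq))
    return dkl_dt, -fisher


for t in (0.1, 0.5, 1.0, 2.0):
    lhs, rhs = check_dissipation(t)
    print(f"t = {t}: d/dt D_KL = {lhs:.6f}, -I = {rhs:.6f}")
```

The two printed columns should agree up to finite-difference error, which is the identity $\frac{d}{dt}D_{KL} = -I$ in this special case.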

Slide 6

Slide 6 text

Lyapunov constant

Suppose there exists a "Lyapunov constant" $\lambda > 0$ such that
$$-\nabla^2_{xx}\log\pi(x) \succeq \lambda I.$$
Then
$$\frac{d^2}{dt^2}D_{KL}(\rho_t\|\pi) \ge -2\lambda\frac{d}{dt}D_{KL}(\rho_t\|\pi).$$
By integrating in the time variable, one can prove the exponential convergence
$$D_{KL}(\rho_t\|\pi) \le e^{-2\lambda t}D_{KL}(\rho_0\|\pi).$$
As a by-product, one can show the log-Sobolev inequality
$$D_{KL}(\rho\|\pi) \le \frac{1}{2\lambda}I(\rho\|\pi).$$
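The time integration step can be spelled out as follows. This is a sketch of the standard argument, under the added assumption that both $D_{KL}(\rho_s\|\pi)$ and $I(\rho_s\|\pi)$ tend to zero as $s \to \infty$; the shorthand $D(s)$ and $I(s)$ is introduced here only for brevity.

```latex
\begin{align*}
% Shorthand (ours): D(s) = D_{KL}(\rho_s \| \pi), I(s) = I(\rho_s \| \pi) = -D'(s).
% The second order inequality reads
-\frac{d}{ds} I(s) &\ge -2\lambda \frac{d}{ds} D(s). \\
% Integrate over s in [t, \infty), assuming D(s), I(s) \to 0 as s \to \infty:
I(t) &\ge 2\lambda\, D(t)
  \qquad \text{(log-Sobolev inequality along the flow)}. \\
% Insert this into the first order dissipation D'(t) = -I(t):
\frac{d}{dt} D(t) &= -I(t) \le -2\lambda\, D(t). \\
% Gronwall's inequality then gives
D(t) &\le e^{-2\lambda t} D(0).
\end{align*}
```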

Slide 7

Slide 7 text

Literature

There are several equivalent approaches to establish the Lyapunov constant for gradient dynamics:
Log-Sobolev inequality (Gross);
Iterative Gamma calculus (Bakry, Emery, Baudoin, Garofalo et al.);
Entropy dissipation (Arnold, Carlen, Carrillo, Mouhot, Jüngel, Markowich, Toscani et al.);
Optimal transport, displacement convexity and Hessian operators in density space (McCann, Ambrosio, Villani, Otto, Gangbo et al.);
Transport Lyapunov functional (von Renesse, Sturm et al.).

Slide 8

Slide 8 text

Problem

Recall that
$$\dot X_t = b(X_t) + \sqrt{2}\,a(X_t)\,\dot B_t,$$
where $b$ can be a non-gradient drift vector and $a$ is a degenerate matrix. Its (hypoelliptic) Fokker-Planck equation is
$$\partial_t\rho = -\nabla\cdot(\rho b) + \sum_{i=1}^{n+m}\sum_{j=1}^{n+m}\frac{\partial^2}{\partial x_i\partial x_j}\Big(\big(a(x)a(x)^T\big)_{ij}\,\rho\Big).$$
Assume that there exists an invariant distribution $\pi$ with a given explicit formulation.

The major problem: how fast does $\rho$ converge to the invariant distribution $\pi$?

Slide 9

Slide 9 text

Goals

In this talk, we mainly consider the entropy dissipation for perturbed-gradient dynamical systems.

Main difficulties:
Degeneracy of the diffusion matrix;
Non-gradient drift vectors.

Our method is based on the extended second order calculus in generalized optimal transport space.

Slide 10

Slide 10 text

Review: Optimal transport space

Optimal transport has a variational formulation (Benamou-Brenier 2000):
$$D(\rho_0, \rho_1)^2 := \inf_v \int_0^1 \mathbb{E}_{X_t\sim\rho_t}\|v(t, X_t)\|^2\,dt,$$
where $\mathbb{E}$ is the expectation operator and the infimum runs over all vector fields $v_t$ such that
$$\dot X_t = v(t, X_t), \quad X_0 \sim \rho_0, \quad X_1 \sim \rho_1.$$
Under this metric, the probability set has a metric structure [1].

[1] John D. Lafferty: The density manifold and configuration space quantization, 1988.

Slide 11

Slide 11 text

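Review: Optimal transport metric

Informally speaking, the optimal transport metric refers to the following bilinear form:
$$\langle\dot\rho_1, G(\rho)\dot\rho_2\rangle = \int\big(\dot\rho_1, (-\Delta_\rho)^{-1}\dot\rho_2\big)\,dx, \qquad \Delta_\rho := \nabla\cdot(\rho\nabla).$$
In other words, denote $\dot\rho_i = -\nabla\cdot(\rho\nabla\phi_i)$, $i = 1, 2$; then
$$\langle\phi_1, G(\rho)^{-1}\phi_2\rangle = \int\big(\phi_1, -\nabla\cdot(\rho\nabla)\phi_2\big)\,dx = \int(\nabla\phi_1, \nabla\phi_2)\,\rho\,dx,$$
where $\rho \in \mathcal{P}(\Omega)$, $\dot\rho_i$ is a tangent vector in $\mathcal{P}(\Omega)$, i.e. $\int\dot\rho_i\,dx = 0$, and $\phi_i \in C^\infty(\Omega)$ are cotangent vectors in $\mathcal{P}(\Omega)$ at the point $\rho$.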

Slide 12

Slide 12 text

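Review: Optimal transport gradient flows

The Wasserstein gradient flow of an energy functional $\mathcal{F}(\rho)$ leads to
$$\partial_t\rho = -G(\rho)^{-1}\frac{\delta}{\delta\rho}\mathcal{F}(\rho) = \nabla\cdot\Big(\rho\nabla\frac{\delta}{\delta\rho}\mathcal{F}(\rho)\Big).$$

Example: if $\mathcal{F}(\rho) = \int F(x)\rho(x)\,dx$, then the gradient flow satisfies
$$\partial_t\rho = \nabla\cdot(\rho\nabla F(x)).$$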

Slide 13

Slide 13 text

Entropy dissipation revisited

The gradient flow of the KL divergence
$$D_{KL}(\rho\|\pi) = \int\rho(x)\log\frac{\rho(x)}{\pi(x)}\,dx$$
with respect to the optimal transport metric satisfies the Fokker-Planck equation
$$\frac{\partial\rho}{\partial t} = \nabla\cdot\Big(\rho\nabla\log\frac{\rho}{\pi}\Big).$$
Here the major trick is that $\rho\nabla\log\rho = \nabla\rho$.
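To see why this gradient flow is the Fokker-Planck equation of the Langevin example, one can expand the right-hand side. The short computation below assumes $\pi = \frac{1}{Z}e^{-V}$ as in the Langevin slide, and uses only the first variation of the KL divergence and the identity $\rho\nabla\log\rho = \nabla\rho$.

```latex
\begin{align*}
% First variation of the KL divergence:
\frac{\delta}{\delta\rho} D_{KL}(\rho \| \pi) &= \log\frac{\rho}{\pi} + 1. \\
% Gradient flow in the optimal transport metric (the constant 1 drops out under \nabla):
\partial_t \rho
  &= \nabla\cdot\Big(\rho \nabla \log\frac{\rho}{\pi}\Big)
   = \nabla\cdot(\rho\nabla\log\rho) - \nabla\cdot(\rho\nabla\log\pi) \\
% Use \rho\nabla\log\rho = \nabla\rho and \nabla\log\pi = -\nabla V:
  &= \Delta\rho + \nabla\cdot(\rho\nabla V).
\end{align*}
```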

Slide 14

Slide 14 text

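Entropy dissipation revisited

In this way, one can study the first order entropy dissipation by
$$\frac{d}{dt}D_{KL}(\rho_t\|\pi) = \int\log\frac{\rho_t}{\pi}\,\nabla\cdot\Big(\rho_t\nabla\log\frac{\rho_t}{\pi}\Big)\,dx = -\int\Big\|\nabla\log\frac{\rho_t}{\pi}\Big\|^2\rho_t\,dx = -I(\rho_t\|\pi).$$
Similarly, we study the second order entropy dissipation by
$$\frac{d}{dt}I(\rho_t\|\pi) = -2\int_\Omega\Gamma_2\Big(\log\frac{\rho_t}{\pi}, \log\frac{\rho_t}{\pi}\Big)\rho_t\,dx,$$
where $\Gamma_2$ is a bilinear form, which can be defined by the optimal transport second order operator.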

Slide 15

Slide 15 text

Lyapunov methods for degenerate non-gradient flows

Consider a perturbed gradient flow
$$\dot\rho_t = -G(\rho_t)^{-1}\frac{\delta}{\delta\rho}D_{KL}(\rho_t\|\pi) + f\Big(\frac{\rho_t}{\pi}\Big),$$
where $f$ is a given function generated by the non-gradient drift vector field.

How can we study the convergence behavior of $\rho_t$?

Slide 16

Slide 16 text

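Motivation: Decomposition

Assume $\pi$ is the invariant measure, with an explicit formulation. We decompose the Fokker-Planck equation as
$$\partial_t\rho(t, x) = \underbrace{\nabla\cdot\Big(\rho(t, x)\,a(x)a(x)^T\nabla\log\frac{\rho(t, x)}{\pi(x)}\Big)}_{\text{Gradient direction}} + \underbrace{\nabla\cdot\big(\rho(t, x)\gamma(x)\big)}_{\text{Perturbed direction}},$$
where
$$\gamma(x) := a(x)a(x)^T\nabla\log\pi(x) - b(x) + \Big(\sum_{j=1}^{n+m}\frac{\partial}{\partial x_j}\big(a(x)a(x)^T\big)_{ij}\Big)_{1\le i\le n+m}$$
and $\nabla\cdot(\pi(x)\gamma(x)) = 0$.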
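The decomposition can be recovered from the Fokker-Planck equation of the Problem slide in a few lines. In the sketch below, the shorthand $A := aa^T$ and $(\operatorname{div}A)_i := \sum_j \partial_{x_j}A_{ij}$ is introduced only for this computation.

```latex
\begin{align*}
% Shorthand (ours): A = a a^T, (\operatorname{div} A)_i = \sum_j \partial_{x_j} A_{ij}.
\sum_{i,j}\frac{\partial^2}{\partial x_i \partial x_j}\big(A_{ij}\rho\big)
  &= \nabla\cdot\big(A\nabla\rho\big) + \nabla\cdot\big(\rho\,\operatorname{div}A\big), \\
% Use A\nabla\rho = \rho A\nabla\log\rho and split off \log\pi:
A\nabla\rho
  &= \rho A\nabla\log\frac{\rho}{\pi} + \rho A\nabla\log\pi. \\
% Collect all remaining first order terms into \gamma:
\partial_t\rho
  &= -\nabla\cdot(\rho b) + \nabla\cdot\big(A\nabla\rho\big) + \nabla\cdot\big(\rho\,\operatorname{div}A\big) \\
  &= \nabla\cdot\Big(\rho A\nabla\log\frac{\rho}{\pi}\Big)
   + \nabla\cdot\Big(\rho\,\big(A\nabla\log\pi - b + \operatorname{div}A\big)\Big).
\end{align*}
```

Setting $\rho = \pi$ makes the first term vanish, so invariance of $\pi$ is exactly the condition $\nabla\cdot(\pi\gamma) = 0$.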

Slide 17

Slide 17 text

Main result: Structure condition

Assumption: for any $i \in \{1, \cdots, n\}$ and $k \in \{1, \cdots, m\}$, we assume
$$z_k^T\nabla a_i^T \in \mathrm{Span}\{a_1^T, \cdots, a_n^T\}.$$

Examples:
$a$ is a constant vector;
$a$ is a matrix function defined by $a = a(x_1, \cdots, x_n)$, and $z \in \mathrm{span}\{e_{n+1}, \cdots, e_{n+m}\}$, where $e_i$ is the $i$-th Euclidean basis vector.

Slide 18

Slide 18 text

Main result: Entropy dissipation [F. and Li, 2021]

Under the assumption, for any $\beta \in \mathbb{R}$ and a given vector function $z$, define the matrix function
$$R = R_a + R_z + R_\pi - M_\Lambda + \beta R_{I_a} + (1 - \beta)R_{\gamma_a} + R_{\gamma_z}.$$
If there exists a constant $\lambda > 0$ such that
$$R \succeq \lambda(aa^T + zz^T),$$
then the following decay result holds:
$$D_{KL}(\rho_t\|\pi) \le \frac{1}{2\lambda}e^{-2\lambda t}I_{a,z}(\rho_0\|\pi),$$
where $\rho_t$ is the solution of the Fokker-Planck equation.

Slide 19

Slide 19 text

Tensors

Slide 20

Slide 20 text

Comparisons

(i) If $\gamma = 0$ and $m = 0$: [Bakry-Emery, 1985].
(ii) If $\gamma = 0$ and $m \neq 0$: [Baudoin-Garofalo, 2017], [F.-Li, 2019].
(iii) If $\beta = 0$ and $m = 0$: [Arnold-Carlen-Ju, 2000, 2008].
(iv) If $a$, $z$ are constants and $\beta = 0$: [Arnold-Erb, 2014], [Baudoin-Gordina-Herzog, 2019].
(v) If $\beta = 1$, $m = 0$ and $a = I$: [Arnold-Carlen], [F.-Li, 2020].

Slide 21

Slide 21 text

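Idea of proof

Define
$$I_{a,z}(\rho\|\pi) = \int_{\mathbb{R}^{n+m}}\Big\langle\nabla\log\frac{\rho}{\pi}, (aa^T + zz^T)\nabla\log\frac{\rho}{\pi}\Big\rangle\rho\,dx.$$
Consider
$$-\frac{1}{2}\frac{d}{dt}I_{a,z}(\rho_t\|\pi) = \underbrace{\int\Gamma_2(f, f)\rho_t\,dx}_{\text{(I)}} + \underbrace{\int\Gamma_2^{z,\pi}(f, f)\rho_t\,dx}_{\text{(II)}} + \underbrace{\int\Gamma_{I_{a,z}}(f, f)\rho_t\,dx}_{\text{(III)}},$$
where $f = \log\frac{\rho_t}{\pi}$, and $\Gamma_2$, $\Gamma_2^{z,\pi}$, $\Gamma_{I_{a,z}}$ are designed bilinear forms, coming from the second order calculation in density space.

(i) If $a$ is non-degenerate, then (II) $= 0$;
(ii) If $b$ is a gradient vector field, then (III) $= 0$.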

Slide 22

Slide 22 text

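Detailed approach

For any $f \in C^\infty(\mathbb{R}^{n+m})$, the generator of the Itô SDE satisfies
$$Lf = \tilde Lf - \langle\gamma, \nabla f\rangle,$$
where
$$\tilde Lf = \nabla\cdot(aa^T\nabla f) + \langle aa^T\nabla\log\pi, \nabla f\rangle.$$
For a given matrix function $a \in \mathbb{R}^{(n+m)\times n}$, we construct a matrix function $z \in \mathbb{R}^{(n+m)\times m}$, and define a $z$-direction generator by
$$\tilde L_z f = \nabla\cdot(zz^T\nabla f) + \langle zz^T\nabla\log\pi, \nabla f\rangle.$$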

Slide 23

Slide 23 text

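Global in space computation = Gamma operators

Define Gamma one bilinear forms by
$$\Gamma_1(f, f) = \langle a^T\nabla f, a^T\nabla f\rangle_{\mathbb{R}^n}, \qquad \Gamma_1^z(f, f) = \langle z^T\nabla f, z^T\nabla f\rangle_{\mathbb{R}^m}.$$
Define Gamma two bilinear forms as follows, where $\tilde L$ and $\tilde L_z$ denote the symmetric (gradient-part) generators in the $a$- and $z$-directions.

(i) Gamma two operator:
$$\Gamma_2(f, f) = \frac{1}{2}\tilde L\Gamma_1(f, f) - \Gamma_1(\tilde Lf, f).$$
(ii) Generalized Gamma $z$ operator:
$$\Gamma_2^{z,\pi}(f, f) = \frac{1}{2}\tilde L\Gamma_1^z(f, f) - \Gamma_1^z(\tilde Lf, f) + \mathrm{div}_z^\pi\big(\Gamma_{1,\nabla(aa^T)}(f, f)\big) - \mathrm{div}_a^\pi\big(\Gamma_{1,\nabla(zz^T)}(f, f)\big).$$
(iii) Irreversible Gamma operator:
$$\Gamma_{I_{a,z}}(f, f) = (\tilde Lf + \tilde L_z f)\langle\nabla f, \gamma\rangle - \frac{1}{2}\big\langle\nabla\big(\Gamma_1(f, f) + \Gamma_1^z(f, f)\big), \gamma\big\rangle.$$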

Slide 24

Slide 24 text

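Local in space calculation = Bochner's formula

For any $f = \log\frac{\rho}{\pi} \in C^\infty(\mathbb{R}^{n+m}, \mathbb{R})$ and any $\beta \in \mathbb{R}$, under the assumption, we derive that
$$-\frac{1}{2}\frac{d}{dt}I_{a,z}(\rho\|\pi) = \int\Big[\Gamma_2(f, f) + \Gamma_2^{z,\pi}(f, f) + \Gamma_{I_{a,z}}(f, f)\Big]\rho\,dx = \int\Big[\|\mathrm{Hess}_\beta f\|^2 + R(\nabla f, \nabla f)\Big]\rho\,dx.$$
Clearly, if $R \succeq \lambda I$, we derive a Lyapunov constant $\lambda$ for the convergence rate.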

Slide 25

Slide 25 text

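Example

Consider the underdamped Langevin dynamics
$$dx_t = v_t\,dt, \qquad dv_t = \big(-T(x_t)v_t - \nabla_x U(x_t)\big)\,dt + \sqrt{2T(x_t)}\,dB_t.$$
It can be viewed as $Y_t = (x_t, v_t)$, $dY_t = b(Y_t)\,dt + \sqrt{2}\,a(Y_t)\,dB_t$, with
$$b = \begin{pmatrix} v \\ -T(x)v - \nabla U(x) \end{pmatrix}, \qquad a = \begin{pmatrix} 0 \\ \sqrt{T(x)} \end{pmatrix}.$$
Its invariant measure has a closed form,
$$\pi(x, v) = \frac{1}{Z}e^{-H(x, v)}, \qquad H(x, v) = \frac{\|v\|^2}{2} + U(x).$$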
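For this example the perturbation vector $\gamma$ from the decomposition slide can be computed explicitly. The sketch below assumes scalar $x$ and $v$ and takes the diffusion column as $a = (0, \sqrt{T(x)})^T$, so that $aa^T = \mathrm{diag}(0, T(x))$; both are assumptions made here to keep the computation short.

```latex
\begin{align*}
% Ingredients (scalar x, v; a a^T = diag(0, T(x)) is assumed):
\nabla\log\pi &= \begin{pmatrix} -U'(x) \\ -v \end{pmatrix}, \qquad
aa^T\nabla\log\pi = \begin{pmatrix} 0 \\ -T(x)v \end{pmatrix}, \qquad
\sum_{j}\partial_{x_j}(aa^T)_{ij} = 0 \ \text{ for } i = 1, 2. \\
% Perturbation vector:
\gamma &= aa^T\nabla\log\pi - b
  = \begin{pmatrix} 0 \\ -T(x)v \end{pmatrix}
  - \begin{pmatrix} v \\ -T(x)v - U'(x) \end{pmatrix}
  = \begin{pmatrix} -v \\ U'(x) \end{pmatrix}. \\
% Check that \pi is invariant, using \partial_x\pi = -U'\pi and \partial_v\pi = -v\pi:
\nabla\cdot(\pi\gamma) &= \partial_x(-v\,\pi) + \partial_v\big(U'(x)\,\pi\big)
  = v\,U'(x)\,\pi - U'(x)\,v\,\pi = 0.
\end{align*}
```

So $\gamma$ is the Hamiltonian part of the dynamics (up to sign), which preserves $\pi$ but is not a gradient direction, while the friction and noise make up the degenerate gradient direction.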

Slide 26

Slide 26 text

Constant diffusion

Figure: smallest eigenvalue plotted against $x_1$ and $x_2$ (two panels). $T = 1$, $U(x) = x^2/2$; Left: $\beta = 0$ [Arnold-Erb]; Right: $\beta = 0.1$; $z = (1, 0.1)^T$.

Slide 27

Slide 27 text

Variable diffusion

Figure: smallest eigenvalue plotted against $x_1$ and $x_2$ (two panels). $U(x) = \frac{x^c - x}{c(c-1)}$, $T(x) = (\nabla_x^2 U(x))^{-1}$, $c = 2.5$, $z = (1, 0.1)^T$. Left: $\beta = 0$; Right: $\beta = 0.6$.

Slide 28

Slide 28 text

Discussion

Non-gradient flow functional inequalities;
Generalized perturbed-gradient flow dynamics;
Mean field Bakry-Emery calculus.