
# Entropy dissipation via information Gamma calculus

In this talk, we present the convergence behavior of some non-gradient degenerate stochastic differential equations towards their invariant distributions. Our method extends the connection between Gamma calculus and Hessian operators in the Wasserstein space. In detail, we apply Lyapunov methods in the space of probabilities, where the Lyapunov functional is chosen as the relative Fisher information. We derive the Fisher information induced Gamma calculus to handle non-gradient drift vector fields and degenerate diffusion matrices. Several examples are provided: non-reversible Langevin dynamics, sub-Riemannian diffusion processes, and variable-dependent underdamped Langevin dynamics.

April 06, 2021

## Transcript

1. ### Entropy dissipation via information Gamma calculus

Wuchen Li, University of South Carolina. Analysis Seminar, CMUC, April 9th. This is based on joint work with Qi Feng (USC).

3. ### Stochastic differential equations

Consider a stochastic differential equation (SDE)

$$\dot X_t = b(X_t) + \sqrt{2}\,a(X_t)\,\dot B_t,$$

where $n, m \in \mathbb{N}$, $X_t \in \mathbb{R}^{n+m}$, $b \in \mathbb{R}^{n+m}$ is a drift vector function, $a(X_t) \in \mathbb{R}^{(n+m)\times n}$ is a diffusion matrix function, and $B_t \in \mathbb{R}^n$ is a standard Brownian motion. This SDE is widely used in practice, for example in mathematical physics equations, in protein folding, and in designing Markov chain Monte Carlo algorithms.
4. ### Example: Langevin dynamics

We review a classical example. We start with a gradient drift-diffusion process

$$\dot X_t = -\nabla V(X_t) + \sqrt{2}\,\dot B_t,$$

where $V$ is a given potential function. Let $\rho(t, x)$ be the probability density function of $X_t$, which satisfies the Fokker-Planck equation

$$\partial_t \rho(t, x) = \nabla\cdot\big(\rho(t, x)\nabla V(x)\big) + \Delta\rho(t, x).$$

Here $\pi(x) := \frac{1}{Z}e^{-V(x)}$, where $Z = \int e^{-V}\,dx < \infty$, is the invariant distribution of the SDE. The main question is: how fast does $\rho(t, x)$ converge to the invariant distribution $\pi$?
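
As a rough illustration of the gradient drift-diffusion process above, the following sketch simulates it with an Euler-Maruyama discretization. The helper name `simulate_langevin`, the step size, the particle count, and the quadratic potential $V(x) = x^2/2$ (for which $\pi$ is the standard Gaussian) are assumptions chosen for illustration and are not taken from the talk.

```python
import numpy as np


def simulate_langevin(grad_V, x0, dt, n_steps, rng):
    """Euler-Maruyama steps for dX = -grad V(X) dt + sqrt(2) dB (hypothetical helper)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - grad_V(x) * dt + np.sqrt(2.0 * dt) * noise
    return x


# Assumed example: V(x) = x^2 / 2, so grad V(x) = x and pi = N(0, 1).
rng = np.random.default_rng(0)
particles = simulate_langevin(
    grad_V=lambda x: x,
    x0=np.full(20000, 3.0),  # all particles start far from equilibrium
    dt=0.01,
    n_steps=1000,
    rng=rng,
)
sample_mean = float(particles.mean())
sample_var = float(particles.var())
print(sample_mean, sample_var)  # should be close to the N(0, 1) values 0 and 1
```

The empirical mean and variance only approximate those of $\pi$: the discretization introduces a small bias that shrinks with the step size.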
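5. ### Lyapunov methods

To study the dynamical behavior of $\rho$, we apply a Lyapunov functional

$$D_{\mathrm{KL}}(\rho_t\|\pi) = \int \rho_t(x)\log\frac{\rho_t(x)}{\pi(x)}\,dx.$$

Along the Fokker-Planck equation, the first order dissipation satisfies

$$\frac{d}{dt}D_{\mathrm{KL}}(\rho_t\|\pi) = -\int \Big\|\nabla_x\log\frac{\rho_t(x)}{\pi(x)}\Big\|^2\rho_t\,dx,$$

and the second order dissipation satisfies

$$\frac{d^2}{dt^2}D_{\mathrm{KL}}(\rho_t\|\pi) = 2\int\Big[\Big\|\nabla^2_{xx}\log\frac{\rho_t}{\pi}\Big\|_F^2 - \nabla^2_{xx}\log\pi\Big(\nabla_x\log\frac{\rho_t}{\pi}, \nabla_x\log\frac{\rho_t}{\pi}\Big)\Big]\rho_t\,dx,$$

where $\|\cdot\|_F$ is the matrix Frobenius norm. In the literature, $D_{\mathrm{KL}}$ is named the Kullback-Leibler divergence (relative entropy) and $I = -\frac{d}{dt}D_{\mathrm{KL}}$ is called the relative Fisher information functional.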
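6. ### Lyapunov constant

Suppose there exists a "Lyapunov constant" $\lambda > 0$, such that

$$-\nabla^2_{xx}\log\pi(x) \succeq \lambda I,$$

where $I$ on the right-hand side is the identity matrix. Then

$$\frac{d^2}{dt^2}D_{\mathrm{KL}}(\rho_t\|\pi) \ge -2\lambda\frac{d}{dt}D_{\mathrm{KL}}(\rho_t\|\pi).$$

By integrating in the time variable, one can prove the exponential convergence

$$D_{\mathrm{KL}}(\rho_t\|\pi) \le e^{-2\lambda t}D_{\mathrm{KL}}(\rho_0\|\pi).$$

As a by-product, one can show the log-Sobolev inequality

$$D_{\mathrm{KL}}(\rho\|\pi) \le \frac{1}{2\lambda}I(\rho\|\pi),$$

where $I(\rho\|\pi)$ is the relative Fisher information.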
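
A quick sanity check of the two inequalities above is possible in a case where everything is available in closed form. The sketch below assumes a one-dimensional quadratic potential $V(x) = \lambda x^2/2$ (so $-\nabla^2_{xx}\log\pi = \lambda$) and a Gaussian initial density, for which $\rho_t$ stays Gaussian with explicit mean and variance. The function names, the value of $\lambda$, and the initial data are illustrative assumptions.

```python
import math

LAM = 1.0  # assumed: V(x) = LAM * x^2 / 2, hence -d^2/dx^2 log pi = LAM


def gaussian_state(t, m0, s0):
    """Mean and variance at time t of dX = -LAM X dt + sqrt(2) dB with X_0 ~ N(m0, s0)."""
    mean = m0 * math.exp(-LAM * t)
    var = 1.0 / LAM + (s0 - 1.0 / LAM) * math.exp(-2.0 * LAM * t)
    return mean, var


def kl_to_invariant(mean, var):
    """D_KL( N(mean, var) || N(0, 1/LAM) ) in closed form."""
    return 0.5 * (LAM * var + LAM * mean**2 - 1.0 - math.log(LAM * var))


def fisher_to_invariant(mean, var):
    """Relative Fisher information I( N(mean, var) || N(0, 1/LAM) ) in closed form."""
    return LAM**2 * mean**2 + (LAM * var - 1.0) ** 2 / var


m0, s0 = 2.0, 0.5  # assumed initial mean and variance
kl0 = kl_to_invariant(m0, s0)
decay_ok, sobolev_ok = [], []
for t in [0.0, 0.5, 1.0, 2.0, 4.0]:
    mean, var = gaussian_state(t, m0, s0)
    kl = kl_to_invariant(mean, var)
    fisher = fisher_to_invariant(mean, var)
    # Exponential decay: D_KL(rho_t || pi) <= exp(-2 LAM t) D_KL(rho_0 || pi)
    decay_ok.append(kl <= math.exp(-2.0 * LAM * t) * kl0 + 1e-12)
    # Log-Sobolev inequality: D_KL <= I / (2 LAM)
    sobolev_ok.append(kl <= fisher / (2.0 * LAM) + 1e-12)

print(all(decay_ok), all(sobolev_ok))
```

This only checks one Gaussian family; it illustrates the statements and is not a proof.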
7. ### Literature

There are several equivalent approaches to establish the Lyapunov constant for gradient dynamics.

- Log-Sobolev inequality (Gross);
- Iterative Gamma calculus (Bakry, Emery, Baudoin, Garofalo et al.);
- Entropy dissipation (Arnold, Carlen, Carrillo, Mouhot, Jüngel, Markowich, Toscani et al.);
- Optimal transport, displacement convexity and Hessian operators in density space (McCann, Ambrosio, Villani, Otto, Gangbo et al.);
- Transport Lyapunov functional (Renesse, Sturm et al.).
8. ### Problem

Recall that

$$\dot X_t = b(X_t) + \sqrt{2}\,a(X_t)\,\dot B_t,$$

where $b$ can be a non-gradient drift vector, and $a$ is a degenerate matrix. Its (hypoelliptic) Fokker-Planck equation satisfies

$$\partial_t\rho = -\nabla\cdot(\rho b) + \sum_{i=1}^{n+m}\sum_{j=1}^{n+m}\frac{\partial^2}{\partial x_i\partial x_j}\Big(\big(a(x)a(x)^{T}\big)_{ij}\,\rho\Big).$$

Assume that there exists an invariant distribution $\pi$ with a given explicit formulation. The major problem is: how fast does $\rho$ converge to the invariant distribution $\pi$?
9. ### Goals

In this talk, we mainly consider the entropy dissipation for perturbed-gradient dynamical systems. The main difficulties are the degeneracy of the diffusion matrix and the non-gradient drift vectors. Our method is based on the extended second order calculus in generalized optimal transport space.
10. ### Review: Optimal transport space

The optimal transport has a variational formulation (Benamou-Brenier 2000):

$$D(\rho_0, \rho_1)^2 := \inf_{v}\int_0^1 \mathbb{E}_{X_t\sim\rho_t}\|v(t, X_t)\|^2\,dt,$$

where $\mathbb{E}$ is the expectation operator and the infimum runs over all vector fields $v_t$ such that $\dot X_t = v(t, X_t)$, $X_0\sim\rho_0$, $X_1\sim\rho_1$. Under this metric, the probability set has a metric structure.¹

¹ John D. Lafferty: The density manifold and configuration space quantization, 1988.
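11. ### Review: Optimal transport metric

Informally speaking, the optimal transport metric refers to the following bilinear form:

$$\langle\dot\rho_1, G(\rho)\dot\rho_2\rangle = \int\big(\dot\rho_1, (-\Delta_\rho)^{-1}\dot\rho_2\big)\,dx.$$

In other words, denote $\dot\rho_i = -\nabla\cdot(\rho\nabla\phi_i)$, $i = 1, 2$, then

$$\langle\phi_1, G(\rho)^{-1}\phi_2\rangle = \int\big(\phi_1, -\nabla\cdot(\rho\nabla)\phi_2\big)\,dx = \int(\nabla\phi_1, \nabla\phi_2)\,\rho\,dx,$$

where $\rho\in\mathcal{P}(\Omega)$, $\dot\rho_i$ is a tangent vector in $\mathcal{P}(\Omega)$, i.e. $\int\dot\rho_i\,dx = 0$, and $\phi_i\in C^\infty(\Omega)$ are cotangent vectors in $\mathcal{P}(\Omega)$ at the point $\rho$.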
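12. ### Review: Optimal transport gradient flows

The Wasserstein gradient flow of an energy functional $\mathcal{F}(\rho)$ leads to

$$\partial_t\rho = -G(\rho)^{-1}\frac{\delta}{\delta\rho}\mathcal{F}(\rho) = \nabla\cdot\Big(\rho\nabla\frac{\delta}{\delta\rho}\mathcal{F}(\rho)\Big).$$

Example. If $\mathcal{F}(\rho) = \int F(x)\rho(x)\,dx$, then the gradient flow satisfies

$$\partial_t\rho = \nabla\cdot\big(\rho\nabla F(x)\big).$$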
13. ### Entropy dissipation revisited

The gradient flow of the KL divergence

$$D_{\mathrm{KL}}(\rho\|\pi) = \int\rho(x)\log\frac{\rho(x)}{\pi(x)}\,dx$$

with respect to the optimal transport metric satisfies the Fokker-Planck equation

$$\frac{\partial\rho}{\partial t} = \nabla\cdot\Big(\rho\nabla\log\frac{\rho}{\pi}\Big).$$

Here the major trick is that $\rho\nabla\log\rho = \nabla\rho$.
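
The step from this gradient flow to the Fokker-Planck equation of the Langevin example can be written out as follows. This is a short worked derivation, assuming $\pi \propto e^{-V}$ as in the Langevin example, so that $\nabla\log\pi = -\nabla V$.

```latex
\begin{aligned}
\frac{\delta}{\delta\rho} D_{\mathrm{KL}}(\rho\|\pi)
  &= \log\frac{\rho}{\pi} + 1, \\
\nabla\cdot\Big(\rho\nabla\frac{\delta}{\delta\rho} D_{\mathrm{KL}}(\rho\|\pi)\Big)
  &= \nabla\cdot\Big(\rho\nabla\log\frac{\rho}{\pi}\Big) \\
  &= \nabla\cdot(\rho\nabla\log\rho) - \nabla\cdot(\rho\nabla\log\pi) \\
  &= \Delta\rho + \nabla\cdot(\rho\nabla V).
\end{aligned}
```

The last line uses $\rho\nabla\log\rho = \nabla\rho$ and recovers $\partial_t\rho = \nabla\cdot(\rho\nabla V) + \Delta\rho$.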
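14. ### Entropy dissipation revisited

In this way, one can study the first order entropy dissipation by

$$\frac{d}{dt}D_{\mathrm{KL}}(\rho_t\|\pi) = \int\log\frac{\rho_t}{\pi}\,\nabla\cdot\Big(\rho_t\nabla\log\frac{\rho_t}{\pi}\Big)dx = -\int\Big\|\nabla\log\frac{\rho_t}{\pi}\Big\|^2\rho_t\,dx = -I(\rho_t\|\pi).$$

Similarly, we study the second order entropy dissipation by

$$\frac{d}{dt}I(\rho_t\|\pi) = -2\int_\Omega\Gamma_2\Big(\log\frac{\rho_t}{\pi}, \log\frac{\rho_t}{\pi}\Big)\rho_t\,dx,$$

where $\Gamma_2$ is a bilinear form, which can be defined by the optimal transport second order operator.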
15. ### Lyapunov methods for degenerate non-gradient flows

Consider a perturbed gradient flow

$$\dot\rho = -G(\rho)^{-1}\frac{\delta}{\delta\rho}D_{\mathrm{KL}}(\rho_t\|\pi) + f\Big(\frac{\rho_t}{\pi}\Big),$$

where $f$ is a given function generated by the non-gradient drift vector field. How can we study the convergence behavior of $\rho$?
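16. ### Motivation: Decomposition

Assume $\pi$ is the invariant measure, given by an explicit formulation. We decompose the Fokker-Planck equation by

$$\partial_t\rho(t, x) = \underbrace{\nabla\cdot\Big(\rho(t, x)\,a(x)a(x)^{T}\nabla\log\frac{\rho(t, x)}{\pi(x)}\Big)}_{\text{Gradient direction}} + \underbrace{\nabla\cdot\big(\rho(t, x)\gamma(x)\big)}_{\text{Perturbed direction}},$$

where

$$\gamma(x) := a(x)a(x)^{T}\nabla\log\pi(x) - b(x) + \Big(\sum_{j=1}^{n+m}\frac{\partial}{\partial x_j}\big(a(x)a(x)^{T}\big)_{ij}\Big)_{1\le i\le n+m},$$

and $\nabla\cdot(\pi(x)\gamma(x)) = 0$.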
17. ### Main result: Structure condition

Assumption: for any $i\in\{1,\cdots,n\}$ and $k\in\{1,\cdots,m\}$, we assume

$$z_k^{T}\nabla a_i^{T}\in\mathrm{Span}\{a_1^{T},\cdots,a_n^{T}\}.$$

Examples: $a$ is a constant vector; $a$ is a matrix function defined by $a = a(x_1,\cdots,x_n)$ with $z\in\mathrm{span}\{e_{n+1},\cdots,e_{n+m}\}$, where $e_i$ is the $i$-th Euclidean basis vector.
18. ### Main result: Entropy dissipation

[F. and Li, 2021] Under the assumption, for any $\beta\in\mathbb{R}$ and a given vector function $z$, define the matrix function

$$R = R_a + R_z + R_\pi - M_\Lambda + \beta R_{I_a} + (1-\beta)R_{\gamma_a} + R_{\gamma_z}.$$

If there exists a constant $\lambda > 0$ such that

$$R\succeq\lambda(aa^{T} + zz^{T}),$$

then the following decay result holds:

$$D_{\mathrm{KL}}(\rho_t\|\pi)\le\frac{1}{2\lambda}e^{-2\lambda t}I_{a,z}(\rho_0\|\pi),$$

where $\rho_t$ is the solution of the Fokker-Planck equation.

20. ### Comparisons

- (i) If $\gamma = 0$ and $m = 0$: [Bakry-Emery, 1985].
- (ii) If $\gamma = 0$ and $m \neq 0$: [Baudoin-Garofalo, 2017], [F.-Li, 2019].
- (iii) If $\beta = 0$ and $m = 0$: [Arnold-Carlen-Ju, 2000, 2008].
- (iv) If $a$, $z$ are constants and $\beta = 0$: [Arnold-Erb, 2014], [Baudoin-Gordina-Herzog, 2019].
- (v) If $\beta = 1$, $m = 0$ and $a = I$: [Arnold-Carlen]; [F.-Li, 2020].
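21. ### Idea of proof

Define

$$I_{a,z}(\rho\|\pi) = \int_{\mathbb{R}^{n+m}}\Big\langle\nabla\log\frac{\rho}{\pi}, (aa^{T} + zz^{T})\nabla\log\frac{\rho}{\pi}\Big\rangle\rho\,dx.$$

Consider

$$-\frac{1}{2}\frac{d}{dt}I_{a,z}(\rho_t\|\pi) = \underbrace{\int\Gamma_2(f, f)\rho_t\,dx}_{\text{(I)}} + \underbrace{\int\Gamma_2^{z,\pi}(f, f)\rho_t\,dx}_{\text{(II)}} + \underbrace{\int\Gamma_{I_{a,z}}(f, f)\rho_t\,dx}_{\text{(III)}},$$

where $f = \log\frac{\rho_t}{\pi}$, and $\Gamma_2$, $\Gamma_2^{z,\pi}$, $\Gamma_{I_{a,z}}$ are designed bilinear forms, coming from the second order calculation in density space.

- (i) If $a$ is non-degenerate, then (II) $= 0$;
- (ii) If $b$ is a gradient vector field, then (III) $= 0$.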
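22. ### Detailed approach

For any $f\in C^\infty(\mathbb{R}^{n+m})$, the generator of the Itô SDE satisfies

$$Lf = \tilde{L}f - \langle\gamma, \nabla f\rangle,$$

where

$$\tilde{L}f = \nabla\cdot(aa^{T}\nabla f) + \langle aa^{T}\nabla\log\pi, \nabla f\rangle.$$

For a given matrix function $a\in\mathbb{R}^{(n+m)\times n}$, we construct a matrix function $z\in\mathbb{R}^{(n+m)\times m}$, and define a $z$-direction generator by

$$L_z f = \nabla\cdot(zz^{T}\nabla f) + \langle zz^{T}\nabla\log\pi, \nabla f\rangle.$$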
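23. ### Global in space computation = Gamma operators

Define Gamma one bilinear forms by

$$\Gamma_1(f, f) = \langle a^{T}\nabla f, a^{T}\nabla f\rangle_{\mathbb{R}^n},\qquad\Gamma_1^{z}(f, f) = \langle z^{T}\nabla f, z^{T}\nabla f\rangle_{\mathbb{R}^m}.$$

Define Gamma two bilinear forms as follows, where $\tilde{L}f = \nabla\cdot(aa^{T}\nabla f) + \langle aa^{T}\nabla\log\pi, \nabla f\rangle$ denotes the generator without the $\gamma$ term.

(i) Gamma two operator:

$$\Gamma_2(f, f) = \frac{1}{2}\tilde{L}\Gamma_1(f, f) - \Gamma_1(\tilde{L}f, f).$$

(ii) Generalized Gamma z operator:

$$\Gamma_2^{z,\pi}(f, f) = \frac{1}{2}\tilde{L}\Gamma_1^{z}(f, f) - \Gamma_1^{z}(\tilde{L}f, f) + \mathrm{div}^{\pi}_{z}\big(\Gamma_{1,\nabla(aa^{T})}(f, f)\big) - \mathrm{div}^{\pi}_{a}\big(\Gamma_{1,\nabla(zz^{T})}(f, f)\big).$$

(iii) Irreversible Gamma operator:

$$\Gamma_{I_{a,z}}(f, f) = (\tilde{L}f + L_z f)\langle\nabla f, \gamma\rangle - \frac{1}{2}\big\langle\nabla\big(\Gamma_1(f, f) + \Gamma_1^{z}(f, f)\big), \gamma\big\rangle.$$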
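24. ### Local in space calculation = Bochner's formula

For any $f = \log\frac{\rho}{\pi}\in C^\infty(\mathbb{R}^{n+m}, \mathbb{R})$ and any $\beta\in\mathbb{R}$, under the assumption, we derive that

$$-\frac{1}{2}\frac{d}{dt}I_{a,z}(\rho\|\pi) = \int\Big[\Gamma_2(f, f) + \Gamma_2^{z,\pi}(f, f) + \Gamma_{I_{a,z}}(f, f)\Big]\rho\,dx = \int\Big[\|\mathrm{Hess}_\beta f\|^2 + R(\nabla f, \nabla f)\Big]\rho\,dx.$$

Clearly, if $R\succeq\lambda I$, we derive a Lyapunov constant $\lambda$ for the convergence rate.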
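
In the simplest classical setting the identity above reduces to the familiar one-dimensional Bochner formula $\Gamma_2(f, f) = (f'')^2 + V''(f')^2$. The sketch below checks this with exact polynomial arithmetic, assuming $a = 1$, no $z$ direction, $\gamma = 0$ and $\pi\propto e^{-V}$, so that the generator is $f'' - V'f'$. The particular polynomials `V` and `f` and the helper names are illustrative choices, not objects from the talk.

```python
import numpy as np
from numpy.polynomial import Polynomial


def generator(f, V):
    """Reversible generator in 1D with a = 1: f'' - V' f'."""
    return f.deriv(2) - V.deriv(1) * f.deriv(1)


def gamma1(f, g):
    """Gamma one bilinear form with a = 1: f' g'."""
    return f.deriv(1) * g.deriv(1)


def gamma2(f, V):
    """Gamma two: (1/2) L Gamma1(f, f) - Gamma1(L f, f)."""
    return 0.5 * generator(gamma1(f, f), V) - gamma1(generator(f, V), f)


# Assumed test data: V(x) = x^2/2 + x^4/4 and f(x) = x - 0.3 x^2 + 0.2 x^3.
V = Polynomial([0.0, 0.0, 0.5, 0.0, 0.25])
f = Polynomial([0.0, 1.0, -0.3, 0.2])

lhs = gamma2(f, V)
# Bochner-type formula in 1D: |Hess f|^2 + (Hess V)(grad f, grad f)
rhs = f.deriv(2) ** 2 + V.deriv(2) * f.deriv(1) ** 2

xs = np.linspace(-2.0, 2.0, 9)
max_gap = float(np.max(np.abs(lhs(xs) - rhs(xs))))
print(max_gap)  # expected to be at the level of floating-point round-off
```

Here the role of $R$ is played by $V'' = -(\log\pi)''$, which matches the Lyapunov condition used for the gradient Langevin dynamics.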
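25. ### Example

Consider an underdamped Langevin dynamics

$$\begin{aligned} dx_t &= v_t\,dt, \\ dv_t &= \big(-T(x_t)v_t - \nabla_x U(x_t)\big)dt + \sqrt{2T(x_t)}\,dB_t. \end{aligned}\tag{1}$$

It can be viewed as $Y_t = (x_t, v_t)$, $dY_t = b(Y_t)\,dt + \sqrt{2}\,a(Y_t)\,dB_t$, with

$$b = \begin{pmatrix} v \\ -T(x)v - \nabla U(x) \end{pmatrix},\qquad a = \begin{pmatrix} 0 \\ \sqrt{T(x)} \end{pmatrix}.$$

Its invariant measure has a closed form,

$$\pi(x, v) = \frac{1}{Z}e^{-H(x, v)},\qquad H(x, v) = \frac{\|v\|^2}{2} + U(x).$$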
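
The claim that $e^{-H}$ is invariant can be checked numerically in one space dimension. The sketch below assumes $U(x) = x^4/4 + x^2/2$ and $T(x) = 1 + x^2$ (both illustrative choices), the noise coefficient $\sqrt{2T(x)}$, and central finite differences. It evaluates the stationary Fokker-Planck residual $-\nabla\cdot(\pi b) + \partial_{vv}(T\pi)$, and also $\nabla\cdot(\pi\gamma)$ with $\gamma = (-v, U'(x))$, which is what the decomposition formula for $\gamma$ gives for this $a$, $b$ and $\pi$ when substituted by hand.

```python
import math

H_STEP = 1e-3  # finite-difference step (assumed)


def U(x):
    return 0.25 * x**4 + 0.5 * x**2  # assumed potential


def dU(x):
    return x**3 + x


def T(x):
    return 1.0 + x**2  # assumed positive, position-dependent coefficient


def pi_unnorm(x, v):
    """Unnormalized invariant density exp(-H), H = v^2 / 2 + U(x)."""
    return math.exp(-0.5 * v**2 - U(x))


def d_dx(g, x, v):
    return (g(x + H_STEP, v) - g(x - H_STEP, v)) / (2.0 * H_STEP)


def d_dv(g, x, v):
    return (g(x, v + H_STEP) - g(x, v - H_STEP)) / (2.0 * H_STEP)


def d2_dv2(g, x, v):
    return (g(x, v + H_STEP) - 2.0 * g(x, v) + g(x, v - H_STEP)) / H_STEP**2


def fokker_planck_residual(x, v):
    """-div(pi b) + d^2/dv^2 (T pi); vanishes when pi is invariant."""
    flux_x = lambda x, v: v * pi_unnorm(x, v)
    flux_v = lambda x, v: (-T(x) * v - dU(x)) * pi_unnorm(x, v)
    diffusion = lambda x, v: T(x) * pi_unnorm(x, v)
    return -d_dx(flux_x, x, v) - d_dv(flux_v, x, v) + d2_dv2(diffusion, x, v)


def gamma_divergence(x, v):
    """div(pi gamma) for gamma = (-v, U'(x)), the perturbed direction here."""
    gx = lambda x, v: -v * pi_unnorm(x, v)
    gv = lambda x, v: dU(x) * pi_unnorm(x, v)
    return d_dx(gx, x, v) + d_dv(gv, x, v)


points = [(-1.0, 0.5), (0.3, -1.2), (0.8, 2.0), (0.0, 0.0)]
max_residual = max(abs(fokker_planck_residual(x, v)) for x, v in points)
max_gamma_div = max(abs(gamma_divergence(x, v)) for x, v in points)
print(max_residual, max_gamma_div)  # both should be small, up to discretization error
```

Both quantities vanish analytically; the printed values only reflect finite-difference error.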
26. ### Constant diffusion

[Figure: two surface plots of the smallest eigenvalue over $(x_1, x_2)\in[-1, 1]^2$.]

Figure: $T = 1$, $U(x) = x^2/2$; Left: $\beta = 0$ [Arnold-Erb]; Right: $\beta = 0.1$; $z = (1, 0.1)^{T}$.
27. ### Variable diffusion

[Figure: two surface plots of the smallest eigenvalue over $(x_1, x_2)\in[0.5, 1]^2$.]

Figure: $U(x) = \frac{x^{c} - x}{c(c-1)}$, $T(x) = (\nabla^2_x U(x))^{-1}$, $c = 2.5$, $z = (1, 0.1)^{T}$. (Left: $\beta = 0$; Right: $\beta = 0.6$.)
28. ### Discussion

- Non-gradient flow functional inequalities;
- Generalized perturbed-gradient flow dynamics;
- Mean field Bakry-Emery calculus.