
Entropy dissipation via information Gamma calculus

Wuchen Li
April 06, 2021


In this talk, we present the convergence behavior of some non-gradient degenerate stochastic differential equations towards their invariant distributions. Our method extends the connection between Gamma calculus and Hessian operators in the Wasserstein space. In detail, we apply Lyapunov methods in the space of probabilities, where the Lyapunov functional is chosen as the relative Fisher information. We derive the Fisher information induced Gamma calculus to handle non-gradient drift vector fields and degenerate diffusion matrices. Several examples are provided for non-reversible Langevin dynamics, sub-Riemannian diffusion processes, and variable-dependent underdamped Langevin dynamics.




Transcript

  1. Entropy dissipation via information Gamma calculus. Wuchen Li, University of South Carolina. Analysis Seminar, CMUC, April 9th. This is based on a joint work with Qi Feng (USC).
  2. Stochastic differential equations. Consider a stochastic differential equation (SDE)
     $\dot X_t = b(X_t) + \sqrt{2}\, a(X_t)\,\dot B_t,$
     where $(n, m) \in \mathbb{N}$, $X_t \in \mathbb{R}^{n+m}$, $b \in \mathbb{R}^{n+m}$ is a drift vector function, $a(X_t) \in \mathbb{R}^{(n+m)\times n}$ is a diffusion matrix function, and $B_t \in \mathbb{R}^n$ is a standard Brownian motion. SDEs of this form are widely used in practice: mathematical physics equations; protein folding; designing Markov chain Monte Carlo algorithms.
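To make the setup above concrete, here is a minimal Euler–Maruyama sketch for integrating $\dot X_t = b(X_t) + \sqrt{2}\,a(X_t)\dot B_t$; the specific drift, diffusion matrix, step size, and horizon below are illustrative placeholders, not choices from the talk.

```python
import numpy as np

def euler_maruyama(b, a, x0, dt=1e-3, n_steps=10_000, rng=None):
    """Integrate dX_t = b(X_t) dt + sqrt(2) a(X_t) dB_t with Euler-Maruyama."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        diff = np.atleast_2d(a(x))                      # (n+m) x n diffusion matrix
        dB = rng.normal(scale=np.sqrt(dt), size=diff.shape[1])
        x = x + b(x) * dt + np.sqrt(2.0) * diff @ dB
        path.append(x.copy())
    return np.array(path)

# Illustrative placeholder (not from the talk): a 2D non-gradient drift with a
# degenerate diffusion matrix acting only on the second coordinate (n = m = 1).
b = lambda x: np.array([x[1], -x[0] - x[1]])
a = lambda x: np.array([[0.0], [1.0]])
path = euler_maruyama(b, a, x0=[1.0, 0.0], rng=np.random.default_rng(0))
print(path[-1])
```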
  3. Example: Langevin dynamics. We review a classical example. We start with a gradient drift-diffusion process
     $\dot X_t = -\nabla V(X_t) + \sqrt{2}\,\dot B_t,$
     where $V$ is a given potential function. Let $\rho(t, x)$ be the probability density function of $X_t$, which satisfies the Fokker-Planck equation
     $\partial_t \rho(t, x) = \nabla \cdot \big(\rho(t, x)\nabla V(x)\big) + \Delta \rho(t, x).$
     Here $\pi(x) := \frac{1}{Z} e^{-V(x)}$, where $Z = \int e^{-V}\,dx < \infty$, is the invariant distribution of the SDE. The main question: how fast does $\rho(t, x)$ converge to the invariant distribution $\pi$?
  4. Lyapunov methods. To study the dynamical behavior of $\rho$, we apply a Lyapunov functional
     $D_{KL}(\rho_t \| \pi) = \int \rho_t(x) \log \frac{\rho_t(x)}{\pi(x)}\,dx.$
     Along the Fokker-Planck equation, the first order dissipation satisfies
     $\frac{d}{dt} D_{KL}(\rho_t \| \pi) = -\int \Big|\nabla_x \log \frac{\rho_t(x)}{\pi(x)}\Big|^2 \rho_t\,dx,$
     and the second order dissipation satisfies
     $\frac{d^2}{dt^2} D_{KL}(\rho_t \| \pi) = 2\int \Big[\big\|\nabla^2_{xx} \log \tfrac{\rho_t}{\pi}\big\|_F^2 - \nabla^2_{xx} \log \pi\big(\nabla_x \log \tfrac{\rho_t}{\pi}, \nabla_x \log \tfrac{\rho_t}{\pi}\big)\Big] \rho_t\,dx,$
     where $\|\cdot\|_F$ is the matrix Frobenius norm. In the literature, $D_{KL}$ is named the Kullback–Leibler divergence (relative entropy) and $I = -\frac{d}{dt} D_{KL}$ is called the relative Fisher information functional.
  5. Lyapunov constant. Suppose there exists a "Lyapunov constant" $\lambda > 0$ such that
     $-\nabla^2_{xx} \log \pi(x) \succeq \lambda I.$
     Then
     $\frac{d^2}{dt^2} D_{KL}(\rho_t \| \pi) \geq -2\lambda \frac{d}{dt} D_{KL}(\rho_t \| \pi).$
     By integrating in the time variable, one can prove the exponential convergence
     $D_{KL}(\rho_t \| \pi) \leq e^{-2\lambda t} D_{KL}(\rho_0 \| \pi).$
     As a by-product, one can show the log-Sobolev inequality
     $D_{KL}(\rho \| \pi) \leq \frac{1}{2\lambda} I(\rho \| \pi).$
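The exponential decay bound can be checked in closed form in the Gaussian case. The sketch below is my own illustration: with $V(x) = x^2/2$ the Langevin SDE is the Ornstein–Uhlenbeck process, $-\nabla^2_{xx}\log\pi = 1$ gives the Lyapunov constant $\lambda = 1$, and a Gaussian initial law stays Gaussian, so both sides of $D_{KL}(\rho_t\|\pi) \leq e^{-2\lambda t} D_{KL}(\rho_0\|\pi)$ are explicit.

```python
import numpy as np

def kl_gaussian(m, var):
    """KL( N(m, var) || N(0, 1) ) in closed form."""
    return 0.5 * (var + m**2 - 1.0 - np.log(var))

# For V(x) = x^2/2 the Langevin SDE is the OU process dX = -X dt + sqrt(2) dB:
# a Gaussian initial law N(m0, var0) stays Gaussian with explicit moments,
# and the Lyapunov constant is lam = 1 since -Hess log pi = 1.
m0, var0, lam = 2.0, 4.0, 1.0
t = np.linspace(0.0, 3.0, 7)
m_t = m0 * np.exp(-t)
var_t = 1.0 + (var0 - 1.0) * np.exp(-2.0 * t)

kl_t = kl_gaussian(m_t, var_t)
bound = np.exp(-2.0 * lam * t) * kl_gaussian(m0, var0)
print(np.column_stack([t, kl_t, bound]))
print("decay bound holds:", bool(np.all(kl_t <= bound + 1e-12)))
```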
  6. Literature. There are several equivalent approaches to establishing the Lyapunov constant for gradient dynamics: log-Sobolev inequality (Gross); iterative Gamma calculus (Bakry, Émery, Baudoin, Garofalo et al.); entropy dissipation (Arnold, Carlen, Carrillo, Mouhot, Jüngel, Markowich, Toscani et al.); optimal transport, displacement convexity and Hessian operators in density space (McCann, Ambrosio, Villani, Otto, Gangbo et al.); transport Lyapunov functionals (von Renesse, Sturm et al.).
  7. Problem. Recall that
     $\dot X_t = b(X_t) + \sqrt{2}\, a(X_t)\,\dot B_t,$
     where $b$ can be a non-gradient drift vector and $a$ is a degenerate matrix. Its (hypoelliptic) Fokker-Planck equation satisfies
     $\partial_t \rho = -\nabla \cdot (\rho b) + \sum_{i=1}^{n+m} \sum_{j=1}^{n+m} \frac{\partial^2}{\partial x_i \partial x_j} \Big(\big(a(x)a(x)^{\mathsf T}\big)_{ij}\, \rho\Big).$
     Assume that there exists an invariant distribution $\pi$ with a given explicit formulation. The major problem: how fast does $\rho$ converge to the invariant distribution $\pi$?
  8. Goals. In this talk, we mainly consider the entropy dissipation for perturbed-gradient dynamical systems. Main difficulties: degeneracy of the diffusion matrix; non-gradient drift vectors. Our method is based on an extended second order calculus in a generalized optimal transport space.
  9. Review: Optimal transport space. The optimal transport has a variational formulation (Benamou-Brenier 2000):
     $D(\rho_0, \rho_1)^2 := \inf_{v} \int_0^1 \mathbb{E}_{X_t \sim \rho_t} \|v(t, X_t)\|^2\,dt,$
     where $\mathbb{E}$ is the expectation operator and the infimum runs over all vector fields $v_t$ such that $\dot X_t = v(t, X_t)$, $X_0 \sim \rho_0$, $X_1 \sim \rho_1$. Under this metric, the probability set has a metric structure [John D. Lafferty: The density manifold and configuration space quantization, 1988].
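In one dimension the Benamou–Brenier value coincides with the squared Wasserstein-2 distance, which can be evaluated through quantile functions. The following sketch (with two illustrative Gaussian marginals of my own choosing) compares the numerical value with the known closed form for Gaussians.

```python
import numpy as np
from scipy.stats import norm

# 1D illustration: D(rho0, rho1)^2 equals the squared Wasserstein-2 distance,
# which in one dimension is the L^2 distance between quantile functions.
u = np.linspace(5e-5, 1 - 5e-5, 20_000)          # quantile levels in (0, 1)
q0 = norm.ppf(u, loc=0.0, scale=1.0)             # quantile function of rho0 = N(0, 1)
q1 = norm.ppf(u, loc=2.0, scale=0.5)             # quantile function of rho1 = N(2, 0.25)
w2_sq = np.mean((q0 - q1)**2)                    # ~ integral over u in (0, 1)
print(f"D(rho0, rho1)^2 ~ {w2_sq:.3f}   (Gaussian closed form: 2^2 + 0.5^2 = 4.25)")
```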
  10. Review: Optimal transport metric. Informally speaking, the optimal transport metric refers to the following bilinear form:
     $\langle \dot\rho_1, G(\rho)\,\dot\rho_2 \rangle = \int \dot\rho_1\, (-\Delta_\rho)^{-1} \dot\rho_2\,dx, \quad \text{where } \Delta_\rho := \nabla\cdot(\rho\nabla).$
     In other words, denote $\dot\rho_i = -\nabla\cdot(\rho\nabla\phi_i)$, $i = 1, 2$; then
     $\langle \phi_1, G(\rho)^{-1}\phi_2 \rangle = \int \phi_1\, \big(-\nabla\cdot(\rho\nabla)\big)\phi_2\,dx = \int (\nabla\phi_1, \nabla\phi_2)\,\rho\,dx,$
     where $\rho \in \mathcal{P}(\Omega)$, $\dot\rho_i$ are tangent vectors in $\mathcal{P}(\Omega)$, i.e. $\int \dot\rho_i\,dx = 0$, and $\phi_i \in C^\infty(\Omega)$ are cotangent vectors in $\mathcal{P}(\Omega)$ at the point $\rho$.
  11. Review: Optimal transport gradient flows. The Wasserstein gradient flow of an energy functional $\mathcal{F}(\rho)$ leads to
     $\partial_t \rho = -G(\rho)^{-1} \frac{\delta}{\delta\rho}\mathcal{F}(\rho) = \nabla\cdot\Big(\rho\nabla\frac{\delta}{\delta\rho}\mathcal{F}(\rho)\Big).$
     Example: if $\mathcal{F}(\rho) = \int F(x)\rho(x)\,dx$, then the gradient flow satisfies
     $\partial_t \rho = \nabla\cdot\big(\rho\nabla F(x)\big).$
  12. Entropy dissipation revisited. The gradient flow of the KL divergence
     $D_{KL}(\rho \| \pi) = \int \rho(x)\log\frac{\rho(x)}{\pi(x)}\,dx$
     w.r.t. the optimal transport metric satisfies the Fokker-Planck equation
     $\frac{\partial \rho}{\partial t} = \nabla\cdot\Big(\rho\nabla\log\frac{\rho}{\pi}\Big).$
     Here the major trick is that $\rho\nabla\log\rho = \nabla\rho$.
  13. Entropy dissipation revisited. In this way, one can study the first order entropy dissipation:
     $\frac{d}{dt} D_{KL}(\rho_t \| \pi) = \int \log\frac{\rho_t}{\pi}\,\nabla\cdot\Big(\rho\nabla\log\frac{\rho_t}{\pi}\Big)\,dx = -\int \Big|\nabla\log\frac{\rho_t}{\pi}\Big|^2 \rho\,dx = -I(\rho_t \| \pi).$
     Similarly, we study the second order entropy dissipation:
     $\frac{d}{dt} I(\rho_t \| \pi) = -2\int_\Omega \Gamma_2\Big(\log\frac{\rho_t}{\pi}, \log\frac{\rho_t}{\pi}\Big)\rho_t\,dx,$
     where $\Gamma_2$ is a bilinear form, which can be defined by the optimal transport second order operator.
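A quick finite-difference check of the first-order dissipation identity. The sketch below solves the 1D Fokker–Planck equation $\partial_t\rho = \nabla\cdot(\rho\nabla\log\frac{\rho}{\pi})$ on a grid (using the trick $\rho\nabla\log\rho = \nabla\rho$ from the previous slide) and compares the discrete rate of change of $D_{KL}$ with $-I(\rho_t\|\pi)$; the potential, grid, and initial density are my own illustrative choices.

```python
import numpy as np

# 1D Fokker-Planck  d_t rho = d_x( rho d_x log(rho/pi) ),  pi ~ exp(-V),
# checking the first-order dissipation  d/dt D_KL(rho_t || pi) ~ -I(rho_t || pi).
x, dx = np.linspace(-6.0, 6.0, 601, retstep=True)
Vp = x                                               # V(x) = x^2/2, so grad V = x
pi = np.exp(-x**2 / 2); pi /= pi.sum() * dx
rho = np.exp(-(x - 2.0)**2); rho /= rho.sum() * dx   # initial density, off-center

kl = lambda r: np.sum(r * np.log(r / pi)) * dx
fisher = lambda r: np.sum((np.gradient(r, dx) / r + Vp)**2 * r) * dx

dt = 1e-4
for step in range(1, 2001):
    # rho * d_x log(rho/pi) = d_x rho + rho * grad V   (the "major trick" above)
    flux = np.gradient(rho, dx) + rho * Vp
    kl_old = kl(rho)
    rho = np.clip(rho + dt * np.gradient(flux, dx), 1e-300, None)
    if step % 500 == 0:
        print(f"step {step}: d/dt KL ~ {(kl(rho) - kl_old) / dt:+.3f}   -I ~ {-fisher(rho):+.3f}")
```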
  14. Lyapunov methods for degenerate non-gradient flows. Consider a perturbed gradient flow
     $\dot\rho = -G(\rho)^{-1}\frac{\delta}{\delta\rho} D_{KL}(\rho_t \| \pi) + f\Big(\frac{\rho_t}{\pi}\Big),$
     where $f$ is a given function generated by the non-gradient drift vector field. How can we study the convergence behavior of $\rho$?
  15. Motivation: Decomposition. Assume $\pi$ is the invariant measure, given by an explicit formulation. We decompose the Fokker-Planck equation as
     $\partial_t \rho(t, x) = \nabla\cdot\Big(\rho(t, x)\,a(x)a(x)^{\mathsf T}\nabla\log\frac{\rho(t, x)}{\pi(x)}\Big) \;(\text{gradient direction}) \;+\; \nabla\cdot\big(\rho(t, x)\,\gamma(x)\big) \;(\text{perturbed direction}),$
     where
     $\gamma(x) := a(x)a(x)^{\mathsf T}\nabla\log\pi(x) - b(x) + \Big(\sum_{j=1}^{n+m}\frac{\partial}{\partial x_j}\big(a(x)a(x)^{\mathsf T}\big)_{ij}\Big)_{1\leq i\leq n+m},$
     and $\nabla\cdot\big(\pi(x)\gamma(x)\big) = 0$.
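The decomposition and the divergence-free property $\nabla\cdot(\pi\gamma) = 0$ can be verified symbolically. The sketch below does this for the underdamped Langevin example that appears later in the talk (slide 23), with $U$ and $T$ left as generic functions of $x$; it is an illustration I added, not part of the slides.

```python
import sympy as sp

# Symbolic check for the underdamped Langevin example (slide 23):
# gamma := a a^T grad(log pi) - b + row-wise divergence of a a^T,
# and the perturbation is divergence free w.r.t. pi: div(pi * gamma) = 0.
x, v = sp.symbols('x v', real=True)
U = sp.Function('U')(x)
T = sp.Function('T')(x)

log_pi = -(v**2 / 2 + U)                 # pi ~ exp(-H), H = v^2/2 + U(x)
pi = sp.exp(log_pi)
b = sp.Matrix([v, -T * v - sp.diff(U, x)])
a = sp.Matrix([0, sp.sqrt(T)])           # so sqrt(2) a dB_t reproduces sqrt(2 T(x)) dB_t
A = a * a.T                              # diffusion matrix a a^T
X = [x, v]

grad_log_pi = sp.Matrix([sp.diff(log_pi, s) for s in X])
div_A = sp.Matrix([sum(sp.diff(A[i, j], X[j]) for j in range(2)) for i in range(2)])
gamma = A * grad_log_pi - b + div_A

print(sp.simplify(gamma.T))                                              # -> [-v, U'(x)]
print(sp.simplify(sum(sp.diff(pi * gamma[i], X[i]) for i in range(2))))  # -> 0
```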
  16. Main result: Structure condition. Assumption: for any $i \in \{1, \cdots, n\}$ and $k \in \{1, \cdots, m\}$, we assume
     $z_k^{\mathsf T}\nabla a_i^{\mathsf T} \in \mathrm{Span}\{a_1^{\mathsf T}, \cdots, a_n^{\mathsf T}\}.$
     Examples: $a$ is a constant vector; $a$ is a matrix function defined by $a = a(x_1, \cdots, x_n)$ with $z \in \mathrm{span}\{e_{n+1}, \cdots, e_{n+m}\}$, where $e_i$ is the $i$-th Euclidean basis vector.
  17. Main result: Entropy dissipation [F. and Li, 2021]. Under the assumption, for any $\beta \in \mathbb{R}$ and a given vector function $z$, define the matrix function
     $R = R_a + R_z + R_\pi - M_\Lambda + \beta R_{I_a} + (1 - \beta)R_{\gamma a} + R_{\gamma z}.$
     If there exists a constant $\lambda > 0$ such that $R \succeq \lambda(aa^{\mathsf T} + zz^{\mathsf T})$, then the following decay result holds:
     $D_{KL}(\rho_t \| \pi) \leq \frac{1}{2\lambda} e^{-2\lambda t}\, I_{a,z}(\rho_0 \| \pi),$
     where $\rho_t$ is the solution of the Fokker-Planck equation.
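In practice, once the matrix $R$ from the paper is assembled at a point, checking $R \succeq \lambda(aa^{\mathsf T} + zz^{\mathsf T})$ and extracting the best rate $\lambda$ is a generalized eigenvalue problem, which is what the "smallest eigenvalue" plots later in the deck report. The sketch below shows that reduction only; the numerical matrix $R$ and the vectors $a$, $z$ are made-up placeholders, since the slides do not reproduce the explicit entries of $R$.

```python
import numpy as np
from scipy.linalg import eigh

def lyapunov_rate(R, a, z):
    """Largest lambda with R >= lambda (a a^T + z z^T), computed as the smallest
    generalized eigenvalue; requires a a^T + z z^T to be positive definite."""
    S = a @ a.T + z @ z.T
    return eigh(R, S, eigvals_only=True)[0]

# Placeholder values (not from the paper): a single point evaluation with
# a = (0, 1)^T, z = (1, 0.1)^T, and a made-up symmetric matrix R.
a = np.array([[0.0], [1.0]])
z = np.array([[1.0], [0.1]])
R = np.array([[0.30, 0.05],
              [0.05, 0.40]])
print(f"lambda ~ {lyapunov_rate(R, a, z):.3f}")
```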
  18. Comparisons. (i) If $\gamma = 0$ and $m = 0$: [Bakry-Émery, 1985]. (ii) If $\gamma = 0$ and $m \neq 0$: [Baudoin-Garofalo, 2017], [F.-Li, 2019]. (iii) If $\beta = 0$ and $m = 0$: [Arnold-Carlen-Ju, 2000, 2008]. (iv) If $a$, $z$ are constants and $\beta = 0$: [Arnold-Erb, 2014], [Baudoin-Gordina-Herzog, 2019]. (v) If $\beta = 1$, $m = 0$ and $a = I$: [Arnold-Carlen]; [F.-Li, 2020].
  19. Idea of proof. Define
     $I_{a,z}(\rho \| \pi) = \int_{\mathbb{R}^{n+m}} \Big\langle \nabla\log\frac{\rho}{\pi},\ (aa^{\mathsf T} + zz^{\mathsf T})\nabla\log\frac{\rho}{\pi}\Big\rangle\,\rho\,dx.$
     Consider
     $-\frac12\frac{d}{dt} I_{a,z}(\rho_t) = \mathrm{(I)} + \mathrm{(II)} + \mathrm{(III)},$
     where $\mathrm{(I)} = \int \Gamma_2(f, f)\,\rho_t\,dx$, $\mathrm{(II)} = \int \Gamma^{z,\pi}_2(f, f)\,\rho_t\,dx$, $\mathrm{(III)} = \int \Gamma_{I_{a,z}}(f, f)\,\rho_t\,dx$, with $f = \log\frac{\rho}{\pi}$, and $\Gamma_2$, $\Gamma^{z,\pi}_2$, $\Gamma_{I_{a,z}}$ are designed bilinear forms coming from the second order calculation in density space. (i) If $a$ is non-degenerate, then $\mathrm{(II)} = 0$; (ii) if $b$ is a gradient vector field, then $\mathrm{(III)} = 0$.
  20. Detailed approach. For any $f \in C^\infty(\mathbb{R}^{n+m})$, the generator of the Itô SDE satisfies
     $\widetilde L f = L f - \langle \gamma, \nabla f\rangle, \quad\text{where}\quad L f = \nabla\cdot(aa^{\mathsf T}\nabla f) + \langle aa^{\mathsf T}\nabla\log\pi, \nabla f\rangle.$
     For a given matrix function $a \in \mathbb{R}^{(n+m)\times n}$, we construct a matrix function $z \in \mathbb{R}^{(n+m)\times m}$, and define a $z$-direction generator by
     $L_z f = \nabla\cdot(zz^{\mathsf T}\nabla f) + \langle zz^{\mathsf T}\nabla\log\pi, \nabla f\rangle.$
  21. Global in space computation = Gamma operators. Define Gamma one bilinear forms by
     $\Gamma_1(f, f) = \langle a^{\mathsf T}\nabla f, a^{\mathsf T}\nabla f\rangle_{\mathbb{R}^n}, \qquad \Gamma^z_1(f, f) = \langle z^{\mathsf T}\nabla f, z^{\mathsf T}\nabla f\rangle_{\mathbb{R}^m}.$
     Define Gamma two bilinear forms by: (i) Gamma two operator:
     $\Gamma_2(f, f) = \frac12 L\Gamma_1(f, f) - \Gamma_1(Lf, f).$
     (ii) Generalized Gamma $z$ operator:
     $\Gamma^{z,\pi}_2(f, f) = \frac12 L\Gamma^z_1(f, f) - \Gamma^z_1(Lf, f) + \mathrm{div}^\pi_z\big(\Gamma_{1,\nabla(aa^{\mathsf T})}(f, f)\big) - \mathrm{div}^\pi_a\big(\Gamma_{1,\nabla(zz^{\mathsf T})}(f, f)\big).$
     (iii) Irreversible Gamma operator:
     $\Gamma_{I_{a,z}}(f, f) = (Lf + L_z f)\,\langle\nabla f, \gamma\rangle - \frac12\Big\langle\nabla\big(\Gamma_1(f, f) + \Gamma^z_1(f, f)\big), \gamma\Big\rangle.$
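As a sanity check on the Gamma calculus, in the one-dimensional, non-degenerate, gradient case ($a = 1$, $m = 0$, $\pi \propto e^{-V}$) the iterated operator $\Gamma_2$ should reduce to the classical Bochner/Bakry–Émery identity $(f'')^2 + V''\,(f')^2$. The symbolic sketch below verifies this; it is my own illustration of the definitions, not code from the paper.

```python
import sympy as sp

# 1D symbolic check (a = 1, m = 0, so z plays no role):
#   Gamma_1(f, g) = f' g',   L f = f'' + (log pi)' f' = f'' - V' f',
#   Gamma_2(f, f) = 1/2 L Gamma_1(f, f) - Gamma_1(L f, f)
# should reproduce Bochner's identity (f'')^2 + V'' (f')^2.
x = sp.symbols('x', real=True)
f = sp.Function('f')(x)
V = sp.Function('V')(x)

L = lambda g: sp.diff(g, x, 2) - sp.diff(V, x) * sp.diff(g, x)
Gamma1 = lambda g, h: sp.diff(g, x) * sp.diff(h, x)

Gamma2 = sp.Rational(1, 2) * L(Gamma1(f, f)) - Gamma1(L(f), f)
bochner = sp.diff(f, x, 2)**2 + sp.diff(V, x, 2) * sp.diff(f, x)**2
print(sp.simplify(Gamma2 - bochner))     # expected: 0
```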
  22. Local in space calculation = Bochner's formula. For any $f = \log\frac{\rho}{\pi} \in C^\infty(\mathbb{R}^{n+m}, \mathbb{R})$ and any $\beta \in \mathbb{R}$, under the assumption, we derive that
     $-\frac12\frac{d}{dt} I_{a,z}(\rho \| \pi) = \int \big[\Gamma_2(f, f) + \Gamma^{z,\pi}_2(f, f) + \Gamma_{I_{a,z}}(f, f)\big]\,\rho\,dx = \int \big[\|\mathrm{Hess}_\beta f\|^2 + R(\nabla f, \nabla f)\big]\,\rho\,dx.$
     Clearly, if $R \succeq \lambda I$, we derive a Lyapunov constant $\lambda$ for the convergence rate.
  23. Example. Consider an underdamped Langevin dynamics
     $dx_t = v_t\,dt, \qquad dv_t = \big(-T(x_t)v_t - \nabla_x U(x_t)\big)\,dt + \sqrt{2T(x_t)}\,dB_t. \quad (1)$
     It can be viewed as $Y_t = (x_t, v_t)$, $dY_t = b(Y_t)\,dt + \sqrt{2}\,a(Y_t)\,dB_t$, with
     $b = \begin{pmatrix} v \\ -T(x)v - \nabla U(x)\end{pmatrix}, \qquad a = \begin{pmatrix} 0 \\ \sqrt{T(x)}\end{pmatrix}.$
     Its invariant measure has a closed form:
     $\pi(x, v) = \frac{1}{Z} e^{-H(x,v)}, \qquad H(x, v) = \frac{\|v\|^2}{2} + U(x).$
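A minimal simulation sketch of the dynamics (1) for the constant-diffusion case shown on the next slide ($T = 1$, $U(x) = x^2/2$), whose invariant measure is the standard Gaussian in $(x, v)$; particle number, step size, and horizon are my own choices.

```python
import numpy as np

# Euler-Maruyama simulation of the dynamics (1) in the constant-diffusion case:
# T = 1, U(x) = x^2/2, so pi is the standard Gaussian in (x, v).
rng = np.random.default_rng(1)
n, dt, n_steps = 20_000, 1e-3, 10_000
x = np.full(n, 2.0)                     # start away from equilibrium
v = np.zeros(n)
for _ in range(n_steps):
    x, v = (x + v * dt,
            v + (-v - x) * dt + np.sqrt(2.0 * dt) * rng.normal(size=n))
print(f"Var(x) ~ {x.var():.3f}, Var(v) ~ {v.var():.3f}, Cov(x, v) ~ {np.cov(x, v)[0, 1]:+.3f}")
print("targets under pi:  Var(x) = Var(v) = 1, Cov(x, v) = 0")
```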
  24. Constant diffusion. Figure: smallest eigenvalue plotted over $(x_1, x_2)$; $T = 1$, $U(x) = x^2/2$. Left: $\beta = 0$ [Arnold-Erb]; right: $\beta = 0.1$; $z = (1, 0.1)^{\mathsf T}$.
  25. Variable diffusion. Figure: smallest eigenvalue plotted over $(x_1, x_2)$; $U(x) = \frac{x^c - x}{c(c-1)}$, $T(x) = \big(\nabla^2_x U(x)\big)^{-1}$, $c = 2.5$, $z = (1, 0.1)^{\mathsf T}$. Left: $\beta = 0$; right: $\beta = 0.6$.