Entropy dissipation on graphs

Slide 1

Slide 1 text

Entropy dissipation semi-scheme for Fokker-Planck equations
Wuchen Li, UCLA
10 to 60 workshop, December 6, 2017.
Joint work with Shui-Nee Chow, Luca Dieci, and Haomin Zhou.

Slide 2

Slide 2 text

Motivation

Slide 3

Slide 3 text

Gradient flow

Consider a minimization problem
$$\min_{x \in \mathbb{R}^d} V(x).$$
To find local minimizers, a gradient flow is introduced:
$$\dot{x}_t = -\nabla V(x_t).$$
To find global minimizers, a perturbed gradient flow is used:
$$\dot{X}_t = -\nabla V(X_t) + \sqrt{2\beta}\,\dot{B}_t,$$
where $X_t$ is the solution of the above stochastic differential equation and $B_t$ is a standard Brownian motion in $\mathbb{R}^d$ (so that $\dot{B}_t$ is white noise).
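The two flows above can be compared numerically. The following is a minimal sketch, not the method of the talk: it assumes a one-dimensional double-well potential V(x) = (x^2 - 1)^2 chosen purely for illustration, explicit Euler for the gradient flow, and Euler-Maruyama for the perturbed flow. The function names, step size, and seed are all hypothetical.

```python
import math
import random

def grad_V(x):
    # Derivative of the illustrative potential V(x) = (x^2 - 1)^2 (an assumption).
    return 4.0 * x * (x * x - 1.0)

def gradient_flow(x0, dt=1e-2, steps=2000):
    # Explicit Euler for dx/dt = -V'(x); settles in a nearby local minimizer.
    x = x0
    for _ in range(steps):
        x -= dt * grad_V(x)
    return x

def perturbed_gradient_flow(x0, beta, dt=1e-2, steps=2000, seed=0):
    # Euler-Maruyama for dX = -V'(X) dt + sqrt(2 beta) dB.
    # Each Brownian increment over a step dt is Gaussian with variance dt.
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        noise = math.sqrt(2.0 * beta * dt) * rng.gauss(0.0, 1.0)
        x += -dt * grad_V(x) + noise
    return x

x_local = gradient_flow(0.5)                      # deterministic path
x_noisy = perturbed_gradient_flow(0.5, beta=0.1)  # one noisy sample path
```

With beta = 0 the perturbed flow reduces to the deterministic one; with beta > 0 a sample path can cross the barrier between the two wells, which is the mechanism behind the search for global minimizers.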

Slide 4

Slide 4 text

Fokker-Planck equations

Consider the probability density function of $X_t$,
$$\Pr(X_t \in dx) = \rho(t, x)\,dx,$$
which satisfies the Fokker-Planck equation
$$\frac{\partial \rho}{\partial t} = \nabla \cdot (\rho \nabla V(x)) + \beta \Delta \rho.$$
This PDE has many applications in games, physics, modeling, etc. The goal here is to study and compute Fokker-Planck equations at a discrete level, i.e. through a semi-scheme (continuous in time and discrete in space).

Slide 5

Slide 5 text

Entropy dissipation

Consider
$$H(\rho) = \int_{\mathbb{R}^d} \rho(x) \log \frac{\rho(x)}{e^{-V(x)}}\,dx.$$
Along the Fokker-Planck equation (here with $\beta = 1$), the dissipation relation holds:
$$\frac{d}{dt} H(\rho) = -\int_{\mathbb{R}^d} \Big(\nabla \log \frac{\rho}{e^{-V}}\Big)^2 \rho\,dx = -I(\rho),$$
where $I(\rho)$ is called the Fisher information [1]. The mathematics behind this relation comes from optimal transport theory: the density of a gradient flow is itself a gradient flow in the space of densities.

[1] B. Frieden, Science from Fisher Information: A Unification, 2004.

Slide 6

Slide 6 text

Optimal transport

What is the optimal way to move or transport a mountain with shape $X$ and density $\rho^0(x)$ to another shape $Y$ with density $\rho^1(y)$? The problem was first introduced by Monge in 1781 and relaxed by Kantorovich in 1940. It induces a metric on the probability set, called the optimal transport distance, Wasserstein metric, or Earth Mover's distance.

Slide 7

Slide 7 text

Overview

Optimal transport has many different formulations, seen from various angles:
Mapping / Monge-Ampère equation;
Linear programming;
Geometry / Fluid dynamics / $L^1$ minimization;
which have been considered by Otto, Kinderlehrer, Villani, McCann, Carlen, Lott, Sturm, Gangbo, Jordan, Evans, Brenier, Benamou, Ambrosio, Gigli, Savaré and many more. In this talk, we mainly follow the geometric formulation in a discrete setting.

Slide 8

Slide 8 text

Mapping

In 1781, Monge considered the following optimization problem. Given two measures $\rho^0$, $\rho^1$ with equal mass, consider
$$\inf_T \int_{\mathbb{R}^d} d(x, T(x))\,\rho^0(x)\,dx,$$
where $d\colon \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_+$ is the so-called ground metric, and the infimum is taken over all transport maps $T$ that transfer $\rho^0(x)$ to $\rho^1(x)$, i.e.
$$\rho^0(x) = \rho^1(T(x)) \det(\nabla T(x)).$$

Slide 9

Slide 9 text

Linear programming

In 1940, Kantorovich relaxed Monge's problem into the following linear program:
$$\inf_\pi \mathbb{E}_\pi\, d(X, Y),$$
where $\mathbb{E}$ is the expectation operator and the infimum is taken over all joint densities (transport plans) $\pi(x, y) \ge 0$ whose marginals are the laws of the random variables $X$, $Y$, i.e.
$$X \sim \rho^0(x), \quad Y \sim \rho^1(y).$$
The minimal value defines a particular distance function on the probability set, and the support of the minimizer yields the optimal map.

Slide 10

Slide 10 text

Density manifold

Consider $d(x, y) = \|x - y\|^2$. The distance has an optimal control formulation (Benamou-Brenier 2000):
$$\inf_v \mathbb{E} \int_0^1 v(t, X_t)^2\,dt,$$
where $\mathbb{E}$ is the expectation operator and the infimum runs over all vector fields $v_t$ such that
$$\dot{X}_t = v(t, X_t), \quad X_0 \sim \rho^0, \quad X_1 \sim \rho^1.$$
Under this metric, the probability set has a Riemannian geometry structure [2].

[2] John D. Lafferty, The density manifold and configuration space quantization, 1988.

Slide 11

Slide 11 text

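Brownian motion and entropy dissipation

The gradient flow of the entropy
$$H(\rho) = \int_{\mathbb{R}^d} \rho(x) \log \rho(x)\,dx$$
with respect to the optimal transport metric is
$$\frac{\partial \rho}{\partial t} = \nabla \cdot (\rho \nabla \log \rho) = \Delta \rho.$$
Entropy dissipation:
$$\frac{d}{dt} H(\rho) = -\int_{\mathbb{R}^d} (\nabla \log \rho)^2 \rho\,dx = -g(\operatorname{grad} H, \operatorname{grad} H).$$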

Slide 12

Slide 12 text

Goal: Numerical schemes for Fokker-Planck equations

Question: Can we obtain similar entropy dissipation results in numerics?
Answer: Yes, provided we build a discrete optimal transport metric. Using this metric, we derive the gradient flow as the semi-scheme.
Recent developments: Erbar, Mielke, Maas, Gigli, Sturm, Villani, Ollivier, Fathi, Chow, Zhou, Li and many more.

Slide 13

Slide 13 text

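Basic setting

Graph with finitely many vertices: $G = (V, E)$, $V = \{1, \dots, n\}$, $E$ is the edge set;
Probability set:
$$\mathcal{P}(G) = \Big\{(\rho_i)_{i=1}^n \;\Big|\; \sum_{i=1}^n \rho_i = 1,\ \rho_i \ge 0\Big\};$$
Potential energy:
$$F(\rho) = \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n W_{ij} \rho_i \rho_j + \sum_{i=1}^n V_i \rho_i + \beta \sum_{i=1}^n \rho_i \log \rho_i,$$
where the first two terms are the interaction and linear potential energies and the last term is the Boltzmann-Shannon entropy. Here $W$ is a given symmetric matrix, each $V_i$ is a constant scalar, and $\beta > 0$ is a given constant.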
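As a concrete reading of the free energy above, here is a small sketch in plain Python. The function name and the list-of-lists representation of W are illustrative choices, and the convention 0 log 0 = 0 on the boundary of the probability set is an assumption.

```python
import math

def free_energy(rho, W, V, beta):
    # F(rho) = 1/2 sum_ij W_ij rho_i rho_j + sum_i V_i rho_i + beta sum_i rho_i log rho_i
    n = len(rho)
    interaction = 0.5 * sum(W[i][j] * rho[i] * rho[j]
                            for i in range(n) for j in range(n))
    linear = sum(V[i] * rho[i] for i in range(n))
    # Convention (assumption): terms with rho_i = 0 contribute 0 to the entropy.
    entropy = sum(r * math.log(r) for r in rho if r > 0.0)
    return interaction + linear + beta * entropy

# Example: uniform density on two vertices, no interaction, zero potential.
F_uniform = free_energy([0.5, 0.5], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], 1.0)
```

In this example only the entropy term is active, so the value equals -log 2.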

Slide 14

Slide 14 text

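Optimal transport distance on a graph

The metric for any $\rho^0, \rho^1 \in \mathcal{P}_o(G)$ is
$$W(\rho^0, \rho^1)^2 := \inf_v \Big\{ \int_0^1 (v, v)_\rho\,dt \;:\; \frac{d\rho}{dt} + \mathrm{div}_G(\rho v) = 0,\ \rho(0) = \rho^0,\ \rho(1) = \rho^1 \Big\},$$
where
$$(v, v)_\rho = \frac{1}{2} \sum_{(i,j) \in E} v_{ij}^2\, g_{ij}(\rho), \qquad \mathrm{div}_G(\rho v) = -\Big( \sum_{j \in N(i)} v_{ij}\, g_{ij}(\rho) \Big)_{i=1}^n,$$
and $g_{ij}$ is the "density weight" defined on the edges of the graph:
$$g_{ij}(\rho) = \frac{\rho_i + \rho_j}{2}.$$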
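The ingredients of this metric are simple to evaluate. The sketch below assumes that the vector field is antisymmetric (v_ij = -v_ji) and stored as a dictionary over ordered vertex pairs, and that the sum over E in the inner product runs over both orientations of each edge, which is one way to read the factor 1/2. All names and the data layout are hypothetical.

```python
def density_weight(rho, i, j):
    # g_ij(rho) = (rho_i + rho_j) / 2
    return 0.5 * (rho[i] + rho[j])

def graph_divergence(rho, v, neighbors):
    # div_G(rho v)_i = - sum_{j in N(i)} v_ij g_ij(rho)
    return [-sum(v[(i, j)] * density_weight(rho, i, j) for j in neighbors[i])
            for i in range(len(rho))]

def inner_product(rho, v, neighbors):
    # (v, v)_rho = 1/2 sum over ordered pairs (i, j) of v_ij^2 g_ij(rho)
    return 0.5 * sum(v[(i, j)] ** 2 * density_weight(rho, i, j)
                     for i in range(len(rho)) for j in neighbors[i])

# Path graph 0 - 1 - 2 with an antisymmetric vector field on its edges.
rho = [0.2, 0.3, 0.5]
neighbors = [[1], [0, 2], [1]]
v = {(0, 1): 1.0, (1, 0): -1.0, (1, 2): 2.0, (2, 1): -2.0}
div = graph_divergence(rho, v, neighbors)
```

Because v is antisymmetric and g_ij is symmetric, the entries of the divergence sum to zero, so the continuity equation conserves total mass.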

Slide 15

Slide 15 text

Hodge decomposition on graphs

Consider a Hodge decomposition on a graph,
$$v = \underbrace{\nabla_G S}_{\text{gradient}} + \underbrace{u}_{\text{divergence free}},$$
where divergence free on a graph means $\mathrm{div}_G(\rho u) = 0$.

Lemma. The discrete Wasserstein metric is equivalent to
$$\inf_{\nabla_G S} \int_0^1 (\nabla_G S, \nabla_G S)_\rho\,dt,$$
where the infimum is taken over all discrete potential vector fields $\nabla_G S$ such that
$$\frac{d\rho}{dt} + \mathrm{div}_G(\rho \nabla_G S) = 0, \quad \rho(0) = \rho^0, \quad \rho(1) = \rho^1.$$
This metric gives $\mathcal{P}_o(G)$ a Riemannian geometry structure.

Slide 16

Slide 16 text

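Discrete probability manifold

Denote
$$-\mathrm{div}_G(\rho \nabla_G S) = L(\rho) S.$$
Then the metric in $\mathcal{P}_o(G)$ is equivalent to
$$\inf \Big\{ \int_0^1 \dot{\rho}^T L^{-1}(\rho)\, \dot{\rho}\,dt \;:\; \rho(0) = \rho^0,\ \rho(1) = \rho^1 \Big\}.$$
Here $L(\rho) \in \mathbb{R}^{n \times n}$ is the weighted Laplacian matrix
$$L(\rho) = -\mathrm{div}_G(\rho \nabla_G) = -D^T \Theta(\rho) D,$$
where $D \in \mathbb{R}^{|E| \times |V|}$ is a discrete gradient matrix, $D^T \in \mathbb{R}^{|V| \times |E|}$ is a discrete divergence matrix, and $\Theta \in \mathbb{R}^{|E| \times |E|}$ is a diagonal weight matrix
$$\Theta_{(i,j),(k,l)} = \begin{cases} \dfrac{\rho_i + \rho_j}{2} & \text{if } (i, j) = (k, l) \in E; \\ 0 & \text{otherwise.} \end{cases}$$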
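The matrix $-D^T \Theta(\rho) D$ can be assembled edge by edge without forming $D$ explicitly. The sketch below takes that formula at face value and assumes each undirected edge (i, j) is listed once, with the corresponding row of $D$ having +1 at i and -1 at j; the orientation is arbitrary and does not affect the product. The function names are hypothetical.

```python
def weighted_laplacian(rho, edges):
    # Assemble -D^T Theta(rho) D as a dense n x n list of lists.
    n = len(rho)
    L = [[0.0] * n for _ in range(n)]
    for (i, j) in edges:
        g = 0.5 * (rho[i] + rho[j])  # diagonal entry of Theta for this edge
        # D^T Theta D adds g * (e_i - e_j)(e_i - e_j)^T; we store its negative.
        L[i][i] -= g
        L[j][j] -= g
        L[i][j] += g
        L[j][i] += g
    return L

def matvec(L, x):
    # Plain matrix-vector product for a dense list-of-lists matrix.
    return [sum(a * b for a, b in zip(row, x)) for row in L]

# Path graph 0 - 1 - 2.
rho = [0.2, 0.3, 0.5]
edges = [(0, 1), (1, 2)]
L = weighted_laplacian(rho, edges)
Lx = matvec(L, [1.0, 2.0, 4.0])
```

With this sign, row i of the product L x equals the sum over neighbors j of g_ij (x_j - x_i), every row of L sums to zero, and L is symmetric.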

Slide 17

Slide 17 text

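Gradient flow

The gradient flow means
$$\frac{d\rho}{dt} = \mathrm{grad}_{\mathcal{P}_o(G)} F(\rho),$$
where the gradient is defined by:
Tangency: $\mathrm{grad}_{\mathcal{P}_o(G)} F(\rho) \in T_\rho \mathcal{P}_o(G)$;
Duality: $(\mathrm{grad}_{\mathcal{P}_o(G)} F(\rho), \sigma)_\rho = dF(\rho) \cdot \sigma$ for any $\sigma \in T_\rho \mathcal{P}_o(G)$,
where $dF(\rho) = \big( \frac{\partial}{\partial \rho_i} F(\rho) \big)_{i=1}^n$.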

Slide 18

Slide 18 text

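Derivation

Theorem. The ODE
$$\frac{d\rho}{dt} = L(\rho) \nabla_\rho F(\rho),$$
i.e.
$$\frac{d\rho_i}{dt} = \sum_{j \in N(i)} g_{ij}(\rho) \Big( \frac{\partial}{\partial \rho_j} - \frac{\partial}{\partial \rho_i} \Big) F(\rho), \tag{1}$$
is the gradient flow of the free energy $F(\rho)$ on $\mathcal{P}_o(G)$ with respect to the discrete optimal transport distance.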
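The componentwise form (1) can be integrated directly. The sketch below is only illustrative: it assumes explicit Euler time stepping with a small fixed step (the talk does not prescribe a time integrator), a free energy made of interaction, linear potential, and entropy terms with symmetric W, and a strictly positive initial density. Function names and parameter values are hypothetical.

```python
import math

def free_energy(rho, W, V, beta):
    # Interaction + linear potential + beta * Boltzmann-Shannon entropy.
    n = len(rho)
    return (0.5 * sum(W[i][j] * rho[i] * rho[j] for i in range(n) for j in range(n))
            + sum(V[i] * rho[i] for i in range(n))
            + beta * sum(r * math.log(r) for r in rho))

def rhs(rho, edges, W, V, beta):
    # Right-hand side of (1): sum_j g_ij (dF/drho_j - dF/drho_i).
    n = len(rho)
    dF = [sum(W[i][j] * rho[j] for j in range(n)) + V[i]
          + beta * (math.log(rho[i]) + 1.0) for i in range(n)]
    out = [0.0] * n
    for (i, j) in edges:  # each undirected edge listed once
        flux = 0.5 * (rho[i] + rho[j]) * (dF[j] - dF[i])
        out[i] += flux
        out[j] -= flux    # antisymmetry keeps total mass constant
    return out

def integrate(rho0, edges, W, V, beta, dt=1e-3, steps=20000):
    # Explicit Euler (an assumption made for this sketch).
    rho = list(rho0)
    for _ in range(steps):
        drho = rhs(rho, edges, W, V, beta)
        rho = [r + dt * d for r, d in zip(rho, drho)]
    return rho

edges = [(0, 1), (1, 2)]               # path graph 0 - 1 - 2
W = [[0.0] * 3 for _ in range(3)]      # no interaction in this example
V = [0.0, 1.0, 2.0]
rho0 = [1.0 / 3.0] * 3
rho_T = integrate(rho0, edges, W, V, beta=1.0)
```

In this example the total mass stays equal to one, the free energy at the final time is lower than at the start, and the final density is close to the normalized weights exp(-V_i).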

Slide 19

Slide 19 text

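Asymptotic behavior

Theorem. For any initial condition $\rho^0 \in \mathcal{P}_o(G)$, (1) has a unique solution $\rho(t)\colon [0, \infty) \to \mathcal{P}_o(G)$.
(i) The free energy $F(\rho)$ is a Lyapunov function of (1);
(ii) If $\lim_{t \to \infty} \rho(t)$ exists, call it $\rho^\infty$; then $\rho^\infty$ is one of the possible Gibbs measures, i.e.
$$\rho^\infty_i = \frac{1}{K} e^{-\frac{F_i(\rho^\infty)}{\beta}}, \quad K = \sum_{j=1}^n e^{-\frac{F_j(\rho^\infty)}{\beta}}, \quad \text{for all } i \in V,$$
where
$$F_i(\rho) = \sum_{j=1}^n W_{ij} \rho_j + V_i.$$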
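The Gibbs formula is explicit when there is no interaction (W = 0) and is a fixed-point condition otherwise, since F_i depends on the density itself. The sketch below evaluates the formula and measures how far a given density is from satisfying it. The helper names are hypothetical, and shifting by the minimum of the F values is only a numerical convenience that cancels in the normalization.

```python
import math

def gibbs_measure(F_values, beta):
    # rho_i = exp(-F_i / beta) / K, with K the normalizing constant.
    shift = min(F_values)  # cancels between numerator and K
    weights = [math.exp(-(f - shift) / beta) for f in F_values]
    K = sum(weights)
    return [w / K for w in weights]

def gibbs_residual(rho, W, V, beta):
    # Largest deviation of rho from the Gibbs formula evaluated at rho itself.
    n = len(rho)
    F_values = [sum(W[i][j] * rho[j] for j in range(n)) + V[i] for i in range(n)]
    target = gibbs_measure(F_values, beta)
    return max(abs(r - t) for r, t in zip(rho, target))

V = [0.0, 1.0, 2.0]
W0 = [[0.0] * 3 for _ in range(3)]
rho_inf = gibbs_measure(V, beta=1.0)       # explicit because W = 0
residual = gibbs_residual(rho_inf, W0, V, beta=1.0)
```

A density with zero residual is a Gibbs measure in the sense of the theorem; for nonzero W there may be several of them.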

Slide 20

Slide 20 text

Entropy dissipation

What is the stability of a Gibbs measure?
Motivation:
Relative entropy and Fisher information, e.g. the work of Carrillo, McCann and Villani [3];
Gradient flows: a dynamical systems viewpoint!

[3] "Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates", 2003.

Slide 21

Slide 21 text

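Dissipation relation

$$\frac{d}{dt} \big( F(\rho) - F(\rho^\infty) \big) = -\big( \nabla_G (F + \beta \log \rho), \nabla_G (F + \beta \log \rho) \big)_\rho = -\sum_{(i,j) \in E} \Big[ \log \frac{\rho_i}{e^{-F_i(\rho)/\beta}} - \log \frac{\rho_j}{e^{-F_j(\rho)/\beta}} \Big]^2 g_{ij}(\rho) \le 0.$$

Remark: If $F(\rho) = \sum_{i=1}^n V_i \rho_i + \sum_{i \in V} \rho_i \log \rho_i$, then the above gives the discrete analog of entropy dissipation:
$$\frac{d}{dt} \Big( \sum_{i \in V} \rho_i(t) \log \frac{\rho_i(t)}{\rho^\infty_i} \Big) = -\sum_{(i,j) \in E} g_{ij}(\rho) \Big( \log \frac{\rho_i(t)}{\rho^\infty_i} - \log \frac{\rho_j(t)}{\rho^\infty_j} \Big)^2.$$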
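The identity in the remark can be checked numerically at any positive density. The sketch below assumes the linear case of the remark (no interaction, entropy coefficient equal to one) and that each undirected edge is counted once in the sum over E. The left side is evaluated by the chain rule along the flow and the right side from the formula. The function names are hypothetical.

```python
import math

def entropy_rate(rho, edges, V):
    # Chain rule: d/dt sum_i rho_i log(rho_i / rho_inf_i) = sum_i (mu_i + 1) drho_i/dt,
    # where mu_i = V_i + log rho_i and drho/dt follows the gradient flow ODE.
    mu = [V[i] + math.log(rho[i]) for i in range(len(rho))]
    drho = [0.0] * len(rho)
    for (i, j) in edges:
        flux = 0.5 * (rho[i] + rho[j]) * (mu[j] - mu[i])
        drho[i] += flux
        drho[j] -= flux
    return sum((m + 1.0) * d for m, d in zip(mu, drho))

def dissipation(rho, edges, V):
    # Right side: - sum_edges g_ij (log(rho_i/rho_inf_i) - log(rho_j/rho_inf_j))^2.
    K = sum(math.exp(-v) for v in V)
    rho_inf = [math.exp(-v) / K for v in V]
    total = 0.0
    for (i, j) in edges:
        g = 0.5 * (rho[i] + rho[j])
        diff = math.log(rho[i] / rho_inf[i]) - math.log(rho[j] / rho_inf[j])
        total += g * diff * diff
    return -total

edges = [(0, 1), (1, 2), (2, 0)]   # triangle graph, each edge listed once
V = [0.0, 1.0, 2.0]
rho = [0.5, 0.2, 0.3]
lhs = entropy_rate(rho, edges, V)
rhs_value = dissipation(rho, edges, V)
```

The two numbers agree up to rounding and are nonpositive, and both vanish exactly when rho equals the Gibbs measure.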

Slide 22

Slide 22 text

Discrete entropy dissipation

Theorem. If the Gibbs measure $\rho^\infty$ is a strict minimizer of $F(\rho)$, then there exists a constant $C > 0$ such that
$$F(\rho(t)) - F(\rho^\infty) \le e^{-Ct} \big( F(\rho^0) - F(\rho^\infty) \big).$$
Moreover, the asymptotic dissipation rate is $2\lambda_F(\rho^\infty)$. In other words, for any sufficiently small $\epsilon > 0$, there exists a time $T > 0$ such that, for $t > T$,
$$F(\rho(t)) - F(\rho^\infty) \le e^{-2(\lambda_F(\rho^\infty) - \epsilon)(t - T)} \big( F(\rho(T)) - F(\rho^\infty) \big).$$

Slide 23

Slide 23 text

Idea of proof

The speed of convergence comes from comparing the first and second derivatives of $F(\rho(t))$ along the ODE. If one can find a constant $C > 0$ such that
$$\frac{d^2}{dt^2} F(\rho(t)) \ge -C \frac{d}{dt} F(\rho(t))$$
holds for all $t \ge 0$, then integrating this inequality over $[t, +\infty)$ gives
$$\frac{d}{dt} \big[ F(\rho^\infty) - F(\rho(t)) \big] \ge -C \big[ F(\rho^\infty) - F(\rho(t)) \big].$$
Applying Gronwall's inequality then proves the result.
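The integration step can be spelled out as follows. This is a sketch under two assumptions that the slide leaves implicit: $\rho(t) \to \rho^\infty$ and $\frac{d}{dt} F(\rho(t)) \to 0$ as $t \to \infty$. The shorthand $G(t)$ is introduced here only for readability.

```latex
\begin{align*}
\int_t^\infty \frac{d^2}{ds^2} F(\rho(s))\,ds
  &\ge -C \int_t^\infty \frac{d}{ds} F(\rho(s))\,ds \\
-\frac{d}{dt} F(\rho(t))
  &\ge -C \big[ F(\rho^\infty) - F(\rho(t)) \big] \\
\frac{d}{dt} G(t)
  &\le -C\, G(t), \qquad G(t) := F(\rho(t)) - F(\rho^\infty) \ge 0 \\
G(t) &\le e^{-Ct}\, G(0).
\end{align*}
```

The last line is Gronwall's inequality applied to $G$, and it is exactly the exponential decay of the free energy gap.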

Slide 24

Slide 24 text

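Proof

In our case, the first derivative of the energy along the gradient flow is
$$\frac{d}{dt} F(\rho(t)) = F(\rho)^T \dot{\rho} = -F(\rho)^T L(\rho) F(\rho) = -\dot{\rho}^T L^{-1}(\rho)\, \dot{\rho},$$
while the second derivative takes the form
$$\frac{d^2}{dt^2} F(\rho(t)) = 2 \dot{\rho}^T \mathrm{Hess} F(\rho)\, \dot{\rho} - \dot{\rho}^T L^{-1}(\rho) L(\dot{\rho}) L^{-1}(\rho)\, \dot{\rho}.$$
Compare $\frac{d}{dt} F(\rho(t))$ with $\frac{d^2}{dt^2} F(\rho(t))$ to find
$$C := \inf_{\rho \in B(\rho^0)} \Big\{ \underbrace{\frac{2 \dot{\rho}^T \mathrm{Hess} F(\rho)\, \dot{\rho}}{\dot{\rho}^T L^{-1}(\rho)\, \dot{\rho}}}_{\text{quadratic}} - \underbrace{\frac{\dot{\rho}^T L^{-1}(\rho) L(\dot{\rho}) L^{-1}(\rho)\, \dot{\rho}}{\dot{\rho}^T L^{-1}(\rho)\, \dot{\rho}}}_{\text{cubic}} \Big\}.$$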

Slide 25

Slide 25 text

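Hessian operator at Gibbs measure

Denote $\dot{\rho} = L(\rho) \Phi$. Consider
$$\lambda_F(\rho) = \min_{\Phi \in \mathbb{R}^n} \sum_{(i,j) \in E} \sum_{(k,l) \in E} h_{ij,kl}\, (\Phi_i - \Phi_j)\, g_{ij}(\rho)\, (\Phi_k - \Phi_l)\, g_{kl}(\rho)$$
$$\text{s.t.} \quad \sum_{(i,j) \in E} (\Phi_i - \Phi_j)^2 g_{ij}(\rho) = 1.$$
Here
$$h_{ij,kl} = \Big( \frac{\partial^2}{\partial \rho_i \partial \rho_k} + \frac{\partial^2}{\partial \rho_j \partial \rho_l} - \frac{\partial^2}{\partial \rho_i \partial \rho_l} - \frac{\partial^2}{\partial \rho_j \partial \rho_k} \Big) F(\rho).$$
This rate connects with the Yano formula [4], which is related to Ricci curvature in geometry.

[4] Kentaro Yano, "On Harmonic and Killing Vector Fields", 38-45, Annals of Mathematics, 1958.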

Slide 26

Slide 26 text

Numerical example: Gradient flow

Slide 27

Slide 27 text

Numerical example: Van der Pol oscillator

Slide 28

Slide 28 text

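Hessian at Gibbs measure

$$\big( \mathrm{Hess}_{\mathcal{P}_2(M)} F \cdot \nabla \Phi, \nabla \Phi \big)_{\rho^*} = \int_M \int_M \frac{\delta^2}{\delta \rho(x) \delta \rho(y)} F(\rho) \Big|_{\rho^*} \nabla \cdot (\rho^*(x) \nabla \Phi(x))\, \nabla \cdot (\rho^*(y) \nabla \Phi(y))\,dx\,dy$$
$$= \int_M \int_M \Big( D_{xy} \frac{\delta^2}{\delta \rho(x) \delta \rho(y)} F(\rho) \Big|_{\rho^*} \nabla \Phi(x), \nabla \Phi(y) \Big) \rho^*(x) \rho^*(y)\,dx\,dy,$$
where $\rho^*$ is a Gibbs measure, $M$ is a Riemannian manifold (including $\mathbb{R}^d$), and $\frac{\delta^2}{\delta \rho(x) \delta \rho(y)} F(\rho)$ is the second variation of the functional $F(\rho)$.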

Slide 29

Slide 29 text

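Linear entropy + Yano formula

Consider $F(\rho) = \int_M \rho(x) \log \rho(x)\,dx$, whose Gibbs measure is the uniform measure. Then
$$\big( \mathrm{Hess}_{\mathcal{P}(M)} F \cdot \nabla \Phi, \nabla \Phi \big)_{\rho^*} = \int_M \big[ \mathrm{Ric}(\nabla \Phi, \nabla \Phi) + \mathrm{tr}(D^2 \Phi^T D^2 \Phi) \big] \rho^*(x)\,dx = \int_M \big[ \nabla \cdot (\rho^* \nabla \Phi) \big]^2 \frac{1}{\rho^*(x)}\,dx.$$
The first equality is derived through Bochner's formula [5], while the second equality is new. Together they recover the well-known Yano formula.

[5] Cédric Villani, Optimal transport: Old and new, 2008.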

Slide 30

Slide 30 text

Main references

Cédric Villani, Optimal transport: Old and new, 2008.
Shui-Nee Chow, Luca Dieci, Wuchen Li and Haomin Zhou, Entropy dissipation semi-schemes for Fokker-Planck equations, 2016.

Slide 31

Slide 31 text

Happy Birthday and Best wishes to Prof. Dieci.