Entropy dissipation on graphs

Wuchen Li
December 29, 2017

Fokker-Planck equations are important in modeling and applications. In this talk, we consider a numerical method for them based on optimal transport theory, and present the details and properties of the resulting scheme. This is joint work with Shui-Nee Chow, Luca Dieci, and Haomin Zhou.

Transcript

  1. Entropy dissipation semi-scheme for Fokker-Planck equations

    Wuchen Li, UCLA. 10 to 60 workshop, December 6, 2017. Joint work with Shui-Nee Chow, Luca Dieci, and Haomin Zhou.
  2. Gradient flow

    Consider a minimization problem $\min_{x\in\mathbb{R}^d} V(x)$. To find local minimizers, a gradient flow is introduced:

    $$\dot x_t = -\nabla V(x_t).$$

    To find global minimizers, a perturbed gradient flow is used:

    $$\dot X_t = -\nabla V(X_t) + \sqrt{2\beta}\,\dot B_t,$$

    where $X_t$ solves the stochastic differential equation above and $B_t$ is a standard Brownian motion in $\mathbb{R}^d$ (so $\dot B_t$ is white noise).
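
    The two flows on this slide can be sketched numerically. The block below is a minimal illustration, not the method of the talk: the one-dimensional double-well potential, the step size, and the function names are all assumptions chosen for the example. It uses forward Euler for the gradient flow and Euler-Maruyama for the perturbed flow.

```python
import math
import random

def grad_V(x):
    # Hypothetical double-well potential V(x) = (x^2 - 1)^2 + 0.3 x (illustration only).
    return 4.0 * x * (x * x - 1.0) + 0.3

def gradient_flow(x0, dt=1e-3, steps=20000):
    # Forward Euler for dx/dt = -grad V(x); converges to a nearby local minimizer.
    x = x0
    for _ in range(steps):
        x -= dt * grad_V(x)
    return x

def perturbed_gradient_flow(x0, beta, dt=1e-3, steps=20000, seed=0):
    # Euler-Maruyama for dX = -grad V(X) dt + sqrt(2 beta) dB.
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        noise = math.sqrt(2.0 * beta * dt) * rng.gauss(0.0, 1.0)
        x += -dt * grad_V(x) + noise
    return x

# Starting at x = 0.5 the deterministic flow settles in the right-hand well,
# which is only a local minimizer of this particular V.
x_local = gradient_flow(0.5)
```

    With `beta = 0` the perturbed flow reduces to the deterministic one; with `beta > 0` the noise lets trajectories cross the barrier between the wells.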
  3. Fokker-Planck equations

    Consider the probability density function of $X_t$, $\Pr(X_t \in dx) = \rho(t,x)\,dx$, which satisfies the Fokker-Planck equation

    $$\frac{\partial \rho}{\partial t} = \nabla\cdot(\rho\nabla V(x)) + \beta\Delta\rho.$$

    The PDE has many applications in games, physics, modeling, etc. The goal here is to study and compute Fokker-Planck equations at a discrete level, i.e. by a semi-scheme (continuous in time and discrete in space).
  4. Entropy dissipation

    Consider

    $$H(\rho) = \int_{\mathbb{R}^d} \rho(x)\log\frac{\rho(x)}{e^{-V(x)}}\,dx.$$

    Along the Fokker-Planck equation, the dissipation relation holds:

    $$\frac{d}{dt}H(\rho) = -\int_{\mathbb{R}^d}\Big(\nabla\log\frac{\rho}{e^{-V}}\Big)^2\rho\,dx = -I(\rho),$$

    where $I(\rho)$ is named the Fisher information [1]. The mathematics behind this relation is shown in optimal transport theory: the density of a gradient flow is the gradient flow in the space of densities.

    [1] B. Frieden, Science from Fisher Information: A Unification, 2004.
  5. Optimal transport

    What is the optimal way to move or transport a mountain with shape X and density $\rho^0(x)$ to another shape Y with density $\rho^1(y)$? The problem was first introduced by Monge in 1781 and relaxed by Kantorovich in 1940. It induces a metric on the set of probability densities, known as the optimal transport distance, the Wasserstein metric, or the Earth Mover's distance.
  6. Overview

    Optimal transport has many formulations from various angles: mapping/Monge-Ampère equation; linear programming; geometry/fluid dynamics/L1 minimization. These are considered by Otto, Kinderlehrer, Villani, McCann, Carlen, Lott, Sturm, Gangbo, Jordan, Evans, Brenier, Benamou, Ambrosio, Gigli, Savaré and many more. In this talk, we mainly follow the geometric formulation in a discrete setting.
  7. Mapping

    In 1781, Monge considered the following optimization problem. Given two measures $\rho^0$, $\rho^1$ with equal mass, consider

    $$\inf_T \int_{\mathbb{R}^d} d(x, T(x))\,\rho^0(x)\,dx,$$

    where $d\colon \mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}_+$ is the so-called ground metric, and the infimum is taken over all transport maps $T$ that transfer $\rho^0$ to $\rho^1$, i.e.

    $$\rho^0(x) = \rho^1(T(x))\det(\nabla T(x)).$$
  8. Linear programming

    In 1940, Kantorovich relaxed Monge's problem to the following linear program:

    $$\inf_\pi \mathbb{E}_\pi\, d(X, Y),$$

    where $\mathbb{E}$ is the expectation operator and the infimum is taken over all joint densities (transport plans) $\pi(x,y)\ge 0$ whose marginals are the laws of the random variables $X$ and $Y$, i.e. $X\sim\rho^0(x)$, $Y\sim\rho^1(y)$. The minimal value provides a particular distance function on the probability set, and the support of the minimizer yields the optimal map.
  9. Density manifold

    Consider $d(x,y) = \|x-y\|^2$. The distance has an optimal control formulation (Benamou-Brenier 2000):

    $$\inf_v \mathbb{E}\int_0^1 v(t, X_t)^2\,dt,$$

    where $\mathbb{E}$ is the expectation operator and the infimum runs over all vector fields $v_t$ such that

    $$\dot X_t = v(t, X_t), \quad X_0\sim\rho^0, \quad X_1\sim\rho^1.$$

    Under this metric, the probability set has a Riemannian geometry structure [2].

    [2] John D. Lafferty, The density manifold and configuration space quantization, 1988.
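  10. Brownian motion and entropy dissipation

    The gradient flow of the entropy

    $$H(\rho) = \int_{\mathbb{R}^d}\rho(x)\log\rho(x)\,dx$$

    with respect to the optimal transport metric is

    $$\frac{\partial\rho}{\partial t} = \nabla\cdot(\rho\nabla\log\rho) = \Delta\rho.$$

    Entropy dissipation:

    $$\frac{d}{dt}H(\rho) = -\int_{\mathbb{R}^d}(\nabla\log\rho)^2\rho\,dx = -g(\operatorname{grad}H, \operatorname{grad}H),$$

    where $g$ denotes the Riemannian metric on the density manifold.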
  11. Goal: Numerical schemes for Fokker-Planck equations

    Question: Can we obtain similar entropy dissipation results in numerics? Answer: Yes, but we need to build a discrete optimal transport metric. Using this metric, we derive the gradient flow as the semi-scheme. Recent developments: Erbar, Mielke, Maas, Gigli, Sturm, Villani, Ollivier, Fathi, Chow, Zhou, Li and many more.
  12. Basic setting

    Graph with finite vertices: $G = (V, E)$, where $V = \{1,\dots,n\}$ and $E$ is the edge set.

    Probability set: $\mathcal{P}(G) = \{(\rho_i)_{i=1}^n \mid \sum_{i=1}^n \rho_i = 1,\ \rho_i\ge 0\}$.

    Free energy:

    $$F(\rho) = \underbrace{\frac12\sum_{i=1}^n\sum_{j=1}^n W_{ij}\rho_i\rho_j + \sum_{i=1}^n V_i\rho_i}_{\text{interaction/linear potential energy}} + \beta\underbrace{\sum_{i=1}^n\rho_i\log\rho_i}_{\text{Boltzmann-Shannon entropy}},$$

    where $W$ is a given symmetric matrix, each $V_i$ is a constant scalar, and $\beta>0$ is a given constant.
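  13. Optimal transport distance on a graph

    The metric for any $\rho^0, \rho^1\in\mathcal{P}_o(G)$ (the interior of $\mathcal{P}(G)$) is

    $$W(\rho^0,\rho^1)^2 := \inf_v\Big\{\int_0^1 (v,v)_\rho\,dt \;:\; \frac{d\rho}{dt} + \operatorname{div}_G(\rho v) = 0,\ \rho(0)=\rho^0,\ \rho(1)=\rho^1\Big\},$$

    where

    $$(v,v)_\rho = \frac12\sum_{(i,j)\in E} v_{ij}^2\, g_{ij}(\rho), \qquad \operatorname{div}_G(\rho v) = -\Big(\sum_{j\in N(i)} v_{ij}\, g_{ij}(\rho)\Big)_{i=1}^n,$$

    and $g_{ij}$ is the "density weight" defined on the edges of the graph:

    $$g_{ij}(\rho) = \frac{\rho_i+\rho_j}{2}.$$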
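
    The discrete inner product and divergence can be evaluated directly on a small graph. The sketch below makes several assumptions that the slide does not state: vertices are indexed from 0, the edge set is read as containing both orientations of every edge (which is what the factor 1/2 suggests), $v$ is antisymmetric, and the discrete gradient is taken as $(\nabla_G S)_{ij} = S_i - S_j$. All function names are hypothetical.

```python
def density_weight(rho, i, j):
    # g_ij(rho) = (rho_i + rho_j) / 2
    return 0.5 * (rho[i] + rho[j])

def graph_gradient(S, edges):
    # Assumed convention: (grad_G S)_ij = S_i - S_j, stored for both orientations.
    v = {}
    for i, j in edges:
        v[(i, j)] = S[i] - S[j]
        v[(j, i)] = S[j] - S[i]
    return v

def inner_product(v, rho):
    # (v, v)_rho = 1/2 * sum over ordered pairs (i, j) of v_ij^2 * g_ij(rho)
    return 0.5 * sum(val * val * density_weight(rho, i, j)
                     for (i, j), val in v.items())

def divergence(v, rho):
    # div_G(rho v)_i = - sum over neighbors j of v_ij * g_ij(rho)
    out = [0.0] * len(rho)
    for (i, j), val in v.items():
        out[i] -= val * density_weight(rho, i, j)
    return out

# Hypothetical example: path graph 0 - 1 - 2.
edges = [(0, 1), (1, 2)]
rho = [0.2, 0.3, 0.5]
v = graph_gradient([1.0, 0.0, 2.0], edges)
energy = inner_product(v, rho)
div = divergence(v, rho)
```

    For an antisymmetric $v$ the entries of `div` sum to zero, which is the discrete counterpart of mass conservation in the continuity equation.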
  14. Hodge decomposition on graphs

    Consider a Hodge decomposition on a graph,

    $$v = \underbrace{\nabla_G S}_{\text{gradient}} + \underbrace{u}_{\text{divergence free}},$$

    where divergence free on a graph means $\operatorname{div}_G(\rho u) = 0$.

    Lemma. The discrete Wasserstein metric is equivalent to

    $$\inf_{\nabla_G S}\int_0^1 (\nabla_G S, \nabla_G S)_\rho\,dt,$$

    where the infimum is taken over all discrete potential vector fields $\nabla_G S$ such that

    $$\frac{d\rho}{dt} + \operatorname{div}_G(\rho\nabla_G S) = 0, \quad \rho(0)=\rho^0, \quad \rho(1)=\rho^1.$$

    This metric gives $\mathcal{P}_o(G)$ a Riemannian geometry structure.
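  15. Discrete probability manifold

    Denote $-\operatorname{div}_G(\rho\nabla_G S) = L(\rho)S$. Then the metric on $\mathcal{P}_o(G)$ is equivalent to

    $$\inf\Big\{\int_0^1 \dot\rho^T L^{-1}(\rho)\,\dot\rho\,dt \;:\; \rho(0)=\rho^0,\ \rho(1)=\rho^1\Big\}.$$

    Here $L(\rho)\in\mathbb{R}^{n\times n}$ is the weighted Laplacian matrix

    $$L(\rho) = -\operatorname{div}_G(\rho\nabla_G) = D^T\Theta(\rho)D,$$

    where $D\in\mathbb{R}^{|E|\times|V|}$ is a discrete gradient matrix, $-D^T\in\mathbb{R}^{|V|\times|E|}$ is the corresponding discrete divergence matrix, and $\Theta\in\mathbb{R}^{|E|\times|E|}$ is a diagonal weight matrix

    $$\Theta_{(i,j),(k,l)} = \begin{cases}\dfrac{\rho_i+\rho_j}{2} & \text{if } (i,j)=(k,l)\in E,\\[4pt] 0 & \text{otherwise.}\end{cases}$$

    With this sign convention $L(\rho)$ is positive semidefinite, so the quadratic form $\dot\rho^T L^{-1}(\rho)\dot\rho$ is nonnegative, in agreement with the dissipation identity $\frac{d}{dt}F = -\dot\rho^T L^{-1}(\rho)\dot\rho$ used in the proof.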
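
    As a concrete illustration of the weighted Laplacian, the sketch below assembles $D$, $\Theta(\rho)$ and the product $D^T\Theta(\rho)D$ for a small path graph. It assumes the sign convention in which $L(\rho)S = -\operatorname{div}_G(\rho\nabla_G S)$ with $(\nabla_G S)_{ij} = S_i - S_j$, so that $L(\rho)$ is positive semidefinite; one row of $D$ per undirected edge, 0-based vertex indices, and the function names are assumptions of this example.

```python
def weighted_laplacian(n, edges, rho):
    # D has one row per edge (i, j): +1 in column i, -1 in column j,
    # so (D S)_e = S_i - S_j.
    D = [[0.0] * n for _ in edges]
    for e, (i, j) in enumerate(edges):
        D[e][i] = 1.0
        D[e][j] = -1.0
    # Diagonal of Theta: the density weight g_ij(rho) of each edge.
    theta = [0.5 * (rho[i] + rho[j]) for i, j in edges]
    # L = D^T Theta D, written out entry by entry.
    m = len(edges)
    return [[sum(D[e][a] * theta[e] * D[e][b] for e in range(m))
             for b in range(n)] for a in range(n)]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Hypothetical example: path graph 0 - 1 - 2.
edges = [(0, 1), (1, 2)]
rho = [0.2, 0.3, 0.5]
L = weighted_laplacian(3, edges, rho)
LS = matvec(L, [1.0, 0.0, 2.0])
```

    Each row of `L` sums to zero, so constant vectors lie in its kernel; $L^{-1}$ in the metric is therefore best read as an inverse restricted to vectors summing to zero.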
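  16. Gradient flow

    The gradient flow means

    $$\frac{d\rho}{dt} = -\operatorname{grad}_{\mathcal{P}_o(G)} F(\rho),$$

    where the gradient is defined by:

    Tangency: $\operatorname{grad}_{\mathcal{P}_o(G)} F(\rho)\in T_\rho\mathcal{P}_o(G)$.

    Duality: $(\operatorname{grad}_{\mathcal{P}_o(G)} F(\rho), \sigma)_\rho = dF(\rho)\cdot\sigma$ for any $\sigma\in T_\rho\mathcal{P}_o(G)$, where $dF(\rho) = \big(\frac{\partial}{\partial\rho_i}F(\rho)\big)_{i=1}^n$.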
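  17. Derivation

    Theorem. The ODE $\frac{d\rho}{dt} = -L(\rho)\nabla_\rho F(\rho)$, i.e.

    $$\frac{d\rho_i}{dt} = \sum_{j\in N(i)} g_{ij}(\rho)\Big(\frac{\partial}{\partial\rho_j} - \frac{\partial}{\partial\rho_i}\Big)F(\rho), \qquad (1)$$

    is the gradient flow of the free energy $F(\rho)$ on $\mathcal{P}_o(G)$ with respect to the discrete optimal transport distance. Here $L(\rho)$ acts as $(L(\rho)S)_i = \sum_{j\in N(i)} g_{ij}(\rho)(S_i - S_j)$, which is what makes the matrix form and the componentwise form (1) agree.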
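
    ODE (1) can be integrated with any standard time stepper. The sketch below uses forward Euler, which is an assumption of this example and not a scheme taken from the talk; the path graph, the potential `V`, the zero interaction matrix `W`, the step size and the function names are likewise hypothetical. It records the free energy along the trajectory so that its decay can be inspected.

```python
import math

def grad_free_energy(rho, V, W, beta):
    # dF/drho_i = sum_j W_ij rho_j + V_i + beta * (log rho_i + 1)
    n = len(rho)
    return [sum(W[i][j] * rho[j] for j in range(n)) + V[i]
            + beta * (math.log(rho[i]) + 1.0) for i in range(n)]

def free_energy(rho, V, W, beta):
    n = len(rho)
    interaction = 0.5 * sum(W[i][j] * rho[i] * rho[j]
                            for i in range(n) for j in range(n))
    potential = sum(V[i] * rho[i] for i in range(n))
    entropy = sum(r * math.log(r) for r in rho)
    return interaction + potential + beta * entropy

def rhs(rho, V, W, beta, edges):
    # Right-hand side of (1); every undirected edge feeds both endpoints.
    G = grad_free_energy(rho, V, W, beta)
    out = [0.0] * len(rho)
    for i, j in edges:
        flux = 0.5 * (rho[i] + rho[j]) * (G[j] - G[i])
        out[i] += flux
        out[j] -= flux
    return out

def simulate(rho0, V, W, beta, edges, dt=1e-3, steps=30000):
    # Forward Euler; assumes dt is small enough that rho stays positive.
    rho = list(rho0)
    energies = [free_energy(rho, V, W, beta)]
    for _ in range(steps):
        drho = rhs(rho, V, W, beta, edges)
        rho = [r + dt * d for r, d in zip(rho, drho)]
        energies.append(free_energy(rho, V, W, beta))
    return rho, energies

# Hypothetical data: path graph on 4 vertices, W = 0, beta = 1.
edges = [(0, 1), (1, 2), (2, 3)]
V = [0.0, 1.0, 0.5, 2.0]
W = [[0.0] * 4 for _ in range(4)]
rho_final, energies = simulate([0.25] * 4, V, W, beta=1.0, edges=edges)
```

    In this example the total mass stays at 1, the recorded free energy decreases step by step, and `rho_final` approaches the weights proportional to $e^{-V_i/\beta}$.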
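  18. Asymptotic behavior

    Theorem. For any initial condition $\rho^0\in\mathcal{P}_o(G)$, (1) has a unique solution $\rho(t)\colon[0,\infty)\to\mathcal{P}_o(G)$.

    (i) The free energy $F(\rho)$ is a Lyapunov function of (1).

    (ii) If $\lim_{t\to\infty}\rho(t)$ exists, call it $\rho^\infty$, then $\rho^\infty$ is one of the possible Gibbs measures, i.e.

    $$\rho^\infty_i = \frac1K e^{-\frac{F_i(\rho^\infty)}{\beta}}, \qquad K = \sum_{i=1}^n e^{-\frac{F_i(\rho^\infty)}{\beta}}, \qquad \text{for all } i\in V,$$

    where $F_i(\rho) = \sum_{j=1}^n W_{ij}\rho_j + V_i$.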
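
    The Gibbs condition is a fixed-point equation for $\rho^\infty$, since $F_i$ itself depends on $\rho^\infty$ through $W$. A minimal sketch, with hypothetical function names and example data: it evaluates the map $\rho\mapsto\frac1K e^{-F_i(\rho)/\beta}$ and measures how far a given $\rho$ is from being a fixed point. When $W = 0$ the map is constant and returns the Gibbs measure in closed form.

```python
import math

def potential_terms(rho, V, W):
    # F_i(rho) = sum_j W_ij rho_j + V_i
    n = len(rho)
    return [sum(W[i][j] * rho[j] for j in range(n)) + V[i] for i in range(n)]

def gibbs_map(rho, V, W, beta):
    # rho -> (1/K) exp(-F_i(rho) / beta); a Gibbs measure is a fixed point.
    Fi = potential_terms(rho, V, W)
    shift = min(Fi)  # a common shift cancels after normalization
    weights = [math.exp(-(f - shift) / beta) for f in Fi]
    K = sum(weights)
    return [w / K for w in weights]

def gibbs_residual(rho, V, W, beta):
    # Largest componentwise gap between rho and its image under the map.
    image = gibbs_map(rho, V, W, beta)
    return max(abs(a - b) for a, b in zip(rho, image))

# Hypothetical data with no interaction (W = 0), beta = 1.
V = [0.0, 1.0, 0.5, 2.0]
W = [[0.0] * 4 for _ in range(4)]
rho_inf = gibbs_map([0.25] * 4, V, W, beta=1.0)
# At a Gibbs measure F_i + beta * log(rho_i) takes the same value at every vertex.
levels = [V[i] + math.log(rho_inf[i]) for i in range(4)]
```

    The constancy of `levels` is exactly what makes the right-hand side of (1) vanish, since (1) only involves differences of $F_i + \beta\log\rho_i$ across edges.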
  19. Entropy dissipation

    What is the stability of a Gibbs measure?

    Motivation: relative entropy and Fisher information, e.g. the work of Carrillo, McCann and Villani [3]; gradient flows: dynamical systems viewpoint!

    [3] "Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates", 2003.
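  20. Dissipation relation

    $$\frac{d}{dt}\big(F(\rho) - F(\rho^\infty)\big) = -\big(\nabla_G(F+\beta\log\rho),\ \nabla_G(F+\beta\log\rho)\big)_\rho = -\sum_{(i,j)\in E}\Big[\log\frac{\rho_i}{e^{-F_i(\rho)/\beta}} - \log\frac{\rho_j}{e^{-F_j(\rho)/\beta}}\Big]^2 g_{ij}(\rho) \le 0,$$

    where inside $\nabla_G$ the symbol $F$ stands for the vector $(F_i(\rho))_{i=1}^n$.

    Remark: If $F(\rho) = \sum_{i=1}^n V_i\rho_i + \sum_{i\in V}\rho_i\log\rho_i$, then the above gives the discrete analog of entropy dissipation

    $$\frac{d}{dt}\Big(\sum_{i\in V}\rho_i(t)\log\frac{\rho_i(t)}{\rho^\infty_i}\Big) = -\sum_{(i,j)\in E} g_{ij}(\rho)\Big(\log\frac{\rho_i(t)}{\rho^\infty_i} - \log\frac{\rho_j(t)}{\rho^\infty_j}\Big)^2.$$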
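
    The dissipation relation can be checked numerically at a single state $\rho$. The sketch below is written under stated assumptions: $W = 0$, the edge sum runs over each undirected edge once, and the factor $\beta^2$ is kept explicit so that the check holds for any $\beta$ (for $\beta = 1$ it matches the remark on this slide). The function name and the example data are hypothetical.

```python
import math

def dissipation_check(rho, V, beta, edges):
    n = len(rho)
    # Gradient of the free energy for W = 0: V_i + beta * (log rho_i + 1).
    G = [V[i] + beta * (math.log(rho[i]) + 1.0) for i in range(n)]
    # Right-hand side of ODE (1).
    rho_dot = [0.0] * n
    for i, j in edges:
        flux = 0.5 * (rho[i] + rho[j]) * (G[j] - G[i])
        rho_dot[i] += flux
        rho_dot[j] -= flux
    # Chain rule: dF/dt = grad F . rho_dot
    dF_dt = sum(G[i] * rho_dot[i] for i in range(n))

    def log_ratio(i):
        # log( rho_i / exp(-V_i / beta) )
        return math.log(rho[i] / math.exp(-V[i] / beta))

    fisher = beta ** 2 * sum(
        0.5 * (rho[i] + rho[j]) * (log_ratio(i) - log_ratio(j)) ** 2
        for i, j in edges)
    return dF_dt, -fisher

# Hypothetical data: cycle on 4 vertices, beta = 1.
lhs, rhs_value = dissipation_check(
    [0.1, 0.2, 0.3, 0.4], [0.0, 1.0, 0.5, 2.0], 1.0,
    [(0, 1), (1, 2), (2, 3), (3, 0)])
```

    Both returned numbers agree up to rounding and are negative away from the Gibbs measure, which is the Lyapunov property of the free energy in computable form.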
  21. Discrete entropy dissipation

    Theorem. If the Gibbs measure $\rho^\infty$ is a strict minimizer of $F(\rho)$, then there exists a constant $C>0$ such that

    $$F(\rho(t)) - F(\rho^\infty) \le e^{-Ct}\big(F(\rho^0) - F(\rho^\infty)\big).$$

    Moreover, the asymptotic dissipation rate is $2\lambda_F(\rho^\infty)$. In other words, for any sufficiently small $\epsilon>0$, there exists a time $T>0$ such that when $t>T$,

    $$F(\rho(t)) - F(\rho^\infty) \le e^{-2(\lambda_F(\rho^\infty)-\epsilon)(t-T)}\big(F(\rho(T)) - F(\rho^\infty)\big).$$
  22. Idea of proof

    The speed of convergence comes from comparing the first and second derivatives of $F(\rho(t))$ along the ODE. Suppose one can find a constant $C>0$ such that

    $$\frac{d^2}{dt^2}F(\rho(t)) \ge -C\frac{d}{dt}F(\rho(t))$$

    holds for all $t\ge 0$. Then integrating this inequality over $[t,+\infty)$ gives

    $$\frac{d}{dt}\big[F(\rho^\infty) - F(\rho(t))\big] \ge -C\big[F(\rho^\infty) - F(\rho(t))\big].$$

    Gronwall's inequality then proves the result.
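
    The integration step can be written out in a few lines. The derivation below is a sketch under the assumption that both $F(\rho(s))\to F(\rho^\infty)$ and $\frac{d}{ds}F(\rho(s))\to 0$ as $s\to\infty$; the shorthand $E(t)$ is introduced here only for this purpose.

```latex
\begin{aligned}
E(t) &:= F(\rho(t)) - F(\rho^\infty) \ge 0,
  \qquad \ddot E(s) \ge -C\,\dot E(s) \quad (s \ge 0),\\
\int_t^\infty \ddot E(s)\,ds &\ge -C\int_t^\infty \dot E(s)\,ds
  \;\Longrightarrow\; -\dot E(t) \ge C\,E(t),\\
\dot E(t) &\le -C\,E(t)
  \;\Longrightarrow\; E(t) \le e^{-Ct}\,E(0).
\end{aligned}
```

    The middle line uses $\int_t^\infty \ddot E\,ds = -\dot E(t)$ and $\int_t^\infty \dot E\,ds = -E(t)$; the last implication is Gronwall's inequality.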
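  23. Proof

    In our case, the first derivative of the energy along the gradient flow is

    $$\frac{d}{dt}F(\rho(t)) = \nabla_\rho F(\rho)^T\dot\rho = -\nabla_\rho F(\rho)^T L(\rho)\nabla_\rho F(\rho) = -\dot\rho^T L^{-1}(\rho)\dot\rho,$$

    while the second derivative is

    $$\frac{d^2}{dt^2}F(\rho(t)) = 2\dot\rho^T\operatorname{Hess}F(\rho)\,\dot\rho - \dot\rho^T L^{-1}(\rho)L(\dot\rho)L^{-1}(\rho)\dot\rho.$$

    Compare $\frac{d}{dt}F(\rho(t))$ with $\frac{d^2}{dt^2}F(\rho(t))$ to find

    $$C := \inf_{\rho\in B(\rho^0)}\Big\{\underbrace{2\,\frac{\dot\rho^T\operatorname{Hess}F(\rho)\,\dot\rho}{\dot\rho^T L^{-1}(\rho)\dot\rho}}_{\text{quadratic}} - \underbrace{\frac{\dot\rho^T L^{-1}(\rho)L(\dot\rho)L^{-1}(\rho)\dot\rho}{\dot\rho^T L^{-1}(\rho)\dot\rho}}_{\text{cubic}}\Big\}.$$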
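  24. Hessian operator at Gibbs measure

    Denote $\dot\rho = L(\rho)\Phi$. Consider

    $$\lambda_F(\rho) = \min_{\Phi\in\mathbb{R}^n}\sum_{(i,j)\in E}\sum_{(k,l)\in E} h_{ij,kl}\,(\Phi_i-\Phi_j)\,g_{ij}(\rho)\,(\Phi_k-\Phi_l)\,g_{kl}(\rho) \quad\text{s.t.}\quad \sum_{(i,j)\in E}(\Phi_i-\Phi_j)^2 g_{ij}(\rho) = 1.$$

    Here

    $$h_{ij,kl} = \Big(\frac{\partial^2}{\partial\rho_i\partial\rho_k} + \frac{\partial^2}{\partial\rho_j\partial\rho_l} - \frac{\partial^2}{\partial\rho_i\partial\rho_l} - \frac{\partial^2}{\partial\rho_j\partial\rho_k}\Big)F(\rho).$$

    This rate connects with Yano's formula [4], which is related to Ricci curvature in geometry.

    [4] Kentaro Yano, "On Harmonic and Killing Vector Fields", 38-45, Annals of Mathematics, 1958.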
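  25. Hessian at Gibbs measure

    $$\big(\operatorname{Hess}_{\mathcal{P}_2(M)}F\cdot\nabla\Phi,\ \nabla\Phi\big)_{\rho^*} = \int_M\int_M \frac{\delta^2}{\delta\rho(x)\delta\rho(y)}F(\rho)\Big|_{\rho^*}\,\nabla\cdot(\rho^*(x)\nabla\Phi(x))\,\nabla\cdot(\rho^*(y)\nabla\Phi(y))\,dx\,dy$$

    $$= \int_M\int_M \Big(D_{xy}\frac{\delta^2}{\delta\rho(x)\delta\rho(y)}F(\rho)\Big|_{\rho^*}\nabla\Phi(x),\ \nabla\Phi(y)\Big)\rho^*(x)\rho^*(y)\,dx\,dy,$$

    where $\rho^*$ is a Gibbs measure, $M$ is a Riemannian manifold (including $\mathbb{R}^d$), and $\frac{\delta^2}{\delta\rho(x)\delta\rho(y)}F(\rho)$ is the second variation of the functional $F(\rho)$.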
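  26. Linear entropy + Yano formula

    Consider $F(\rho) = \int_M \rho(x)\log\rho(x)\,dx$, whose Gibbs measure is the uniform measure. Then

    $$\big(\operatorname{Hess}_{\mathcal{P}(M)}F\cdot\nabla\Phi,\ \nabla\Phi\big)_{\rho^*} = \int_M\big[\operatorname{Ric}(\nabla\Phi,\nabla\Phi) + \operatorname{tr}(D^2\Phi^T D^2\Phi)\big]\rho^*(x)\,dx = \int_M\big[\nabla\cdot(\rho^*\nabla\Phi)\big]^2\frac{1}{\rho^*(x)}\,dx.$$

    The first equality is derived through Bochner's formula [5], while the second equality is new. Together they recover Yano's famous formula.

    [5] Cédric Villani, Optimal transport: Old and new, 2008.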
  27. Main references

    Cédric Villani, Optimal transport: Old and new, 2008.

    Shui-Nee Chow, Luca Dieci, Wuchen Li and Haomin Zhou, Entropy dissipation semi-schemes for Fokker-Planck equations, 2016.