
Mean field information dynamics via reaction diffusions

Wuchen Li
January 13, 2022

We formulate and compute a class of mean-field information dynamics based on reaction diffusion equations. Given a class of nonlinear reaction diffusions and entropy-type Lyapunov functionals, we study their gradient flow formulations. We write the "mean-field" metric space formalisms and derive Hamiltonian flows therein. These Hamiltonian flows follow saddle point systems of the proposed mean-field control problems. We apply primal-dual hybrid-gradient algorithms to compute the mean field information dynamics. Several numerical examples are provided.

Transcript

1. Mean field information dynamics via reaction diffusions. Wuchen Li, University of South Carolina. Hamilton-Jacobi equations reunion 2022, IPAM. A joint work with Wonjun Lee (UCLA) and Stanley Osher (UCLA).

2. Reaction diffusion. Consider a scalar nonlinear reaction diffusion equation

   ∂t u(t, x) = ∆F(u(t, x)) + R(u(t, x)),

where x ∈ Ω, Ω ⊂ ℝ^d is a compact convex spatial domain, u ∈ M(Ω) = {u ∈ C∞(Ω): u ≥ 0}, and ∆ is the Euclidean Laplacian operator.

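A minimal 1D finite-difference sketch of this equation, assuming a periodic domain, explicit forward-Euler time stepping, and the illustrative Fisher-KPP choices F(u) = u, R(u) = u(1 − u):

import numpy as np

# Forward-Euler sketch of du/dt = Laplacian(F(u)) + R(u) on a periodic 1D grid.
# F(u) = u and R(u) = u(1 - u) (Fisher-KPP) are illustrative assumptions.
F = lambda u: u
R = lambda u: u * (1.0 - u)

N, L = 256, 1.0
dx = L / N
x = np.linspace(0.0, L, N, endpoint=False)
u = 0.5 + 0.4 * np.cos(2 * np.pi * x)          # smooth positive initial density

dt = 0.2 * dx**2                               # stable step for explicit diffusion
for _ in range(2000):
    Fu = F(u)
    lap = (np.roll(Fu, -1) - 2.0 * Fu + np.roll(Fu, 1)) / dx**2   # periodic Laplacian of F(u)
    u = u + dt * (lap + R(u))

print("min/max of u after integration:", u.min(), u.max())
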
3. Reaction diffusions and generalized optimal transport. Based on studies begun over the Thanksgiving holiday of 2019, we have been working on reaction diffusions and mean field control problems. In this talk, we first review reaction diffusions in optimal transport type metric spaces. Using this connection, we then design mean field control problems for reaction-diffusions. This also provides variational schemes for solving classical reaction-diffusions in both the time and spatial domains.

4. Diffusions and entropy dissipation. Consider the heat equation in ℝ^d,

   ∂u(t, x)/∂t = ∇ · (∇u(t, x)) = ∆u(t, x),

and the negative Boltzmann-Shannon entropy

   H(u) = −∫_{ℝ^d} u(x) log u(x) dx.

Along the heat equation, the following dissipation relation holds:

   d/dt H(u(t, ·)) = ∫_{ℝ^d} ‖∇x log u(t, x)‖² u(t, x) dx = I(u(t, ·)),

where I(u) is named the Fisher information functional. There is a formulation behind this relation, namely:
▶ The heat flow is a gradient descent flow of the entropy in the optimal transport metric.

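This dissipation identity is easy to check numerically: evolve the heat equation by one explicit step on a periodic grid and compare the finite-difference rate of change of H(u) = −∫ u log u dx with the Fisher information I(u) = ∫ ‖∇ log u‖² u dx (a sketch with assumed grid sizes; the two printed numbers should nearly agree).

import numpy as np

N, L = 512, 1.0
dx = L / N
x = np.linspace(0.0, L, N, endpoint=False)
u = 1.0 + 0.5 * np.cos(2 * np.pi * x)          # positive density on a periodic domain

def entropy(u):                                # H(u) = -int u log u dx
    return -np.sum(u * np.log(u)) * dx

def fisher_info(u):                            # I(u) = int |grad log u|^2 u dx
    grad_logu = (np.roll(np.log(u), -1) - np.roll(np.log(u), 1)) / (2 * dx)
    return np.sum(grad_logu**2 * u) * dx

dt = 0.1 * dx**2
lap = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
u_next = u + dt * lap                          # one explicit heat-equation step

dH_dt = (entropy(u_next) - entropy(u)) / dt
print("dH/dt =", dH_dt, " I(u) =", fisher_info(u))
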
5. Entropy dissipation = Lyapunov methods. The gradient flow of the negative entropy −H(u) = ∫_{ℝ^d} u(x) log u(x) dx with respect to the optimal transport metric satisfies

   ∂u/∂t = ∇ · (u∇ log u) = ∆u.

Here the major trick is that u∇ log u = ∇u. In this way, one can study the entropy dissipation:

   d/dt H(u) = −∫_{ℝ^d} log u ∇ · (u∇ log u) dx = ∫_{ℝ^d} ‖∇ log u‖² u dx.

6. Behind Lyapunov methods: optimal transport. What is the optimal way to move or transport a mountain with shape X and density u0(x) to another shape Y with density u1(y)?

7. Literature. The optimal transport problem was first introduced by Monge in 1781 and relaxed by Kantorovich (Nobel prize) in the 1940s. It defines a distance on the space of probability distributions, named the optimal transport, Wasserstein, or Earth Mover's distance.
▶ Mapping/Monge-Ampère equation: Gangbo, Brenier, et al.;
▶ Lyapunov methods and gradient flows: Otto, Villani, Ambrosio, Gigli, Savaré, Carrillo, Mielke, et al.;
▶ Hamiltonian flows (compressible Euler equations, potential mean field games, Schrödinger bridge problems, Schrödinger equations): Benamou, Brenier, Lions, Georgiou, Nelson, Lafferty, et al.;
▶ Numerical OT and MFG: Benamou, Nurbekyan, Oberman, Osher, Achdou, Lai, et al.

8. Lyapunov methods induced calculus. Informally speaking, the Wasserstein metric refers to the following bilinear form:

   ⟨u̇1, G(u)u̇2⟩ = ∫ u̇1 (−∆u)⁻¹ u̇2 dx,   with ∆u := ∇ · (u∇).

In other words, denoting u̇i = −∇ · (u∇ϕi), i = 1, 2, one has

   ⟨ϕ1, G(u)⁻¹ϕ2⟩ = ⟨ϕ1, −∇ · (u∇)ϕ2⟩,

where u ∈ P(Ω), u̇i are tangent vectors in P(Ω) with ∫ u̇i dx = 0, and ϕi ∈ C∞(Ω) are cotangent vectors in P(Ω) at the point u. Here ∇· and ∇ are the standard divergence and gradient operators on Ω.

9. Gradient flows. The Wasserstein gradient flow of an energy functional F(u) leads to

   ∂t u = −G(u)⁻¹ δF(u)/δu = ∇ · (u∇ δF(u)/δu).

▶ If F(u) = ∫ F(x)u(x) dx, then ∂t u = ∇ · (u∇F(x)).
▶ If F(u) = ∫ u(x) log u(x) dx, then ∂t u = ∇ · (u∇ log u) = ∆u.

10. Hamiltonian flows. Consider the Lagrangian functional

   L(u, ∂t u) = (1/2) ∫ ∂t u (−∇ · (u∇))⁻¹ ∂t u dx − F(u).

By the Legendre transform,

   H(u, ϕ) = sup_{∂t u} ∫ ∂t u ϕ dx − L(u, ∂t u).

And the Hamiltonian system follows

   ∂t u = δH(u, ϕ)/δϕ,   ∂t ϕ = −δH(u, ϕ)/δu,

where δ/δu, δ/δϕ are the L² first variation operators with respect to u and ϕ, respectively, and the Hamiltonian takes the density form

   H(u, ϕ) = (1/2) ∫ ‖∇ϕ‖² u dx + F(u).

Here u is the "density" state variable and ϕ is the "density" moment variable.

11. Examples: potential mean field games. In detail, the Hamiltonian flows satisfy

   ∂t u + ∇ · (u∇ϕ) = 0,
   ∂t ϕ + (1/2)‖∇ϕ‖² = −δF(u)/δu.

This is a well-known dynamic, studied in potential mean field games, optimal transport, and compressible Euler equations. Nowadays, we apply these Lagrangian or Hamiltonian formalisms to design variational schemes in neural networks, finite difference/volume methods, and kernelized spaces.

12. Lyapunov methods. Consider a Lyapunov functional G(u) = ∫ G(u(x)) dx, where G: ℝ → ℝ is a given convex function with G″(u) > 0. In fact, how to choose the entropy G can be determined by an entropy-entropy flux-metric condition; we will work on it in the next talk.

13. Metrics. Along the reaction diffusion, observe that

   d/dt G(u(t, ·)) = ∫ G′(u) ∂t u dx = ∫ G′(u)(∆F(u) + R(u)) dx
   = −∫ ⟨∇G′(u), ∇F(u)⟩ dx + ∫ G′(u) R(u) dx
   = −∫ ⟨∇G′(u), ∇u⟩ F′(u) dx + ∫ G′(u) R(u) dx
   = −∫ ⟨∇G′(u), ∇u G″(u)⟩ (F′(u)/G″(u)) dx + ∫ G′(u)² (R(u)/G′(u)) dx
   = −∫ ⟨∇G′(u), ∇G′(u)⟩ (F′(u)/G″(u)) dx + ∫ G′(u)² (R(u)/G′(u)) dx,

where we apply integration by parts in the third equality and ∇G′(u) = G″(u)∇u in the last equality.

14. Lyapunov methods induced metrics I. Denote the smooth positive probability space by

   M = {u ∈ C∞(Ω): u > 0}.

Given F, G: ℝ → ℝ satisfying F′(u) > 0 if u > 0 and G″(u) > 0, denote

   V1(u) = F′(u)/G″(u),   V2(u) = −R(u)/G′(u).

Denote the tangent space of M at u ∈ M by Tu M = {σ ∈ C∞(Ω)}.

15. Lyapunov methods induced metrics II. Definition (Mean-field information metric). The inner product g(u): Tu M × Tu M → ℝ is given below. For any σ1, σ2 ∈ Tu M, define

   g(u)(σ1, σ2) = ∫_Ω σ1 (−∇ · (V1(u)∇) + V2(u))⁻¹ σ2 dx
                = ∫_Ω (∇Φ1, ∇Φ2) V1(u) dx + ∫_Ω Φ1 Φ2 V2(u) dx,

where σi = −∇ · (V1(u)∇Φi) + V2(u)Φi, i = 1, 2.

16. Reaction-diffusions in metric spaces. Given an energy functional F: M → ℝ, the gradient flow of F in (M(Ω), g) satisfies

   ∂t u(t, x) = ∇ · (V1(u)∇ δF(u)/δu)(t, x) − V2(u) δF(u)/δu (t, x).

In particular, if F(u) = G(u) = ∫ G(u) dx, then

   ∂t u = ∇ · (V1(u)∇G′(u)) − V2(u)G′(u)
        = ∇ · ((F′(u)/G″(u)) G″(u)∇u) + (R(u)/G′(u)) G′(u)
        = ∇ · (F′(u)∇u) + R(u)
        = ∆F(u) + R(u).

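This cancellation can be verified symbolically; the sketch below checks in one spatial dimension that ∂x(V1(u) ∂x G′(u)) − V2(u)G′(u) equals ∂xx F(u) + R(u), under the assumed Fisher-KPP choices F(u) = u, G(u) = u log u − u, R(u) = u(1 − u).

import sympy as sp

x = sp.symbols('x')
u = sp.Function('u', positive=True)(x)

# Assumed example data: linear diffusion, Boltzmann entropy, Fisher-KPP reaction.
F = u
G = u * sp.log(u) - u
R = u * (1 - u)

Gp  = sp.diff(G, u)                       # G'(u) = log u
Gpp = sp.diff(G, u, 2)                    # G''(u) = 1/u
V1  = sp.diff(F, u) / Gpp                 # V1 = F'/G''
V2  = -R / Gp                             # V2 = -R/G'

lhs = sp.diff(V1 * sp.diff(Gp, x), x) - V2 * Gp   # metric gradient flow of the entropy G
rhs = sp.diff(F, x, 2) + R                        # reaction diffusion right-hand side
print(sp.simplify(lhs - rhs))                     # expect 0
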
17. Controlling reaction diffusions. Denote a given energy functional by F: M(Ω) → ℝ. Consider the variational problem

   inf_{v1, v2, u} ∫_0^1 [ ∫_Ω (1/2)‖v1‖² V1(u) + (1/2)|v2|² V2(u) dx − F(u) ] dt,

where the infimum is taken among all density functions u: [0, 1] × Ω → ℝ, vector fields v1: [0, 1] × Ω → ℝ^d, and reaction rate functions v2: [0, 1] × Ω → ℝ, such that

   ∂t u(t, x) + ∇ · (V1(u(t, x)) v1(t, x)) = v2(t, x) V2(u(t, x)),

with fixed initial and terminal density functions u0, u1 ∈ M(Ω).

18. Mean field information dynamics. The critical point of the variational problem satisfies v1(t, x) = ∇Φ(t, x), v2(t, x) = Φ(t, x), with

   ∂t u(t, x) + ∇ · (V1(u(t, x))∇Φ(t, x)) = V2(u(t, x))Φ(t, x),
   ∂t Φ(t, x) + (1/2)‖∇Φ(t, x)‖² V1′(u(t, x)) + (1/2)|Φ(t, x)|² V2′(u(t, x)) + δF(u)/δu (t, x) = 0.

19. Mean field information Hamiltonian. There exists a Hamiltonian formulation of the control problem:

   ∂t u = δH(u, Φ)/δΦ,   ∂t Φ = −δH(u, Φ)/δu,

where H is the Hamiltonian functional defined by

   H(u, Φ) = (1/2) ∫_Ω ‖∇Φ‖² V1(u) dx + (1/2) ∫_Ω |Φ|² V2(u) dx + F(u).

20. Functional Hamilton-Jacobi equations. The Hamilton-Jacobi equation in positive density space satisfies

   ∂t U(t, u) + (1/2) ∫_Ω ‖∇ δU(t, u)/δu(x)‖² V1(u) dx + (1/2) ∫_Ω |δU(t, u)/δu(x)|² V2(u) dx + F(u) = 0,

where U: [0, 1] × L²(Ω) → ℝ is a value functional.

21. Example I. Example (Wasserstein metric, heat flow and density Hamiltonian flows). Let G(u) = u log u − u, F(u) = u, R(u) = 0; thus

   V1(u) = F′(u)/G″(u) = u,   V2(u) = −R(u)/G′(u) = 0.

Then the mean-field information metric forms

   g(u)(σ1, σ2) = ∫_Ω (∇Φ1(x), ∇Φ2(x)) u(x) dx.

Here the Hamiltonian flow in (M(Ω), g) satisfies

   ∂t u + ∇ · (u∇Φ) = 0,
   ∂t Φ + (1/2)‖∇Φ‖² + δF(u)/δu = 0.

22. Example II. Example (Fisher-Rao metric, birth-death equation and density Hamiltonian flows). Consider

   V1(u) = F′(u)/G″(u) = 0,   V2(u) = −R(u)/G′(u) = u.

Then the mean-field information metric satisfies

   g(u)(σ1, σ2) = ∫_Ω Φ1(x)Φ2(x) u(x) dx.

Here the Hamiltonian flow in (M(Ω), g) satisfies

   ∂t u − uΦ = 0,
   ∂t Φ + (1/2)|Φ|² + δF(u)/δu = 0.

23. Example III. Example (Fisher-KPP metric, Fisher-KPP equation and density Hamiltonian flows). Consider F(u) = u, G(u) = u log u − u, R(u) = u(1 − u). Thus

   V1(u) = F′(u)/G″(u) = u,   V2(u) = −R(u)/G′(u) = u(u − 1)/log u.

Here the Hamiltonian flow in (M(Ω), g) satisfies

   ∂t u + ∇ · (u∇Φ) − (u(u − 1)/log u) Φ = 0,
   ∂t Φ + (1/2)‖∇Φ‖² + (1/2)|Φ|² ((2u − 1) log u + 1 − u)/(log u)² + δF(u)/δu = 0.

24. The Primal-Dual Algorithms. Consider the following convex optimization problem:

   min_z f(Az) + g(z),

where z is the variable to be minimized, f and g are convex functions, and A is a continuous linear operator. The algorithm solves the problem by iterating

   z^(k+1) = argmin_z L(z, p̄) + (1/(2τ))‖z − z^(k)‖²_{L²},
   p^(k+1) = argmax_p L(z^(k+1), p) − (1/(2σ))‖p − p^(k)‖²_{L²},
   p̄ = 2p^(k+1) − p^(k),

where p is a dual variable, L is the Lagrangian functional L(z, p) = g(z) + ⟨Az, p⟩ − f*(p), and f*(p) = sup_z ⟨z, p⟩ − f(z) is the Legendre transform of f.

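To make the iteration concrete, here is a small self-contained PDHG run on a toy instance of min_z f(Az) + g(z): 1D total-variation denoising with g(z) = (1/2)‖z − z0‖², f(w) = λ‖w‖₁ and A the forward-difference operator. This is only an illustration of the scheme above, not the paper's setting.

import numpy as np

rng = np.random.default_rng(0)
n, lam = 200, 2.0
truth = np.where(np.arange(n) < n // 2, 1.0, 3.0)        # piecewise-constant signal
z0 = truth + 0.3 * rng.standard_normal(n)                # noisy observation

A  = lambda z: np.diff(z)                                # forward difference, length n-1
At = lambda p: np.concatenate(([-p[0]], p[:-1] - p[1:], [p[-1]]))   # exact adjoint of A

tau = sigma = 0.4                                        # tau*sigma*||A^T A|| < 1 since ||A||^2 <= 4
z = z0.copy()
p = np.zeros(n - 1)
p_bar = p.copy()
for _ in range(500):
    # z-step: closed-form proximal step for g(z) = 0.5*||z - z0||^2
    z = (tau * z0 + z - tau * At(p_bar)) / (1.0 + tau)
    # p-step: proximal of f*, i.e. projection onto the l-infinity ball of radius lam
    p_new = np.clip(p + sigma * A(z), -lam, lam)
    p_bar = 2.0 * p_new - p
    p = p_new

print("RMS error to the clean signal:", np.linalg.norm(z - truth) / np.sqrt(n))
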
25. Primal-dual methods. The scheme converges if the step sizes τ and σ satisfy τσ‖AᵀA‖_{L²} < 1, where ‖·‖_{L²} is the operator norm on L². These step sizes depend on the operator A. In this paper, we use G-prox PDHG, a variation of the primal-dual algorithm. G-prox PDHG provides an appropriate choice of norms that allows the algorithm to converge faster than the vanilla PDHG algorithm.

26. G-prox Primal-dual methods. G-prox PDHG finds the saddle point by iterating

   z^(k+1) = argmin_z L(z, p̄) + (1/(2τ))‖z − z^(k)‖²_{L²},
   p^(k+1) = argmax_p L(z^(k+1), p) − (1/(2σ))‖p − p^(k)‖²_H,
   p̄ = 2p^(k+1) − p^(k),

where the norm ‖·‖_H is defined by ‖z‖²_H = ‖Aᵀz‖²_{L²}. By choosing this proper norm, the step sizes only need to satisfy στ < 1, which is independent of the operator A.

27. Implementation of the algorithm. We set the linear operator A as

   A(m1, m2, u) = ∂t u(t, x) + ∇ · m1(t, x) − m2(t, x).

We set the convex functionals f and g as

   g(m1, m2, u) = ∫_0^1 [ ∫_Ω ‖m1(t, x)‖²/(2V1(u(t, x))) + |m2(t, x)|²/(2V2(u(t, x))) dx − F(u) ] dt,
   f(A(m1, m2, u)) = 0 if A(m1, m2, u) = 0, and ∞ otherwise, so that f*(Φ) = 0.

Thus, we have the Lagrangian functional

   L(m1, m2, u, Φ) = ∫_0^1 [ ∫_Ω ‖m1(t, x)‖²/(2V1(u(t, x))) + |m2(t, x)|²/(2V2(u(t, x)))
                     + Φ(t, x)(∂t u(t, x) + ∇ · m1(t, x) − m2(t, x)) dx − F(u) ] dt.

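A minimal sketch of how this constraint operator might be discretized (assumptions, not from the slides: uniform space-time grid, forward differences in time, centered differences with periodic boundaries in space; the paper's own C++ discretization may differ).

import numpy as np

def apply_A(m1, m2, u, dt, dx):
    # Discrete A(m1, m2, u) = du/dt + div(m1) - m2 on an (Nt, Nx, Nx) grid.
    # m1 has shape (Nt, Nx, Nx, 2), holding the two spatial components.
    du_dt = (np.roll(u, -1, axis=0) - u) / dt
    du_dt[-1] = 0.0                              # no forward neighbor at the final time slice
    div_m1 = ((np.roll(m1[..., 0], -1, axis=1) - np.roll(m1[..., 0], 1, axis=1))
              + (np.roll(m1[..., 1], -1, axis=2) - np.roll(m1[..., 1], 1, axis=2))) / (2 * dx)
    return du_dt + div_m1 - m2

# Tiny usage example: a constant density with zero fluxes satisfies A = 0.
Nt, Nx = 4, 16
dt, dx = 1.0 / Nt, 1.0 / Nx
u  = np.ones((Nt, Nx, Nx))
m1 = np.zeros((Nt, Nx, Nx, 2))
m2 = np.zeros((Nt, Nx, Nx))
print(np.abs(apply_A(m1, m2, u, dt, dx)).max())  # 0.0
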
28. Implementation of the algorithm. We find the saddle point (m1*, m2*, u*, Φ*) by iterating the following scheme:

   u^(k+1)  = argmin_u L(m1^(k), m2^(k), u, Φ̄) + (1/(2τ))‖u − u^(k)‖²_{L²},
   m1^(k+1) = argmin_m L(m, m2^(k), u^(k), Φ̄) + (1/(2τ))‖m − m1^(k)‖²_{L²},
   m2^(k+1) = argmin_m L(m1^(k), m, u^(k), Φ̄) + (1/(2τ))‖m − m2^(k)‖²_{L²},
   Φ^(k+1)  = argmax_Φ L(m1^(k+1), m2^(k+1), u^(k+1), Φ) − (1/(2σ))‖Φ − Φ^(k)‖²_H,
   Φ̄ = 2Φ^(k+1) − Φ^(k).

Here the L² and H norms are defined as

   ‖u‖²_{L²} = (u, u)_{L²} = ∫_0^1 ∫_Ω ‖u‖² dx dt,   ‖u‖²_H = ‖Aᵀu‖²_{L²},

for any u: [0, 1] × Ω → ℝⁿ.

29. Implementation of the algorithm. We can find explicit formulas for u^(k+1), m1^(k+1), m2^(k+1), Φ^(k+1) using the first-order optimality conditions. The optimality conditions for m1^(k+1), m2^(k+1), Φ^(k+1) result in first-order equations which can be solved easily. Thus, we have

   m1^(k+1) = V1(u^(k))/(τ + V1(u^(k))) (m1^(k) + τ∇Φ^(k)),
   m2^(k+1) = V2(u^(k))/(τ + V2(u^(k))) (m2^(k) + τΦ^(k)),
   Φ^(k+1/2) = Φ^(k) + σ(AAᵀ)⁻¹(∂t u^(k+1) + ∇ · m1^(k+1) − m2^(k+1)).

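The two momentum updates are pointwise and translate directly into array operations. The sketch below assumes elementwise callables V1, V2 and precomputed arrays for Φ^(k) and its spatial gradient; it leaves out the Φ update, whose (AAᵀ)⁻¹ solve is typically handled by an FFT-based Poisson-type solver.

import numpy as np

def update_m(m1, m2, u, grad_phi, phi, V1, V2, tau):
    # m1 <- V1(u)/(tau + V1(u)) * (m1 + tau*grad(Phi)),  m2 <- V2(u)/(tau + V2(u)) * (m2 + tau*Phi)
    v1 = V1(u)
    v2 = V2(u)
    m1_new = (v1 / (tau + v1))[..., None] * (m1 + tau * grad_phi)   # broadcast over the d components
    m2_new = (v2 / (tau + v2)) * (m2 + tau * phi)
    return m1_new, m2_new

# Illustrative call with the (assumed) choices V1(u) = u, V2(u) = u.
u   = np.full((4, 16, 16), 2.0)
m1  = np.zeros((4, 16, 16, 2)); m2 = np.zeros((4, 16, 16))
phi = np.ones((4, 16, 16));     grad_phi = np.zeros((4, 16, 16, 2))
m1n, m2n = update_m(m1, m2, u, grad_phi, phi, lambda u: u, lambda u: u, tau=0.1)
print(m2n[0, 0, 0])   # 2/(0.1 + 2) * (0 + 0.1*1) ≈ 0.0952
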
30. Implementation of the algorithm. The optimality condition for u^(k+1) involves a nonlinear equation, which we solve using Newton's method. For j = 1, 2, ···,

   a = −‖m1‖²/(2V1(u_j)²) V1′(u_j) − |m2|²/(2V2(u_j)²) V2′(u_j) − ∂tΦ + c log u_j + (1/τ)(u_j − u^(k)),
   b = ‖m1‖²/V1(u_j)³ (V1′(u_j))² − ‖m1‖²/(2V1(u_j)²) V1″(u_j) + |m2|²/V2(u_j)³ (V2′(u_j))² − |m2|²/(2V2(u_j)²) V2″(u_j) + c/u_j + 1/τ,
   u_(j+1) = max(0, u_j − a/b).

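A sketch of this inner Newton loop, written elementwise with numpy. V1, V2 and their first two derivatives are passed as callables; the constant c is assumed to be the coefficient contributed by the chosen energy F (the c log u term above), and a small positive floor replaces the exact max(0, ·) so that log u stays finite.

import numpy as np

def newton_update_u(u_k, m1_sq, m2_sq, dt_phi, V, c, tau, iters=10):
    # V = (V1, dV1, ddV1, V2, dV2, ddV2): elementwise callables.
    # m1_sq = ||m1||^2, m2_sq = |m2|^2, dt_phi = d/dt Phi, all arrays on the space-time grid.
    V1, dV1, ddV1, V2, dV2, ddV2 = V
    u = np.maximum(u_k, 1e-8)
    for _ in range(iters):
        a = (-m1_sq / (2 * V1(u)**2) * dV1(u)
             - m2_sq / (2 * V2(u)**2) * dV2(u)
             - dt_phi + c * np.log(u) + (u - u_k) / tau)
        b = (m1_sq / V1(u)**3 * dV1(u)**2 - m1_sq / (2 * V1(u)**2) * ddV1(u)
             + m2_sq / V2(u)**3 * dV2(u)**2 - m2_sq / (2 * V2(u)**2) * ddV2(u)
             + c / u + 1.0 / tau)
        u = np.maximum(1e-8, u - a / b)          # projected Newton step
    return u

# Illustrative call with V1(u) = u, V2(u) = u (so V' = 1 and V'' = 0).
ones, zeros = np.ones((8, 8)), np.zeros((8, 8))
V = (lambda u: u, lambda u: np.ones_like(u), lambda u: np.zeros_like(u),
     lambda u: u, lambda u: np.ones_like(u), lambda u: np.zeros_like(u))
print(newton_update_u(ones, zeros, zeros, zeros, V, c=1.0, tau=0.1)[0, 0])   # stays 1.0
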
31. Algorithm. Algorithm: G-prox PDHG for the mean-field information variational problem.
Input: initial density u0 and terminal density u1.
Output: u, Φ, m2: [0, T] × Ω → ℝ and m1: [0, T] × Ω → ℝ^d.
For k = 1, 2, ···:
   u^(k+1) ← compute u^(k+1) using Newton's method;
   m1^(k+1) = V1(u^(k))/(τ + V1(u^(k))) (m1^(k) + τ∇Φ^(k));
   m2^(k+1) = V2(u^(k))/(τ + V2(u^(k))) (m2^(k) + τΦ^(k));
   Φ^(k+1/2) = Φ^(k) + σ(AAᵀ)⁻¹(∂t u^(k+1) + ∇ · m1^(k+1) − m2^(k+1));
   Φ^(k+1) = 2Φ^(k+1/2) − Φ^(k).

32. Numerical experiments. We present two sets of numerical experiments using the Algorithm with different V1, V2 functions. We wrote C++ code to run the numerical experiments. For all the experiments we used the discretization sizes Nx1 = Nx2 = 128, Nt = 30.

33. Example 1. In this example, we use V1(u) = u, V2(u) = u^α, α > 0. We show how varying the exponent α affects the evolution of the densities. Below are the initial and terminal densities used in the simulations:

   u0(x, y) = 10 exp(−60((x − 0.3)² + (y − 0.7)²)) + 1,
   u1(x, y) = 20 exp(−60((x − 0.7)² + (y − 0.3)²)) + 1.

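These two densities are easy to reproduce on the stated 128 × 128 grid; the snippet below is an illustrative reconstruction that assumes the spatial domain is the unit square [0, 1]².

import numpy as np

Nx = 128
xs = (np.arange(Nx) + 0.5) / Nx                  # cell-centered grid points in [0, 1]
X, Y = np.meshgrid(xs, xs, indexing="ij")

u0 = 10.0 * np.exp(-60.0 * ((X - 0.3)**2 + (Y - 0.7)**2)) + 1.0   # initial density
u1 = 20.0 * np.exp(-60.0 * ((X - 0.7)**2 + (Y - 0.3)**2)) + 1.0   # terminal density
print(u0.max(), u1.max())                        # roughly 11 and 21 at the Gaussian centers
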
34. Figure: Example 1: snapshots of the densities at different times t. The first, second, third, and fourth rows represent V2(u) = u, V2(u) = u², V2(u) = u³, V2(u) = u⁴, respectively.

35. Figure: Example 1: cross sections along the line x + y = 1 at different times t. Each color represents V2(u) = u, V2(u) = u², V2(u) = u⁴.

36. Figure: Example 1: the convergence plot from the Algorithm. The x-axis represents iterations and the y-axis represents the energy defined in (23).

37. Example 2. This example uses V1(u) = u, V2(u) = u(u − 1)/log u, which comes from the Fisher-KPP equation. The example compares the following three sets of functions:
   1. V1(u) = u,  V2(u) = u(u − 1)/log u;
   2. V1(u) = √u, V2(u) = u(u − 1)/log u;
   3. V1(u) = √u, V2(u) = u.
Below are the initial and terminal densities used in the simulations:

   u0(x, y) = 15 exp(−80((x − 0.5)² + (y − 0.5)²)) + 1,
   u1(x, y) = 15 exp(−80((x − 0.3)² + (y − 0.3)²)) + 15 exp(−80((x − 0.7)² + (y − 0.7)²)) + 1.

38. Figure: Example 2: snapshots of the densities at different times t. The first row is from (V1(u) = u, V2(u) = u(u − 1)/log u), the second row from (V1(u) = √u, V2(u) = u(u − 1)/log u), and the last row from (V1(u) = √u, V2(u) = u).

39. Figure: Example 2: the convergence plot from the Algorithm. The x-axis represents iterations and the y-axis represents the energy defined on the preceding slide. The blue line represents the solution with V1(u) = u and V2(u) = u(u − 1)/log u; it converges to 0.00029. The orange line represents the solution with V1(u) = √u and V2(u) = u(u − 1)/log u; it converges to 0.0135. The green line represents the solution with V1(u) = √u and V2(u) = u; it converges to 0.0183.