Slide 1

Slide 1 text

Mean field information dynamics via reaction diffusions

Wuchen Li, University of South Carolina

Hamilton-Jacobi equations reunion 2022, IPAM

A joint work with Wonjun Lee (UCLA) and Stanley Osher (UCLA).

Slide 2

Slide 2 text

Pandemic evolution

Slide 3

Slide 3 text

Vaccine distribution

Slide 4

Slide 4 text

Generalized unnormalized OT

Slide 5

Slide 5 text

Reaction diffusion

Consider a scalar nonlinear reaction-diffusion equation
$$\partial_t u(t, x) = \Delta F(u(t, x)) + R(u(t, x)),$$
where $x \in \Omega$, $\Omega \subset \mathbb{R}^d$ is a compact convex spatial domain, $u \in \mathcal{M}(\Omega) = \{u \in C^\infty(\Omega)\colon u \ge 0\}$, and $\Delta$ is the Euclidean Laplacian operator.

Slide 6

Slide 6 text

Reaction diffusions and generalized optimal transport

Based on studies begun during the 2019 Thanksgiving holiday, we have been working on reaction diffusions and mean-field control problems. In this talk, we first review reaction diffusions in optimal-transport-type metric spaces. Using this connection, we further design mean-field control problems for reaction-diffusion equations. This also provides variational schemes for solving classical reaction-diffusion equations in both the time and spatial domains.

Slide 7

Slide 7 text

Entropy, Information, transportation

Slide 8

Slide 8 text

Diffusions and entropy dissipation

Consider the heat equation in $\mathbb{R}^d$,
$$\frac{\partial u(t, x)}{\partial t} = \nabla \cdot (\nabla u(t, x)) = \Delta u(t, x).$$
Consider the Boltzmann-Shannon entropy
$$H(u) = -\int_{\mathbb{R}^d} u(x) \log u(x)\, dx.$$
Along the heat equation, the following dissipation relation holds:
$$\frac{d}{dt} H(u(t, \cdot)) = \int_{\mathbb{R}^d} \|\nabla_x \log u(t, x)\|^2\, u(t, x)\, dx = I(u(t, \cdot)),$$
where $I(u)$ is named the Fisher information functional. There is a formulation behind this relation, namely:
▶ The heat flow is the gradient descent flow of the negative entropy $-H$ in the optimal transport metric.

Slide 9

Slide 9 text

Entropy dissipation = Lyapunov methods

The gradient flow of the negative entropy $-H(u) = \int_{\mathbb{R}^d} u(x) \log u(x)\, dx$, with respect to the optimal transport metric, satisfies
$$\frac{\partial u}{\partial t} = \nabla \cdot (u \nabla \log u) = \Delta u.$$
Here the major trick is that $u \nabla \log u = \nabla u$. In this way, one can study the entropy dissipation by
$$\frac{d}{dt} H(u) = -\int_{\mathbb{R}^d} \log u\, \nabla \cdot (u \nabla \log u)\, dx = \int_{\mathbb{R}^d} \|\nabla \log u\|^2\, u\, dx.$$
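As a sanity check (not part of the original slides), the dissipation identity $\frac{d}{dt}H(u) = I(u)$ can be verified numerically on a 1D periodic grid. The minimal NumPy sketch below assumes a simple explicit Euler step for the heat equation; grid size and initial data are illustrative choices.

```python
import numpy as np

# 1D periodic grid
N = 256
h = 1.0 / N
x = np.arange(N) * h
dt = 0.2 * h**2                          # explicit-Euler stability for the heat equation

u = 1.0 + 0.5 * np.cos(2 * np.pi * x)    # smooth positive initial density

def laplacian(v):
    return (np.roll(v, -1) - 2 * v + np.roll(v, 1)) / h**2

def grad(v):
    return (np.roll(v, -1) - np.roll(v, 1)) / (2 * h)

def entropy(v):                          # H(u) = -int u log u dx
    return -h * np.sum(v * np.log(v))

def fisher_information(v):               # I(u) = int ||grad log u||^2 u dx
    return h * np.sum(grad(np.log(v))**2 * v)

for _ in range(200):                     # run the heat flow for a while
    u = u + dt * laplacian(u)

# compare dH/dt (centered difference in time) with the Fisher information I(u)
u_mid = u + dt * laplacian(u)
u_next = u_mid + dt * laplacian(u_mid)
dH_dt = (entropy(u_next) - entropy(u)) / (2 * dt)
print(dH_dt, fisher_information(u_mid))  # the two numbers should nearly agree
```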

Slide 10

Slide 10 text

Behind Lyapunov methods: optimal transport

What is the optimal way to move or transport a mountain with shape $X$ and density $u_0(x)$ to another shape $Y$ with density $u_1(y)$?

Slide 11

Slide 11 text

Literature

The optimal transport problem was first introduced by Monge in 1781 and relaxed by Kantorovich (Nobel laureate) in the 1940s. It defines a distance in the space of probability distributions, named the optimal transport distance, Wasserstein distance, or Earth Mover's distance.
▶ Mapping/Monge-Ampère equation: Gangbo, Brenier, et al.;
▶ Lyapunov methods and gradient flows: Otto, Villani, Ambrosio, Gigli, Savaré, Carrillo, Mielke, et al.;
▶ Hamiltonian flows: compressible Euler equations, potential mean field games, Schrödinger bridge problems, Schrödinger equations: Benamou, Brenier, Lions, Georgiou, Nelson, Lafferty, et al.;
▶ Numerical OT and MFG: Benamou, Nurbekyan, Oberman, Osher, Achdou, Lai, et al.

Slide 12

Slide 12 text

Lyapunov methods induced calculus

Informally speaking, the Wasserstein metric refers to the following bilinear form:
$$\langle \dot u_1, G(u) \dot u_2 \rangle = \int \dot u_1\, \big(-\nabla \cdot (u \nabla)\big)^{-1} \dot u_2\, dx.$$
In other words, denote $\dot u_i = -\nabla \cdot (u \nabla \phi_i)$, $i = 1, 2$; then
$$\langle \phi_1, G(u)^{-1} \phi_2 \rangle = \langle \phi_1, -\nabla \cdot (u \nabla) \phi_2 \rangle,$$
where $u \in \mathcal{P}(\Omega)$, $\dot u_i$ are tangent vectors in $\mathcal{P}(\Omega)$ with $\int \dot u_i\, dx = 0$, and $\phi_i \in C^\infty(\Omega)$ are cotangent vectors in $\mathcal{P}(\Omega)$ at the point $u$. Here $\nabla \cdot$ and $\nabla$ are the standard divergence and gradient operators in $\Omega$.

Slide 13

Slide 13 text

Gradient flows

The Wasserstein gradient flow of an energy functional $\mathcal{F}(u)$ leads to
$$\partial_t u = -G(u)^{-1} \frac{\delta}{\delta u} \mathcal{F}(u) = \nabla \cdot \Big(u \nabla \frac{\delta}{\delta u} \mathcal{F}(u)\Big).$$
▶ If $\mathcal{F}(u) = \int F(x)\, u(x)\, dx$, then $\partial_t u = \nabla \cdot (u \nabla F(x))$.
▶ If $\mathcal{F}(u) = \int u(x) \log u(x)\, dx$, then $\partial_t u = \nabla \cdot (u \nabla \log u) = \Delta u$.

Slide 14

Slide 14 text

Hamiltonian flows

Consider the Lagrangian functional
$$\mathcal{L}(u, \partial_t u) = \frac{1}{2} \int \partial_t u\, \big(-\nabla \cdot (u \nabla)\big)^{-1} \partial_t u\, dx - \mathcal{F}(u).$$
By the Legendre transform,
$$\mathcal{H}(u, \phi) = \sup_{\partial_t u} \int \partial_t u\, \phi\, dx - \mathcal{L}(u, \partial_t u).$$
And the Hamiltonian system follows
$$\partial_t u = \frac{\delta}{\delta \phi} \mathcal{H}(u, \phi), \qquad \partial_t \phi = -\frac{\delta}{\delta u} \mathcal{H}(u, \phi),$$
where $\frac{\delta}{\delta u}$, $\frac{\delta}{\delta \phi}$ are the $L^2$ first-variation operators w.r.t. $u$, $\phi$, respectively, and the density Hamiltonian takes the form
$$\mathcal{H}(u, \phi) = \frac{1}{2} \int \|\nabla \phi\|^2\, u\, dx + \mathcal{F}(u).$$
Here $u$ is the "density" state variable and $\phi$ is the "density" moment variable.
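As a brief worked step (added here; it is implicit on the slide), the supremum in the Legendre transform is attained at $\partial_t u = -\nabla \cdot (u \nabla \phi)$, and substituting this maximizer gives the stated density Hamiltonian:
$$\mathcal{H}(u, \phi) = \int \big(-\nabla \cdot (u \nabla \phi)\big)\, \phi\, dx - \frac{1}{2} \int \big(-\nabla \cdot (u \nabla \phi)\big)\, \phi\, dx + \mathcal{F}(u) = \frac{1}{2} \int \|\nabla \phi\|^2\, u\, dx + \mathcal{F}(u),$$
using integration by parts in the last equality.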

Slide 15

Slide 15 text

Examples: potential mean field games

In detail, the Hamiltonian flows satisfy
$$\begin{cases}
\partial_t u + \nabla \cdot (u \nabla \phi) = 0, \\[4pt]
\partial_t \phi + \frac{1}{2} \|\nabla \phi\|^2 = -\frac{\delta}{\delta u} \mathcal{F}(u).
\end{cases}$$
This is a well-known dynamics, studied in potential mean field games, optimal transport, and compressible Euler equations. Nowadays, we apply these Lagrangian or Hamiltonian formalisms to design variational schemes in neural networks, finite difference/volume methods, and kernelized spaces.

Slide 16

Slide 16 text

Variational methods for reaction diffusions?

Slide 17

Slide 17 text

Lyapunov methods

Consider a Lyapunov functional
$$\mathcal{G}(u) = \int G(u(x))\, dx,$$
where $G\colon \mathbb{R} \to \mathbb{R}$ is a given convex function with $G''(u) > 0$. In fact, how to choose the entropy $G$ can be determined by an entropy-entropy flux-metric condition. We will work on it in the next talk.

Slide 18

Slide 18 text

Metrics

Along the reaction diffusion, observe that
$$\begin{aligned}
\frac{d}{dt} \mathcal{G}(u(t, \cdot)) &= \int G'(u)\, \partial_t u\, dx = \int G'(u)\, \big(\Delta F(u) + R(u)\big)\, dx \\
&= -\int \langle \nabla G'(u), \nabla F(u) \rangle\, dx + \int G'(u)\, R(u)\, dx \\
&= -\int \langle \nabla G'(u), \nabla u \rangle\, F'(u)\, dx + \int G'(u)\, R(u)\, dx \\
&= -\int \langle \nabla G'(u), \nabla u\, G''(u) \rangle\, \frac{F'(u)}{G''(u)}\, dx + \int G'(u)^2\, \frac{R(u)}{G'(u)}\, dx \\
&= -\int \|\nabla G'(u)\|^2\, \frac{F'(u)}{G''(u)}\, dx + \int G'(u)^2\, \frac{R(u)}{G'(u)}\, dx,
\end{aligned}$$
where we apply integration by parts in the third equality and $\nabla G'(u) = G''(u) \nabla u$ in the last equality.
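As a concrete instance (added for illustration, following the Fisher-KPP choice that appears later in the talk), take $F(u) = u$, $G(u) = u \log u - u$, and $R(u) = u(1 - u)$, so that $G'(u) = \log u$, $G''(u) = 1/u$, and $F'(u) = 1$. The dissipation above then specializes to
$$\frac{d}{dt} \mathcal{G}(u(t, \cdot)) = -\int \|\nabla \log u\|^2\, u\, dx + \int (\log u)\, u (1 - u)\, dx \le 0,$$
since both terms are non-positive (for the second term, $\log u$ and $u(1-u)$ always have opposite signs). Hence $\mathcal{G}$ is a Lyapunov functional for the Fisher-KPP equation.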

Slide 19

Slide 19 text

Lyapunov methods induced metrics I

Denote the smooth positive density space by
$$\mathcal{M} = \{u \in C^\infty(\Omega)\colon u > 0\}.$$
Given $F, G\colon \mathbb{R} \to \mathbb{R}$ satisfying $F'(u) > 0$ if $u > 0$, and $G''(u) > 0$, denote
$$V_1(u) = \frac{F'(u)}{G''(u)}, \qquad V_2(u) = -\frac{R(u)}{G'(u)}.$$
Denote the tangent space of $\mathcal{M}$ at $u \in \mathcal{M}$ by
$$T_u \mathcal{M} = \{\sigma \in C^\infty(\Omega)\}.$$

Slide 20

Slide 20 text

Lyapunov methods induced metrics II

Definition (Mean-field information metric)
The inner product $g(u)\colon T_u \mathcal{M} \times T_u \mathcal{M} \to \mathbb{R}$ is given below. For any $\sigma_1, \sigma_2 \in T_u \mathcal{M}$, define
$$\begin{aligned}
g(u)(\sigma_1, \sigma_2) &= \int_\Omega \sigma_1\, \big(-\nabla \cdot (V_1(u) \nabla) + V_2(u)\big)^{-1} \sigma_2\, dx \\
&= \int_\Omega (\nabla \Phi_1, \nabla \Phi_2)\, V_1(u)\, dx + \int_\Omega \Phi_1 \Phi_2\, V_2(u)\, dx,
\end{aligned}$$
where $\sigma_i = -\nabla \cdot (V_1(u) \nabla \Phi_i) + V_2(u) \Phi_i$, $i = 1, 2$.
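For intuition (not from the slides), the metric can be evaluated numerically on a 1D periodic grid: discretize the elliptic operator $-\nabla \cdot (V_1(u)\nabla) + V_2(u)$, solve for $\Phi_i$ given $\sigma_i$, and check that the inverse-operator and explicit forms of $g(u)(\sigma_1, \sigma_2)$ agree. A minimal NumPy sketch, with $V_1(u) = u$ and $V_2(u) = u$ chosen only as an example:

```python
import numpy as np

N = 128
h = 1.0 / N
x = np.arange(N) * h

u = 1.0 + 0.5 * np.sin(2 * np.pi * x)          # positive density
V1, V2 = u.copy(), u.copy()                    # example choices V1(u)=u, V2(u)=u

# discrete operator A = -d/dx(V1 d/dx .) + V2 on a periodic grid
V1_half = 0.5 * (V1 + np.roll(V1, -1))         # V1 at cell interfaces i+1/2
A = np.zeros((N, N))
for i in range(N):
    A[i, i] = (V1_half[i] + V1_half[i - 1]) / h**2 + V2[i]
    A[i, (i + 1) % N] = -V1_half[i] / h**2
    A[i, (i - 1) % N] = -V1_half[i - 1] / h**2

sigma1 = np.sin(2 * np.pi * x)                 # tangent directions
sigma2 = np.cos(4 * np.pi * x)

Phi1 = np.linalg.solve(A, sigma1)              # solve sigma_i = A Phi_i
Phi2 = np.linalg.solve(A, sigma2)

# inverse-operator form of the metric
g_inverse_form = h * np.sum(sigma1 * Phi2)

# explicit form: int V1 (dPhi1)(dPhi2) dx + int V2 Phi1 Phi2 dx
dPhi1 = (np.roll(Phi1, -1) - Phi1) / h
dPhi2 = (np.roll(Phi2, -1) - Phi2) / h
g_explicit_form = h * np.sum(V1_half * dPhi1 * dPhi2) + h * np.sum(V2 * Phi1 * Phi2)

print(g_inverse_form, g_explicit_form)         # the two values should agree
```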

Slide 21

Slide 21 text

Reaction-diffusions in metric spaces

Given an energy functional $\mathcal{F}\colon \mathcal{M} \to \mathbb{R}$, the gradient flow of $\mathcal{F}$ in $(\mathcal{M}(\Omega), g)$ satisfies
$$\partial_t u(t, x) = \nabla \cdot \Big(V_1(u) \nabla \frac{\delta}{\delta u} \mathcal{F}(u)\Big)(t, x) - V_2(u) \frac{\delta}{\delta u} \mathcal{F}(u)(t, x).$$
In particular, if $\mathcal{F}(u) = \mathcal{G}(u) = \int G(u)\, dx$, then
$$\begin{aligned}
\partial_t u &= \nabla \cdot \big(V_1(u) \nabla G'(u)\big) - V_2(u)\, G'(u) \\
&= \nabla \cdot \Big(\frac{F'(u)}{G''(u)}\, G''(u) \nabla u\Big) + \frac{R(u)}{G'(u)}\, G'(u) \\
&= \nabla \cdot \big(F'(u) \nabla u\big) + R(u) \\
&= \Delta F(u) + R(u).
\end{aligned}$$
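A quick numerical illustration (not from the slides): since the Fisher-KPP equation is the gradient flow of $\mathcal{G}(u) = \int (u \log u - u)\, dx$ in this metric, $\mathcal{G}$ should decrease along the flow. The sketch below assumes NumPy, a 1D periodic grid, and a simple explicit scheme; all parameter choices are illustrative.

```python
import numpy as np

# 1D periodic grid and an explicit scheme for the Fisher-KPP equation
N = 64
h = 1.0 / N
x = np.arange(N) * h
dt = 0.4 * h**2                          # small step for explicit stability

u = 1.0 + 0.8 * np.sin(2 * np.pi * x)    # positive initial density

def laplacian(v):
    return (np.roll(v, -1) - 2 * v + np.roll(v, 1)) / h**2

def lyapunov(v):                         # G(u) = int (u log u - u) dx
    return h * np.sum(v * np.log(v) - v)

values = []
for n in range(5000):
    # Fisher-KPP: du/dt = Laplacian(u) + u(1 - u)
    u = u + dt * (laplacian(u) + u * (1.0 - u))
    if n % 500 == 0:
        values.append(lyapunov(u))

print(values)   # recorded values should decrease monotonically toward -1 (u -> 1)
```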

Slide 22

Slide 22 text

Controlling reaction diffusions

Slide 23

Slide 23 text

Controlling reaction diffusions

Denote a given energy functional by $\mathcal{F}\colon \mathcal{M}(\Omega) \to \mathbb{R}$. Consider the variational problem
$$\inf_{v_1, v_2, u} \int_0^1 \Big[\int_\Omega \Big(\frac{1}{2} \|v_1\|^2\, V_1(u) + \frac{1}{2} |v_2|^2\, V_2(u)\Big)\, dx - \mathcal{F}(u)\Big]\, dt,$$
where the infimum is taken among all density functions $u\colon [0, 1] \times \Omega \to \mathbb{R}$, vector fields $v_1\colon [0, 1] \times \Omega \to \mathbb{R}^d$, and reaction rate functions $v_2\colon [0, 1] \times \Omega \to \mathbb{R}$, such that
$$\partial_t u(t, x) + \nabla \cdot \big(V_1(u(t, x))\, v_1(t, x)\big) = v_2(t, x)\, V_2(u(t, x)),$$
with fixed initial and terminal density functions $u_0, u_1 \in \mathcal{M}(\Omega)$.

Slide 24

Slide 24 text

Mean field information dynamics

The critical point of the variational problem satisfies
$$v_1(t, x) = \nabla \Phi(t, x), \qquad v_2(t, x) = \Phi(t, x),$$
with
$$\begin{cases}
\partial_t u(t, x) + \nabla \cdot \big(V_1(u(t, x)) \nabla \Phi(t, x)\big) = V_2(u(t, x))\, \Phi(t, x), \\[6pt]
\partial_t \Phi(t, x) + \dfrac{1}{2} \|\nabla \Phi(t, x)\|^2\, V_1'(u(t, x)) + \dfrac{1}{2} |\Phi(t, x)|^2\, V_2'(u(t, x)) + \dfrac{\delta}{\delta u} \mathcal{F}(u)(t, x) = 0.
\end{cases}$$

Slide 25

Slide 25 text

Mean field information Hamiltonian

There exists a Hamiltonian formulation for the control problem:
$$\begin{cases}
\partial_t u = \dfrac{\delta}{\delta \Phi} \mathcal{H}(u, \Phi), \\[6pt]
\partial_t \Phi = -\dfrac{\delta}{\delta u} \mathcal{H}(u, \Phi),
\end{cases}$$
where $\mathcal{H}$ is the Hamiltonian functional defined by
$$\mathcal{H}(u, \Phi) = \frac{1}{2} \int_\Omega \|\nabla \Phi\|^2\, V_1(u)\, dx + \frac{1}{2} \int_\Omega |\Phi|^2\, V_2(u)\, dx + \mathcal{F}(u).$$

Slide 26

Slide 26 text

Functional Hamilton-Jacobi equations

The Hamilton-Jacobi equation in the positive density space satisfies
$$\partial_t U(t, u) + \frac{1}{2} \int_\Omega \Big\|\nabla \frac{\delta}{\delta u(x)} U(t, u)\Big\|^2\, V_1(u)\, dx + \frac{1}{2} \int_\Omega \Big|\frac{\delta}{\delta u(x)} U(t, u)\Big|^2\, V_2(u)\, dx + \mathcal{F}(u) = 0,$$
where $U\colon [0, 1] \times L^2(\Omega) \to \mathbb{R}$ is a value functional.

Slide 27

Slide 27 text

Example I

Example (Wasserstein metric, heat flow, and density Hamiltonian flows)
Let $G(u) = u \log u - u$, $F(u) = u$, $R(u) = 0$; thus
$$V_1(u) = \frac{F'(u)}{G''(u)} = u, \qquad V_2(u) = -\frac{R(u)}{G'(u)} = 0.$$
Then the mean-field information metric forms
$$g(u)(\sigma_1, \sigma_2) = \int_\Omega (\nabla \Phi_1(x), \nabla \Phi_2(x))\, u(x)\, dx.$$
Here the Hamiltonian flow in $(\mathcal{M}(\Omega), g)$ satisfies
$$\begin{cases}
\partial_t u + \nabla \cdot (u \nabla \Phi) = 0, \\[4pt]
\partial_t \Phi + \frac{1}{2} \|\nabla \Phi\|^2 + \frac{\delta}{\delta u} \mathcal{F}(u) = 0.
\end{cases}$$

Slide 28

Slide 28 text

Example II

Example (Fisher-Rao metric, birth-death equation, and density Hamiltonian flows)
Consider
$$V_1(u) = \frac{F'(u)}{G''(u)} = 0, \qquad V_2(u) = -\frac{R(u)}{G'(u)} = u.$$
Then the mean-field information metric satisfies
$$g(u)(\sigma_1, \sigma_2) = \int_\Omega \Phi_1(x)\, \Phi_2(x)\, u(x)\, dx.$$
Here the Hamiltonian flow in $(\mathcal{M}(\Omega), g)$ satisfies
$$\begin{cases}
\partial_t u - u \Phi = 0, \\[4pt]
\partial_t \Phi + \frac{1}{2} |\Phi|^2 + \frac{\delta}{\delta u} \mathcal{F}(u) = 0.
\end{cases}$$

Slide 29

Slide 29 text

Example III

Example (Fisher-KPP metric, Fisher-KPP equation, and density Hamiltonian flows)
Consider $F(u) = u$, $G(u) = u \log u - u$, $R(u) = u(1 - u)$. Thus
$$V_1(u) = \frac{F'(u)}{G''(u)} = u, \qquad V_2(u) = -\frac{R(u)}{G'(u)} = \frac{u(u - 1)}{\log u}.$$
Here the Hamiltonian flow in $(\mathcal{M}(\Omega), g)$ satisfies
$$\begin{cases}
\partial_t u + \nabla \cdot (u \nabla \Phi) - \dfrac{u(u - 1)}{\log u}\, \Phi = 0, \\[8pt]
\partial_t \Phi + \dfrac{1}{2} \|\nabla \Phi\|^2 + \dfrac{|\Phi|^2}{2}\, \dfrac{(2u - 1) \log u + 1 - u}{(\log u)^2} + \dfrac{\delta}{\delta u} \mathcal{F}(u) = 0.
\end{cases}$$
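For completeness (a worked step added here, not on the original slide), the coefficient multiplying $|\Phi|^2/2$ above is exactly $V_2'(u)$ from the general Hamiltonian system on Slide 24:
$$V_2'(u) = \frac{d}{du}\, \frac{u(u - 1)}{\log u} = \frac{(2u - 1) \log u - (u - 1)}{(\log u)^2} = \frac{(2u - 1) \log u + 1 - u}{(\log u)^2}.$$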

Slide 30

Slide 30 text

The Primal-Dual Algorithms

Consider the following convex optimization problem:
$$\min_z\; f(Az) + g(z),$$
where $z$ is the variable to be minimized, $f$ and $g$ are convex functions, and $A$ is a continuous linear operator. The algorithm solves the problem by iterating
$$\begin{aligned}
z^{(k+1)} &= \operatorname*{argmin}_z\; L(z, \bar p) + \frac{1}{2\tau} \|z - z^{(k)}\|_{L^2}^2, \\
p^{(k+1)} &= \operatorname*{argmax}_p\; L(z^{(k+1)}, p) - \frac{1}{2\sigma} \|p - p^{(k)}\|_{L^2}^2, \\
\bar p &= 2 p^{(k+1)} - p^{(k)},
\end{aligned} \tag{1}$$
where $p$ is a dual variable, $L$ is the Lagrangian functional
$$L(z, p) = g(z) + \langle Az, p \rangle - f^*(p),$$
and $f^*(p) = \sup_z\, \langle z, p \rangle - f(z)$ is the Legendre transform of $f$.
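To make the iteration concrete, here is a small self-contained sketch (not from the talk) applying exactly this primal-first, dual-extrapolated PDHG iteration to a toy 1D total-variation denoising problem, with $g(z) = \tfrac{1}{2}\|z - b\|^2$, $f = \lambda \|\cdot\|_1$, and $A$ a forward-difference operator; the problem and parameter choices are purely illustrative.

```python
import numpy as np

# toy problem: min_z 0.5*||z - b||^2 + lam*||A z||_1, with A = forward difference
n, lam = 200, 0.5
rng = np.random.default_rng(0)
b = np.concatenate([np.zeros(n // 2), np.ones(n - n // 2)]) + 0.1 * rng.standard_normal(n)

def A(z):                     # forward difference, maps R^n -> R^(n-1)
    return z[1:] - z[:-1]

def AT(p):                    # adjoint of A (negative divergence), maps R^(n-1) -> R^n
    return np.concatenate([[-p[0]], p[:-1] - p[1:], [p[-1]]])

# ||A^T A|| <= 4 for this operator, so tau*sigma < 1/4 guarantees convergence
tau = sigma = 0.45

z = b.copy()
p = np.zeros(n - 1)
p_bar = p.copy()

for _ in range(500):
    # primal step: argmin_z g(z) + <A z, p_bar> + (1/2 tau)||z - z_k||^2
    z = (z - tau * AT(p_bar) + tau * b) / (1.0 + tau)
    # dual step: argmax_p <A z, p> - f*(p) - (1/2 sigma)||p - p_k||^2,
    # i.e. projection of p + sigma*A z onto the box [-lam, lam]
    p_new = np.clip(p + sigma * A(z), -lam, lam)
    # extrapolation of the dual variable
    p_bar = 2.0 * p_new - p
    p = p_new

energy = 0.5 * np.sum((z - b) ** 2) + lam * np.sum(np.abs(A(z)))
print(energy)                 # objective value after 500 iterations
```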

Slide 31

Slide 31 text

Primal-dual methods

The scheme converges if the step sizes $\tau$ and $\sigma$ satisfy
$$\tau \sigma \|A^T A\|_{L^2} < 1,$$
where $\|\cdot\|_{L^2}$ is the operator norm in $L^2$. These step sizes depend on the operator $A$. In this paper, we use G-prox PDHG, a variation of the primal-dual algorithm. G-prox PDHG provides an appropriate choice of norms for the algorithm, which allows it to converge faster than the vanilla PDHG algorithm.

Slide 32

Slide 32 text

G-prox primal-dual methods

G-prox PDHG finds the saddle point by iterating
$$\begin{aligned}
z^{(k+1)} &= \operatorname*{argmin}_z\; L(z, \bar p) + \frac{1}{2\tau} \|z - z^{(k)}\|_{L^2}^2, \\
p^{(k+1)} &= \operatorname*{argmax}_p\; L(z^{(k+1)}, p) - \frac{1}{2\sigma} \|p - p^{(k)}\|_{H}^2, \\
\bar p &= 2 p^{(k+1)} - p^{(k)},
\end{aligned}$$
where the norm $\|\cdot\|_{H}$ is defined as $\|z\|_{H}^2 = \|A^T z\|_{L^2}^2$. By choosing this proper norm, the step sizes only need to satisfy $\sigma \tau < 1$, which is independent of the operator $A$.

Slide 33

Slide 33 text

Implementation of the algorithm

We set the linear operator $A$ as
$$A(m_1, m_2, u) = \partial_t u(t, x) + \nabla \cdot m_1(t, x) - m_2(t, x).$$
We set the convex functionals $f$ and $g$ below:
$$g(m_1, m_2, u) = \int_0^1 \Big[\int_\Omega \Big(\frac{\|m_1(t, x)\|^2}{2 V_1(u(t, x))} + \frac{|m_2(t, x)|^2}{2 V_2(u(t, x))}\Big)\, dx - \mathcal{F}(u)\Big]\, dt,$$
$$f(A(m_1, m_2, u)) = \begin{cases} 0 & \text{if } A(m_1, m_2, u) = 0, \\ \infty & \text{otherwise}, \end{cases} \qquad f^*(\Phi) = 0.$$
Thus, we have the Lagrangian functional
$$L(m_1, m_2, u, \Phi) = \int_0^1 \Big[\int_\Omega \Big(\frac{\|m_1(t, x)\|^2}{2 V_1(u(t, x))} + \frac{|m_2(t, x)|^2}{2 V_2(u(t, x))} + \Phi(t, x)\big(\partial_t u(t, x) + \nabla \cdot m_1(t, x) - m_2(t, x)\big)\Big)\, dx - \mathcal{F}(u)\Big]\, dt.$$

Slide 34

Slide 34 text

Implementation of the algorithm

We find the saddle point $(m_1^*, m_2^*, u^*, \Phi^*)$ by iterating the following scheme:
$$\begin{aligned}
u^{(k+1)} &= \operatorname*{argmin}_u\; L(m_1^{(k)}, m_2^{(k)}, u, \bar \Phi) + \frac{1}{2\tau} \|u - u^{(k)}\|_{L^2}^2, \\
m_1^{(k+1)} &= \operatorname*{argmin}_m\; L(m, m_2^{(k)}, u^{(k)}, \bar \Phi) + \frac{1}{2\tau} \|m - m_1^{(k)}\|_{L^2}^2, \\
m_2^{(k+1)} &= \operatorname*{argmin}_m\; L(m_1^{(k)}, m, u^{(k)}, \bar \Phi) + \frac{1}{2\tau} \|m - m_2^{(k)}\|_{L^2}^2, \\
\Phi^{(k+1)} &= \operatorname*{argmax}_\Phi\; L(m_1^{(k+1)}, m_2^{(k+1)}, u^{(k+1)}, \Phi) - \frac{1}{2\sigma} \|\Phi - \Phi^{(k)}\|_{H}^2, \\
\bar \Phi &= 2 \Phi^{(k+1)} - \Phi^{(k)}.
\end{aligned}$$
Here, the $L^2$ and $H$ norms are defined as
$$\|u\|_{L^2}^2 = (u, u)_{L^2} = \int_0^1 \int_\Omega \|u\|^2\, dx\, dt, \qquad \|u\|_{H}^2 = \|A^T u\|_{L^2}^2,$$
for any $u\colon [0, 1] \times \Omega \to \mathbb{R}^n$.

Slide 35

Slide 35 text

Implementation of the algorithm

We can find explicit formulas for $u^{(k+1)}$, $m_1^{(k+1)}$, $m_2^{(k+1)}$, $\Phi^{(k+1)}$ using the first-order optimality conditions. The optimality conditions for $m_1^{(k+1)}$, $m_2^{(k+1)}$, $\Phi^{(k+1)}$ result in first-order equations that can be solved easily. Thus, we have
$$\begin{aligned}
m_1^{(k+1)} &= \frac{V_1(u^{(k)})}{\tau + V_1(u^{(k)})}\, \big(m_1^{(k)} + \tau \nabla \Phi^{(k)}\big), \\
m_2^{(k+1)} &= \frac{V_2(u^{(k)})}{\tau + V_2(u^{(k)})}\, \big(m_2^{(k)} + \tau \Phi^{(k)}\big), \\
\Phi^{(k+\frac{1}{2})} &= \Phi^{(k)} + \sigma (A A^T)^{-1} \big(\partial_t u^{(k+1)} + \nabla \cdot m_1^{(k+1)} - m_2^{(k+1)}\big).
\end{aligned}$$
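As a worked step (added for clarity, not on the original slide), the $m_1$ formula follows from the pointwise first-order condition of its proximal problem; after integrating $\Phi\, \nabla \cdot m_1$ by parts in the Lagrangian and evaluating at the current dual iterate, one gets
$$\frac{m_1}{V_1(u^{(k)})} - \nabla \Phi^{(k)} + \frac{1}{\tau}\big(m_1 - m_1^{(k)}\big) = 0 \;\Longrightarrow\; m_1^{(k+1)} = \frac{V_1(u^{(k)})}{\tau + V_1(u^{(k)})}\, \big(m_1^{(k)} + \tau \nabla \Phi^{(k)}\big),$$
and the $m_2$ update is obtained in the same way.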

Slide 36

Slide 36 text

Implementation of the algorithm

The optimality condition for $u^{(k+1)}$ involves a nonlinear equation, which we solve using Newton's method. For $j = 1, 2, \cdots$,
$$\begin{aligned}
a &= -\frac{\|m_1\|^2}{2 V_1(u_j)^2}\, V_1'(u_j) - \frac{|m_2|^2}{2 V_2(u_j)^2}\, V_2'(u_j) - \partial_t \Phi + c \log u_j + \frac{1}{\tau}\big(u_j - u^{(k)}\big), \\
b &= \frac{\|m_1\|^2}{V_1(u_j)^3}\, \big(V_1'(u_j)\big)^2 - \frac{\|m_1\|^2}{2 V_1(u_j)^2}\, V_1''(u_j) + \frac{|m_2|^2}{V_2(u_j)^3}\, \big(V_2'(u_j)\big)^2 - \frac{|m_2|^2}{2 V_2(u_j)^2}\, V_2''(u_j) + \frac{c}{u_j} + \frac{1}{\tau}, \\
u_{j+1} &= \max\Big(0,\; u_j - \frac{a}{b}\Big).
\end{aligned}$$
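A direct code rendering of this pointwise Newton update (written as a sketch, not the authors' C++ implementation; the callables V1, V2, their derivatives, the constant c, and all inputs are placeholders that would come from the surrounding solver):

```python
import numpy as np

def newton_update_u(u, u_prev, m1_sq, m2_sq, dt_Phi, tau, c,
                    V1, dV1, ddV1, V2, dV2, ddV2, iters=10):
    """Pointwise Newton iterations for the u-update of the scheme above.

    u, u_prev, m1_sq (= ||m_1||^2), m2_sq (= |m_2|^2), dt_Phi (= d_t Phi) are arrays
    on the space-time grid; V1, V2 and their derivatives are callables; c and tau
    are scalars supplied by the surrounding solver.
    """
    for _ in range(iters):
        # a: first derivative of the proximal objective with respect to u
        a = (-m1_sq / (2.0 * V1(u) ** 2) * dV1(u)
             - m2_sq / (2.0 * V2(u) ** 2) * dV2(u)
             - dt_Phi + c * np.log(u) + (u - u_prev) / tau)
        # b: second derivative of the proximal objective with respect to u
        b = (m1_sq / V1(u) ** 3 * dV1(u) ** 2
             - m1_sq / (2.0 * V1(u) ** 2) * ddV1(u)
             + m2_sq / V2(u) ** 3 * dV2(u) ** 2
             - m2_sq / (2.0 * V2(u) ** 2) * ddV2(u)
             + c / u + 1.0 / tau)
        # Newton step; the slide projects with max(0, .), while a tiny positive
        # floor is used here so that log(u) stays finite in the next iteration
        u = np.maximum(1e-10, u - a / b)
    return u
```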

Slide 37

Slide 37 text

Algorithm

Algorithm: G-prox PDHG for the mean-field information variational problem
Input: initial density $u_0$ and terminal density $u_1$.
Output: $u, \Phi, m_2\colon [0, T] \times \Omega \to \mathbb{R}$ and $m_1\colon [0, T] \times \Omega \to \mathbb{R}^d$.
For $k = 1, 2, \cdots$:
$$\begin{aligned}
u^{(k+1)} &\leftarrow \text{compute } u^{(k+1)} \text{ using Newton's method}, \\
m_1^{(k+1)} &= \frac{V_1(u^{(k)})}{\tau + V_1(u^{(k)})}\, \big(m_1^{(k)} + \tau \nabla \Phi^{(k)}\big), \\
m_2^{(k+1)} &= \frac{V_2(u^{(k)})}{\tau + V_2(u^{(k)})}\, \big(m_2^{(k)} + \tau \Phi^{(k)}\big), \\
\Phi^{(k+\frac{1}{2})} &= \Phi^{(k)} + \sigma (A A^T)^{-1} \big(\partial_t u^{(k+1)} + \nabla \cdot m_1^{(k+1)} - m_2^{(k+1)}\big), \\
\Phi^{(k+1)} &= 2 \Phi^{(k+\frac{1}{2})} - \Phi^{(k)}.
\end{aligned}$$

Slide 38

Slide 38 text

Numerical Experiments

We present two sets of numerical experiments using the Algorithm with different $V_1$, $V_2$ functions. We wrote C++ code to run the numerical experiments. For all the experiments we used the discretization sizes
$$N_{x_1} = N_{x_2} = 128, \qquad N_t = 30.$$

Slide 39

Slide 39 text

Example 1

In this example, we use
$$V_1(u) = u, \qquad V_2(u) = u^\alpha, \quad \alpha > 0.$$
We show how varying the exponent $\alpha$ affects the evolution of the densities. Below are the initial and terminal densities used in the simulations:
$$u_0(x, y) = 10 \exp\big(-60\, \big((x - 0.3)^2 + (y - 0.7)^2\big)\big) + 1,$$
$$u_1(x, y) = 20 \exp\big(-60\, \big((x - 0.7)^2 + (y - 0.3)^2\big)\big) + 1.$$
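For reference (a sketch, not the authors' C++ code), these two densities on the $128 \times 128$ grid of the unit square can be set up as follows; the cell-center grid convention is an assumption.

```python
import numpy as np

Nx = 128                                   # grid points per spatial dimension
xs = (np.arange(Nx) + 0.5) / Nx            # cell-center coordinates in [0, 1]
X, Y = np.meshgrid(xs, xs, indexing="ij")

# initial and terminal densities of Example 1
u0 = 10.0 * np.exp(-60.0 * ((X - 0.3) ** 2 + (Y - 0.7) ** 2)) + 1.0
u1 = 20.0 * np.exp(-60.0 * ((X - 0.7) ** 2 + (Y - 0.3) ** 2)) + 1.0

print(u0.shape, u0.min(), u0.max())        # (128, 128), min ~ 1, max ~ 11
```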

Slide 40

Slide 40 text

Figure: Example 1: snapshots of the densities at different times $t$. The first, second, third, and fourth rows represent $V_2(u) = u$, $V_2(u) = u^2$, $V_2(u) = u^3$, and $V_2(u) = u^4$, respectively.

Slide 41

Slide 41 text

Figure: Example 1: cross sections along the line $x + y = 1$ at different times $t$. Each color represents $V_2(u) = u$, $V_2(u) = u^2$, or $V_2(u) = u^4$.

Slide 42

Slide 42 text

Figure: Example 1: the convergence plot from the Algorithm. The x-axis represents iterations and the y-axis represents the energy defined in (23).

Slide 43

Slide 43 text

Example 2

This example uses
$$V_1(u) = u, \qquad V_2(u) = \frac{u(u - 1)}{\log u},$$
which comes from the Fisher-KPP equation. The example compares the following three sets of functions:

Case 1: $V_1(u) = u$, $V_2(u) = u(u - 1)/\log u$;
Case 2: $V_1(u) = \sqrt{u}$, $V_2(u) = u(u - 1)/\log u$;
Case 3: $V_1(u) = \sqrt{u}$, $V_2(u) = u$.

Below are the initial and terminal densities used in the simulations:
$$u_0(x, y) = 15 \exp\big(-80\, \big((x - 0.5)^2 + (y - 0.5)^2\big)\big) + 1,$$
$$u_1(x, y) = 15 \exp\big(-80\, \big((x - 0.3)^2 + (y - 0.3)^2\big)\big) + 15 \exp\big(-80\, \big((x - 0.7)^2 + (y - 0.7)^2\big)\big) + 1.$$

Slide 44

Slide 44 text

Figure: Example 2: snapshots of the densities at different times $t$. The first row is from $(V_1(u) = u,\ V_2(u) = \frac{u(u-1)}{\log u})$, the second row is from $(V_1(u) = \sqrt{u},\ V_2(u) = \frac{u(u-1)}{\log u})$, and the last row is from $(V_1(u) = \sqrt{u},\ V_2(u) = u)$.

Slide 45

Slide 45 text

Figure: Example 2: the convergence plot from the Algorithm. The x-axis represents iterations and the y-axis represents the energy defined on the preceding slide. The blue line represents the solution with $V_1(u) = u$ and $V_2(u) = \frac{u(u-1)}{\log u}$ and converges to 0.00029. The orange line represents the solution with $V_1(u) = \sqrt{u}$ and $V_2(u) = \frac{u(u-1)}{\log u}$ and converges to 0.0135. The green line represents the solution with $V_1(u) = \sqrt{u}$ and $V_2(u) = u$ and converges to 0.0183.

Slide 46

Slide 46 text

Discussions: Mean field Information Geometry