
Mean field information dynamics via reaction diffusions

Wuchen Li
January 13, 2022

We formulate and compute a class of mean-field information dynamics based on reaction diffusion equations. Given a class of nonlinear reaction diffusions and entropy-type Lyapunov functionals, we study their gradient flow formulations. We write the "mean-field" metric space formalisms and derive Hamiltonian flows therein. These Hamiltonian flows follow saddle point systems of the proposed mean-field control problems. We apply primal-dual hybrid-gradient algorithms to compute the mean field information dynamics. Several numerical examples are provided.

Transcript

1. Mean field information dynamics via reaction diffusions. Wuchen Li, University of South Carolina. Hamilton-Jacobi equations reunion 2022, IPAM. A joint work with Wonjun Lee (UCLA) and Stanley Osher (UCLA).

2. Reaction diffusion. Consider a scalar nonlinear reaction diffusion equation

   ∂t u(t, x) = ∆F(u(t, x)) + R(u(t, x)),

where x ∈ Ω, Ω ⊂ ℝ^d is a compact convex spatial domain, u ∈ M(Ω) = {u ∈ C∞(Ω): u ≥ 0}, and ∆ is the Euclidean Laplacian operator.

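A minimal 1D finite-difference sketch of this equation, assuming a periodic domain, explicit forward-Euler time stepping, and the illustrative Fisher-KPP choices F(u) = u, R(u) = u(1 − u):

import numpy as np

# Forward-Euler sketch of du/dt = Laplacian(F(u)) + R(u) on a periodic 1D grid.
# F(u) = u and R(u) = u(1 - u) (Fisher-KPP) are illustrative assumptions.
F = lambda u: u
R = lambda u: u * (1.0 - u)

N, L = 256, 1.0
dx = L / N
x = np.linspace(0.0, L, N, endpoint=False)
u = 0.5 + 0.4 * np.cos(2 * np.pi * x)          # smooth positive initial density

dt = 0.2 * dx**2                               # stable step for explicit diffusion
for _ in range(2000):
    Fu = F(u)
    lap = (np.roll(Fu, -1) - 2.0 * Fu + np.roll(Fu, 1)) / dx**2   # periodic Laplacian of F(u)
    u = u + dt * (lap + R(u))

print("min/max of u after integration:", u.min(), u.max())
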
3. Reaction diffusions and generalized optimal transport. Based on studies begun over the Thanksgiving holiday of 2019, we have been working on reaction diffusions and mean field control problems. In this talk, we first review reaction diffusions in optimal transport type metric spaces. Using this connection, we then design mean field control problems for reaction-diffusions. This also provides variational schemes for solving classical reaction-diffusions in both the time and spatial domains.

4. Diffusions and entropy dissipation. Consider the heat equation in ℝ^d,

   ∂u(t, x)/∂t = ∇ · (∇u(t, x)) = ∆u(t, x),

and the negative Boltzmann-Shannon entropy

   H(u) = −∫_{ℝ^d} u(x) log u(x) dx.

Along the heat equation, the following dissipation relation holds:

   d/dt H(u(t, ·)) = ∫_{ℝ^d} ‖∇x log u(t, x)‖² u(t, x) dx = I(u(t, ·)),

where I(u) is named the Fisher information functional. There is a formulation behind this relation, namely:
▶ The heat flow is a gradient descent flow of the entropy in the optimal transport metric.

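This dissipation identity is easy to check numerically: evolve the heat equation by one explicit step on a periodic grid and compare the finite-difference rate of change of H(u) = −∫ u log u dx with the Fisher information I(u) = ∫ ‖∇ log u‖² u dx (a sketch with assumed grid sizes; the two printed numbers should nearly agree).

import numpy as np

N, L = 512, 1.0
dx = L / N
x = np.linspace(0.0, L, N, endpoint=False)
u = 1.0 + 0.5 * np.cos(2 * np.pi * x)          # positive density on a periodic domain

def entropy(u):                                # H(u) = -int u log u dx
    return -np.sum(u * np.log(u)) * dx

def fisher_info(u):                            # I(u) = int |grad log u|^2 u dx
    grad_logu = (np.roll(np.log(u), -1) - np.roll(np.log(u), 1)) / (2 * dx)
    return np.sum(grad_logu**2 * u) * dx

dt = 0.1 * dx**2
lap = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
u_next = u + dt * lap                          # one explicit heat-equation step

dH_dt = (entropy(u_next) - entropy(u)) / dt
print("dH/dt =", dH_dt, " I(u) =", fisher_info(u))
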
5. Entropy dissipation = Lyapunov methods. The gradient flow of the negative entropy −H(u) = ∫_{ℝ^d} u(x) log u(x) dx with respect to the optimal transport metric satisfies

   ∂u/∂t = ∇ · (u∇ log u) = ∆u.

Here the major trick is that u∇ log u = ∇u. In this way, one can study the entropy dissipation:

   d/dt H(u) = −∫_{ℝ^d} log u ∇ · (u∇ log u) dx = ∫_{ℝ^d} ‖∇ log u‖² u dx.

6. Behind Lyapunov methods: optimal transport. What is the optimal way to move or transport a mountain with shape X and density u0(x) to another shape Y with density u1(y)?

7. Literature. The optimal transport problem was first introduced by Monge in 1781 and relaxed by Kantorovich (Nobel prize) in the 1940s. It defines a distance on the space of probability distributions, named the optimal transport, Wasserstein, or Earth Mover's distance.
▶ Mapping/Monge-Ampère equation: Gangbo, Brenier, et al.;
▶ Lyapunov methods and gradient flows: Otto, Villani, Ambrosio, Gigli, Savaré, Carrillo, Mielke, et al.;
▶ Hamiltonian flows (compressible Euler equations, potential mean field games, Schrödinger bridge problems, Schrödinger equations): Benamou, Brenier, Lions, Georgiou, Nelson, Lafferty, et al.;
▶ Numerical OT and MFG: Benamou, Nurbekyan, Oberman, Osher, Achdou, Lai, et al.

8. Lyapunov methods induced calculus. Informally speaking, the Wasserstein metric refers to the following bilinear form:

   ⟨u̇1, G(u)u̇2⟩ = ∫ u̇1 (−∆u)⁻¹ u̇2 dx,   with ∆u := ∇ · (u∇).

In other words, denoting u̇i = −∇ · (u∇ϕi), i = 1, 2, one has

   ⟨ϕ1, G(u)⁻¹ϕ2⟩ = ⟨ϕ1, −∇ · (u∇)ϕ2⟩,

where u ∈ P(Ω), u̇i are tangent vectors in P(Ω) with ∫ u̇i dx = 0, and ϕi ∈ C∞(Ω) are cotangent vectors in P(Ω) at the point u. Here ∇· and ∇ are the standard divergence and gradient operators on Ω.

9. Gradient flows. The Wasserstein gradient flow of an energy functional F(u) leads to

   ∂t u = −G(u)⁻¹ δF(u)/δu = ∇ · (u∇ δF(u)/δu).

▶ If F(u) = ∫ F(x)u(x) dx, then ∂t u = ∇ · (u∇F(x)).
▶ If F(u) = ∫ u(x) log u(x) dx, then ∂t u = ∇ · (u∇ log u) = ∆u.

10. Hamiltonian flows. Consider the Lagrangian functional

   L(u, ∂t u) = (1/2) ∫ ∂t u (−∇ · (u∇))⁻¹ ∂t u dx − F(u).

By the Legendre transform,

   H(u, ϕ) = sup_{∂t u} ∫ ∂t u ϕ dx − L(u, ∂t u).

And the Hamiltonian system follows

   ∂t u = δH(u, ϕ)/δϕ,   ∂t ϕ = −δH(u, ϕ)/δu,

where δ/δu, δ/δϕ are the L² first variation operators with respect to u and ϕ, respectively, and the Hamiltonian takes the density form

   H(u, ϕ) = (1/2) ∫ ‖∇ϕ‖² u dx + F(u).

Here u is the "density" state variable and ϕ is the "density" moment variable.

11. Examples: potential mean field games. In detail, the Hamiltonian flows satisfy

   ∂t u + ∇ · (u∇ϕ) = 0,
   ∂t ϕ + (1/2)‖∇ϕ‖² = −δF(u)/δu.

This is a well-known dynamic, studied in potential mean field games, optimal transport, and compressible Euler equations. Nowadays, we apply these Lagrangian or Hamiltonian formalisms to design variational schemes in neural networks, finite difference/volume methods, and kernelized spaces.

12. Lyapunov methods. Consider a Lyapunov functional G(u) = ∫ G(u(x)) dx, where G: ℝ → ℝ is a given convex function with G″(u) > 0. In fact, how to choose the entropy G can be determined by an entropy-entropy flux-metric condition; we will work on it in the next talk.

13. Metrics. Along the reaction diffusion, observe that

   d/dt G(u(t, ·)) = ∫ G′(u) ∂t u dx = ∫ G′(u)(∆F(u) + R(u)) dx
   = −∫ ⟨∇G′(u), ∇F(u)⟩ dx + ∫ G′(u) R(u) dx
   = −∫ ⟨∇G′(u), ∇u⟩ F′(u) dx + ∫ G′(u) R(u) dx
   = −∫ ⟨∇G′(u), ∇u G″(u)⟩ (F′(u)/G″(u)) dx + ∫ G′(u)² (R(u)/G′(u)) dx
   = −∫ ⟨∇G′(u), ∇G′(u)⟩ (F′(u)/G″(u)) dx + ∫ G′(u)² (R(u)/G′(u)) dx,

where we apply integration by parts in the third equality and ∇G′(u) = G″(u)∇u in the last equality.

14. Lyapunov methods induced metrics I. Denote the smooth positive probability space by

   M = {u ∈ C∞(Ω): u > 0}.

Given F, G: ℝ → ℝ satisfying F′(u) > 0 if u > 0 and G″(u) > 0, denote

   V1(u) = F′(u)/G″(u),   V2(u) = −R(u)/G′(u).

Denote the tangent space of M at u ∈ M by Tu M = {σ ∈ C∞(Ω)}.

15. Lyapunov methods induced metrics II. Definition (Mean-field information metric). The inner product g(u): Tu M × Tu M → ℝ is given below. For any σ1, σ2 ∈ Tu M, define

   g(u)(σ1, σ2) = ∫_Ω σ1 (−∇ · (V1(u)∇) + V2(u))⁻¹ σ2 dx
                = ∫_Ω (∇Φ1, ∇Φ2) V1(u) dx + ∫_Ω Φ1 Φ2 V2(u) dx,

where σi = −∇ · (V1(u)∇Φi) + V2(u)Φi, i = 1, 2.

16. Reaction-diffusions in metric spaces. Given an energy functional F: M → ℝ, the gradient flow of F in (M(Ω), g) satisfies

   ∂t u(t, x) = ∇ · (V1(u)∇ δF(u)/δu)(t, x) − V2(u) δF(u)/δu (t, x).

In particular, if F(u) = G(u) = ∫ G(u) dx, then

   ∂t u = ∇ · (V1(u)∇G′(u)) − V2(u)G′(u)
        = ∇ · ((F′(u)/G″(u)) G″(u)∇u) + (R(u)/G′(u)) G′(u)
        = ∇ · (F′(u)∇u) + R(u)
        = ∆F(u) + R(u).

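This cancellation can be verified symbolically; the sketch below checks in one spatial dimension that ∂x(V1(u) ∂x G′(u)) − V2(u)G′(u) equals ∂xx F(u) + R(u), under the assumed Fisher-KPP choices F(u) = u, G(u) = u log u − u, R(u) = u(1 − u).

import sympy as sp

x = sp.symbols('x')
u = sp.Function('u', positive=True)(x)

# Assumed example data: linear diffusion, Boltzmann entropy, Fisher-KPP reaction.
F = u
G = u * sp.log(u) - u
R = u * (1 - u)

Gp  = sp.diff(G, u)                       # G'(u) = log u
Gpp = sp.diff(G, u, 2)                    # G''(u) = 1/u
V1  = sp.diff(F, u) / Gpp                 # V1 = F'/G''
V2  = -R / Gp                             # V2 = -R/G'

lhs = sp.diff(V1 * sp.diff(Gp, x), x) - V2 * Gp   # metric gradient flow of the entropy G
rhs = sp.diff(F, x, 2) + R                        # reaction diffusion right-hand side
print(sp.simplify(lhs - rhs))                     # expect 0
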
17. Controlling reaction diffusions. Denote a given energy functional by F: M(Ω) → ℝ. Consider the variational problem

   inf_{v1, v2, u} ∫_0^1 [ ∫_Ω (1/2)‖v1‖² V1(u) + (1/2)|v2|² V2(u) dx − F(u) ] dt,

where the infimum is taken among all density functions u: [0, 1] × Ω → ℝ, vector fields v1: [0, 1] × Ω → ℝ^d, and reaction rate functions v2: [0, 1] × Ω → ℝ, such that

   ∂t u(t, x) + ∇ · (V1(u(t, x)) v1(t, x)) = v2(t, x) V2(u(t, x)),

with fixed initial and terminal density functions u0, u1 ∈ M(Ω).

18. Mean field information dynamics. The critical point of the variational problem satisfies v1(t, x) = ∇Φ(t, x), v2(t, x) = Φ(t, x), with

   ∂t u(t, x) + ∇ · (V1(u(t, x))∇Φ(t, x)) = V2(u(t, x))Φ(t, x),
   ∂t Φ(t, x) + (1/2)‖∇Φ(t, x)‖² V1′(u(t, x)) + (1/2)|Φ(t, x)|² V2′(u(t, x)) + δF(u)/δu (t, x) = 0.

19. Mean field information Hamiltonian. There exists a Hamiltonian formulation of the control problem:

   ∂t u = δH(u, Φ)/δΦ,   ∂t Φ = −δH(u, Φ)/δu,

where H is the Hamiltonian functional defined by

   H(u, Φ) = (1/2) ∫_Ω ‖∇Φ‖² V1(u) dx + (1/2) ∫_Ω |Φ|² V2(u) dx + F(u).

20. Functional Hamilton-Jacobi equations. The Hamilton-Jacobi equation in positive density space satisfies

   ∂t U(t, u) + (1/2) ∫_Ω ‖∇ δU(t, u)/δu(x)‖² V1(u) dx + (1/2) ∫_Ω |δU(t, u)/δu(x)|² V2(u) dx + F(u) = 0,

where U: [0, 1] × L²(Ω) → ℝ is a value functional.

21. Example I. Example (Wasserstein metric, heat flow and density Hamiltonian flows). Let G(u) = u log u − u, F(u) = u, R(u) = 0; thus

   V1(u) = F′(u)/G″(u) = u,   V2(u) = −R(u)/G′(u) = 0.

Then the mean-field information metric forms

   g(u)(σ1, σ2) = ∫_Ω (∇Φ1(x), ∇Φ2(x)) u(x) dx.

Here the Hamiltonian flow in (M(Ω), g) satisfies

   ∂t u + ∇ · (u∇Φ) = 0,
   ∂t Φ + (1/2)‖∇Φ‖² + δF(u)/δu = 0.

22. Example II. Example (Fisher-Rao metric, birth-death equation and density Hamiltonian flows). Consider

   V1(u) = F′(u)/G″(u) = 0,   V2(u) = −R(u)/G′(u) = u.

Then the mean-field information metric satisfies

   g(u)(σ1, σ2) = ∫_Ω Φ1(x)Φ2(x) u(x) dx.

Here the Hamiltonian flow in (M(Ω), g) satisfies

   ∂t u − uΦ = 0,
   ∂t Φ + (1/2)|Φ|² + δF(u)/δu = 0.

23. Example III. Example (Fisher-KPP metric, Fisher-KPP equation and density Hamiltonian flows). Consider F(u) = u, G(u) = u log u − u, R(u) = u(1 − u). Thus

   V1(u) = F′(u)/G″(u) = u,   V2(u) = −R(u)/G′(u) = u(u − 1)/log u.

Here the Hamiltonian flow in (M(Ω), g) satisfies

   ∂t u + ∇ · (u∇Φ) − (u(u − 1)/log u) Φ = 0,
   ∂t Φ + (1/2)‖∇Φ‖² + (1/2)|Φ|² ((2u − 1) log u + 1 − u)/(log u)² + δF(u)/δu = 0.

24. The Primal-Dual Algorithms. Consider the following convex optimization problem:

   min_z f(Az) + g(z),

where z is the variable to be minimized, f and g are convex functions, and A is a continuous linear operator. The algorithm solves the problem by iterating

   z^(k+1) = argmin_z L(z, p̄) + (1/(2τ))‖z − z^(k)‖²_{L²},
   p^(k+1) = argmax_p L(z^(k+1), p) − (1/(2σ))‖p − p^(k)‖²_{L²},
   p̄ = 2p^(k+1) − p^(k),

where p is a dual variable, L is the Lagrangian functional L(z, p) = g(z) + ⟨Az, p⟩ − f*(p), and f*(p) = sup_z ⟨z, p⟩ − f(z) is the Legendre transform of f.

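To make the iteration concrete, here is a small self-contained PDHG run on a toy instance of min_z f(Az) + g(z): 1D total-variation denoising with g(z) = (1/2)‖z − z0‖², f(w) = λ‖w‖₁ and A the forward-difference operator. This is only an illustration of the scheme above, not the paper's setting.

import numpy as np

rng = np.random.default_rng(0)
n, lam = 200, 2.0
truth = np.where(np.arange(n) < n // 2, 1.0, 3.0)        # piecewise-constant signal
z0 = truth + 0.3 * rng.standard_normal(n)                # noisy observation

A  = lambda z: np.diff(z)                                # forward difference, length n-1
At = lambda p: np.concatenate(([-p[0]], p[:-1] - p[1:], [p[-1]]))   # exact adjoint of A

tau = sigma = 0.4                                        # tau*sigma*||A^T A|| < 1 since ||A||^2 <= 4
z = z0.copy()
p = np.zeros(n - 1)
p_bar = p.copy()
for _ in range(500):
    # z-step: closed-form proximal step for g(z) = 0.5*||z - z0||^2
    z = (tau * z0 + z - tau * At(p_bar)) / (1.0 + tau)
    # p-step: proximal of f*, i.e. projection onto the l-infinity ball of radius lam
    p_new = np.clip(p + sigma * A(z), -lam, lam)
    p_bar = 2.0 * p_new - p
    p = p_new

print("RMS error to the clean signal:", np.linalg.norm(z - truth) / np.sqrt(n))
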
25. Primal-dual methods. The scheme converges if the step sizes τ and σ satisfy τσ‖AᵀA‖_{L²} < 1, where ‖·‖_{L²} is the operator norm on L². These step sizes depend on the operator A. In this paper, we use G-prox PDHG, a variation of the primal-dual algorithm. G-prox PDHG provides an appropriate choice of norms that allows the algorithm to converge faster than the vanilla PDHG algorithm.

26. G-prox Primal-dual methods. G-prox PDHG finds the saddle point by iterating

   z^(k+1) = argmin_z L(z, p̄) + (1/(2τ))‖z − z^(k)‖²_{L²},
   p^(k+1) = argmax_p L(z^(k+1), p) − (1/(2σ))‖p − p^(k)‖²_H,
   p̄ = 2p^(k+1) − p^(k),

where the norm ‖·‖_H is defined by ‖z‖²_H = ‖Aᵀz‖²_{L²}. By choosing this proper norm, the step sizes only need to satisfy στ < 1, which is independent of the operator A.

27. Implementation of the algorithm. We set the linear operator A as

   A(m1, m2, u) = ∂t u(t, x) + ∇ · m1(t, x) − m2(t, x).

We set the convex functionals f and g as

   g(m1, m2, u) = ∫_0^1 [ ∫_Ω ‖m1(t, x)‖²/(2V1(u(t, x))) + |m2(t, x)|²/(2V2(u(t, x))) dx − F(u) ] dt,
   f(A(m1, m2, u)) = 0 if A(m1, m2, u) = 0, and ∞ otherwise, so that f*(Φ) = 0.

Thus, we have the Lagrangian functional

   L(m1, m2, u, Φ) = ∫_0^1 [ ∫_Ω ‖m1(t, x)‖²/(2V1(u(t, x))) + |m2(t, x)|²/(2V2(u(t, x)))
                     + Φ(t, x)(∂t u(t, x) + ∇ · m1(t, x) − m2(t, x)) dx − F(u) ] dt.

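A minimal sketch of how this constraint operator might be discretized (assumptions, not from the slides: uniform space-time grid, forward differences in time, centered differences with periodic boundaries in space; the paper's own C++ discretization may differ).

import numpy as np

def apply_A(m1, m2, u, dt, dx):
    # Discrete A(m1, m2, u) = du/dt + div(m1) - m2 on an (Nt, Nx, Nx) grid.
    # m1 has shape (Nt, Nx, Nx, 2), holding the two spatial components.
    du_dt = (np.roll(u, -1, axis=0) - u) / dt
    du_dt[-1] = 0.0                              # no forward neighbor at the final time slice
    div_m1 = ((np.roll(m1[..., 0], -1, axis=1) - np.roll(m1[..., 0], 1, axis=1))
              + (np.roll(m1[..., 1], -1, axis=2) - np.roll(m1[..., 1], 1, axis=2))) / (2 * dx)
    return du_dt + div_m1 - m2

# Tiny usage example: a constant density with zero fluxes satisfies A = 0.
Nt, Nx = 4, 16
dt, dx = 1.0 / Nt, 1.0 / Nx
u  = np.ones((Nt, Nx, Nx))
m1 = np.zeros((Nt, Nx, Nx, 2))
m2 = np.zeros((Nt, Nx, Nx))
print(np.abs(apply_A(m1, m2, u, dt, dx)).max())  # 0.0
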
28. Implementation of the algorithm. We find the saddle point (m1*, m2*, u*, Φ*) by iterating the following scheme:

   u^(k+1)  = argmin_u L(m1^(k), m2^(k), u, Φ̄) + (1/(2τ))‖u − u^(k)‖²_{L²},
   m1^(k+1) = argmin_m L(m, m2^(k), u^(k), Φ̄) + (1/(2τ))‖m − m1^(k)‖²_{L²},
   m2^(k+1) = argmin_m L(m1^(k), m, u^(k), Φ̄) + (1/(2τ))‖m − m2^(k)‖²_{L²},
   Φ^(k+1)  = argmax_Φ L(m1^(k+1), m2^(k+1), u^(k+1), Φ) − (1/(2σ))‖Φ − Φ^(k)‖²_H,
   Φ̄ = 2Φ^(k+1) − Φ^(k).

Here the L² and H norms are defined as

   ‖u‖²_{L²} = (u, u)_{L²} = ∫_0^1 ∫_Ω ‖u‖² dx dt,   ‖u‖²_H = ‖Aᵀu‖²_{L²},

for any u: [0, 1] × Ω → ℝⁿ.

29. Implementation of the algorithm. We can find explicit formulas for u^(k+1), m1^(k+1), m2^(k+1), Φ^(k+1) using the first-order optimality conditions. The optimality conditions for m1^(k+1), m2^(k+1), Φ^(k+1) result in first-order equations which can be solved easily. Thus, we have

   m1^(k+1) = V1(u^(k))/(τ + V1(u^(k))) (m1^(k) + τ∇Φ^(k)),
   m2^(k+1) = V2(u^(k))/(τ + V2(u^(k))) (m2^(k) + τΦ^(k)),
   Φ^(k+1/2) = Φ^(k) + σ(AAᵀ)⁻¹(∂t u^(k+1) + ∇ · m1^(k+1) − m2^(k+1)).

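The two momentum updates are pointwise and translate directly into array operations. The sketch below assumes elementwise callables V1, V2 and precomputed arrays for Φ^(k) and its spatial gradient; it leaves out the Φ update, whose (AAᵀ)⁻¹ solve is typically handled by an FFT-based Poisson-type solver.

import numpy as np

def update_m(m1, m2, u, grad_phi, phi, V1, V2, tau):
    # m1 <- V1(u)/(tau + V1(u)) * (m1 + tau*grad(Phi)),  m2 <- V2(u)/(tau + V2(u)) * (m2 + tau*Phi)
    v1 = V1(u)
    v2 = V2(u)
    m1_new = (v1 / (tau + v1))[..., None] * (m1 + tau * grad_phi)   # broadcast over the d components
    m2_new = (v2 / (tau + v2)) * (m2 + tau * phi)
    return m1_new, m2_new

# Illustrative call with the (assumed) choices V1(u) = u, V2(u) = u.
u   = np.full((4, 16, 16), 2.0)
m1  = np.zeros((4, 16, 16, 2)); m2 = np.zeros((4, 16, 16))
phi = np.ones((4, 16, 16));     grad_phi = np.zeros((4, 16, 16, 2))
m1n, m2n = update_m(m1, m2, u, grad_phi, phi, lambda u: u, lambda u: u, tau=0.1)
print(m2n[0, 0, 0])   # 2/(0.1 + 2) * (0 + 0.1*1) ≈ 0.0952
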
30. Implementation of the algorithm. The optimality condition for u^(k+1) involves a nonlinear equation, which we solve using Newton's method. For j = 1, 2, ···,

   a = −‖m1‖²/(2V1(u_j)²) V1′(u_j) − |m2|²/(2V2(u_j)²) V2′(u_j) − ∂tΦ + c log u_j + (1/τ)(u_j − u^(k)),
   b = ‖m1‖²/V1(u_j)³ (V1′(u_j))² − ‖m1‖²/(2V1(u_j)²) V1″(u_j) + |m2|²/V2(u_j)³ (V2′(u_j))² − |m2|²/(2V2(u_j)²) V2″(u_j) + c/u_j + 1/τ,
   u_(j+1) = max(0, u_j − a/b).

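A sketch of this inner Newton loop, written elementwise with numpy. V1, V2 and their first two derivatives are passed as callables; the constant c is assumed to be the coefficient contributed by the chosen energy F (the c log u term above), and a small positive floor replaces the exact max(0, ·) so that log u stays finite.

import numpy as np

def newton_update_u(u_k, m1_sq, m2_sq, dt_phi, V, c, tau, iters=10):
    # V = (V1, dV1, ddV1, V2, dV2, ddV2): elementwise callables.
    # m1_sq = ||m1||^2, m2_sq = |m2|^2, dt_phi = d/dt Phi, all arrays on the space-time grid.
    V1, dV1, ddV1, V2, dV2, ddV2 = V
    u = np.maximum(u_k, 1e-8)
    for _ in range(iters):
        a = (-m1_sq / (2 * V1(u)**2) * dV1(u)
             - m2_sq / (2 * V2(u)**2) * dV2(u)
             - dt_phi + c * np.log(u) + (u - u_k) / tau)
        b = (m1_sq / V1(u)**3 * dV1(u)**2 - m1_sq / (2 * V1(u)**2) * ddV1(u)
             + m2_sq / V2(u)**3 * dV2(u)**2 - m2_sq / (2 * V2(u)**2) * ddV2(u)
             + c / u + 1.0 / tau)
        u = np.maximum(1e-8, u - a / b)          # projected Newton step
    return u

# Illustrative call with V1(u) = u, V2(u) = u (so V' = 1 and V'' = 0).
ones, zeros = np.ones((8, 8)), np.zeros((8, 8))
V = (lambda u: u, lambda u: np.ones_like(u), lambda u: np.zeros_like(u),
     lambda u: u, lambda u: np.ones_like(u), lambda u: np.zeros_like(u))
print(newton_update_u(ones, zeros, zeros, zeros, V, c=1.0, tau=0.1)[0, 0])   # stays 1.0
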
31. Algorithm. Algorithm: G-prox PDHG for the mean-field information variational problem.
Input: initial density u0 and terminal density u1.
Output: u, Φ, m2: [0, T] × Ω → ℝ and m1: [0, T] × Ω → ℝ^d.
For k = 1, 2, ···:
   u^(k+1) ← compute u^(k+1) using Newton's method;
   m1^(k+1) = V1(u^(k))/(τ + V1(u^(k))) (m1^(k) + τ∇Φ^(k));
   m2^(k+1) = V2(u^(k))/(τ + V2(u^(k))) (m2^(k) + τΦ^(k));
   Φ^(k+1/2) = Φ^(k) + σ(AAᵀ)⁻¹(∂t u^(k+1) + ∇ · m1^(k+1) − m2^(k+1));
   Φ^(k+1) = 2Φ^(k+1/2) − Φ^(k).

32. Numerical experiments. We present two sets of numerical experiments using the Algorithm with different V1, V2 functions. We wrote C++ code to run the numerical experiments. For all the experiments we used the discretization sizes Nx1 = Nx2 = 128, Nt = 30.

33. Example 1. In this example, we use V1(u) = u, V2(u) = u^α, α > 0. We show how varying the exponent α affects the evolution of the densities. Below are the initial and terminal densities used in the simulations:

   u0(x, y) = 10 exp(−60((x − 0.3)² + (y − 0.7)²)) + 1,
   u1(x, y) = 20 exp(−60((x − 0.7)² + (y − 0.3)²)) + 1.

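These two densities are easy to reproduce on the stated 128 × 128 grid; the snippet below is an illustrative reconstruction that assumes the spatial domain is the unit square [0, 1]².

import numpy as np

Nx = 128
xs = (np.arange(Nx) + 0.5) / Nx                  # cell-centered grid points in [0, 1]
X, Y = np.meshgrid(xs, xs, indexing="ij")

u0 = 10.0 * np.exp(-60.0 * ((X - 0.3)**2 + (Y - 0.7)**2)) + 1.0   # initial density
u1 = 20.0 * np.exp(-60.0 * ((X - 0.7)**2 + (Y - 0.3)**2)) + 1.0   # terminal density
print(u0.max(), u1.max())                        # roughly 11 and 21 at the Gaussian centers
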
34. Figure: Example 1: snapshots of the densities at different times t. The first, second, third, and fourth rows represent V2(u) = u, V2(u) = u², V2(u) = u³, V2(u) = u⁴, respectively.

35. Figure: Example 1: cross sections along the line x + y = 1 at different times t. Each color represents V2(u) = u, V2(u) = u², V2(u) = u⁴.

36. Figure: Example 1: the convergence plot from the Algorithm. The x-axis represents iterations and the y-axis represents the energy defined in (23).

37. Example 2. This example uses V1(u) = u, V2(u) = u(u − 1)/log u, which comes from the Fisher-KPP equation. The example compares the following three sets of functions:
   1. V1(u) = u,  V2(u) = u(u − 1)/log u;
   2. V1(u) = √u, V2(u) = u(u − 1)/log u;
   3. V1(u) = √u, V2(u) = u.
Below are the initial and terminal densities used in the simulations:

   u0(x, y) = 15 exp(−80((x − 0.5)² + (y − 0.5)²)) + 1,
   u1(x, y) = 15 exp(−80((x − 0.3)² + (y − 0.3)²)) + 15 exp(−80((x − 0.7)² + (y − 0.7)²)) + 1.

38. Figure: Example 2: snapshots of the densities at different times t. The first row is from (V1(u) = u, V2(u) = u(u − 1)/log u), the second row from (V1(u) = √u, V2(u) = u(u − 1)/log u), and the last row from (V1(u) = √u, V2(u) = u).

39. Figure: Example 2: the convergence plot from the Algorithm. The x-axis represents iterations and the y-axis represents the energy defined on the preceding slide. The blue line represents the solution with V1(u) = u and V2(u) = u(u − 1)/log u; it converges to 0.00029. The orange line represents the solution with V1(u) = √u and V2(u) = u(u − 1)/log u; it converges to 0.0135. The green line represents the solution with V1(u) = √u and V2(u) = u; it converges to 0.0183.