
Lecture note in Oberwolfach Seminar 2023

Wuchen Li
November 21, 2023


Variational and stochastic flows are now ubiquitous in machine learning and generative modeling. Indeed, many such models can be interpreted as flows from a latent distribution to the sample
distribution, and training corresponds to finding the right flow vector field. Optimal transport and diffeomorphic flows provide powerful frameworks for analyzing such trajectories of distributions with elegant notions from differential geometry, such as geodesics, gradient flows, and Hamiltonian flows. More recently, mean-field control and mean-field games have offered general optimal-control variational formulations of the learning problem. How do these tools lead us to a better understanding and further development of machine learning and generative models?


Transcript

  1. Generalized optimal transport: optimization and mean field control in probability spaces. Wuchen Li, University of South Carolina, Columbia. Oberwolfach seminar, Nov 22, 2023.
  2. Generalized optimal transport.
    - Optimization in probability space: gradient flows; accelerated flows; Newton's flows; convexity analysis.
    - Optimal control in probability space: Hamiltonian flows; mean field control; mean field games.
    - Applications: regularized Wasserstein proximal operators; mean field modeling of pandemic propagation; scientific computing: deep JKO schemes.
  3. Topic I: Optimization in probability space.
    - Accelerated information gradient flows.
    - Information Newton's flow: a second-order optimization method in probability space.
    - Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler.
  4. Optimization. Optimization problems for sampling algorithms attract increasing attention from the AI and machine learning communities. Examples appear in model-based settings (finite difference, finite element, neural networks, kernels) and model-free ones: variational inference, Bayesian inference, Generative Adversarial Networks (GAN), policy optimization, etc.
  5. Sampling Algorithm. Arguably, one of the most classical sampling algorithms is driven by the over-damped Langevin dynamics
    $dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t$,
    where $X_t \in \mathbb{R}^d$ and $B_t$ is the standard Brownian motion in $\mathbb{R}^d$. Denote $X_t \sim \rho_t$. The evolution of $\rho_t$ satisfies the Fokker-Planck equation
    $\partial_t \rho_t = \nabla\cdot(\rho_t \nabla f) + \Delta \rho_t$.
    The invariant distribution satisfies $\rho^*(x) = \frac{1}{K} e^{-f(x)}$, with $K = \int e^{-f(y)}\,dy$.
  6. Optimization in probability space. Nowadays, the sampling algorithm relates to a variational problem in the probability density space
    $\mathcal{P}(\Omega) = \{\rho \in \mathcal{F}(\Omega)\colon \int_\Omega \rho\,dx = 1,\ \rho \ge 0\}$.
    Consider the optimization problem therein: $\min_{\rho \in \mathcal{P}(\Omega)} E(\rho)$. A typical choice of objective functional in Bayesian inference problems is the KL divergence:
    $E(\rho) = D_{\mathrm{KL}}(\rho \,\|\, e^{-f}) = \int \rho \log \frac{\rho}{e^{-f}}\,dx$.
  7. Information Metrics. Typical examples include Fisher or Wasserstein information metrics.
    - Information geometry: Fisher information statistics (Amari, Ay, Zhang et al.); AI inference problems: f-divergence, Amari alpha-divergence, etc.; AI optimization methods: natural gradient (Amari), stochastic relaxation (Pistone, Malago), et al.
    - Transport geometry: gradient descent (Jordan, Kinderlehrer, Otto, Villani, Liu's group, Slepcev, Carillo, Cances, Wang, Craig et al.); applied math (Osher, Rongjie, Preye, Solomon, Oberman, Benamou, Enquist, Yin, Stuart, et al.); engineering (Chen, Pavon, Georgiou, Tannenbaum, Guo et al.); AI inference problems (Cuturi et al., Frogner et al. 2015, Bottou et al. 2017, Yau, Gu et al. 2019); optimization (Winsolo).
  8. Gradient descent. The gradient flow of $E(\rho)$ in a metric space satisfies
    $\partial_t \rho_t = -G(\rho_t)^{-1} \frac{\delta E(\rho_t)}{\delta \rho_t}$,
    where $G$ is the metric (preconditioner) operator, which is often assumed to be positive definite. Examples:
    - The Fisher-Rao gradient flow of the KL divergence leads to $\partial_t \rho_t = -\mathrm{grad}_F\, D_{\mathrm{KL}}(\rho_t \| \rho^*) = -\rho_t\big(\log \rho_t + f - \int \rho_t(\log \rho_t + f)\,dx\big)$.
    - The Wasserstein gradient flow of the KL divergence satisfies $\partial_t \rho_t = -\mathrm{grad}_W\, D_{\mathrm{KL}}(\rho_t \| \rho^*) = \nabla\cdot\big(\rho_t \nabla(f + \log \rho_t + 1)\big) = \nabla\cdot(\rho_t \nabla f) + \Delta \rho_t$.
  9. First-order algorithms. Sampling methods are first-order methods in probability space based on different choices of metrics. They are usually "mean field" dynamics.
    - Stein metric and Stein variational gradient descent (Liu and Wang. Stein variational gradient descent: A general purpose Bayesian inference algorithm. NIPS, 2016).
    - Wasserstein-Kalman metric and Ensemble Kalman sampling (Garbuno-Inigo, Hoffmann, Li, and Stuart. Interacting Langevin diffusions: Gradient structure and ensemble Kalman sampler. SIAM Applied Dynamical Systems, 2019).
    - Wasserstein Newton's flow.
  10. Related works.
    - KL divergence + Wasserstein metric + gradient descent = Langevin dynamics (optimal transport; Jordan, Kinderlehrer, Otto, Villani et al.).
    - KL divergence + Fisher-Rao metric + gradient descent = birth-death dynamics (information geometry; Amari et al.).
    - KL divergence + Stein metric + gradient descent = Stein variational gradient descent (Liu et al.).
    - KL divergence + Wasserstein-Kalman metric + gradient descent = Ensemble Kalman sampling (Garbuno-Inigo, Hoffmann, Li, Stuart et al.).
    Divergence + metric + accelerated gradient flows = ?
  11. Goal: Accelerated Langevin dynamics. Can we design systematic accelerated gradient flows for sampling, which apply the symplectic structures in information metrics?
    - Extending from the fact that the optimal transport gradient flow of the KL divergence forms Langevin dynamics, what are accelerated gradient flows?
    - What is the "mean-field" accelerated Langevin dynamics?
  12. Related works. Finite dimensional: Euclidean space (Su et al.); variational formulation (Wibisono, Wilson and Jordan); Hamiltonian descent (C. Maddison, D. Paulin, Y. Teh, B. O'Donoghue, A. Doucet). Density space: Liu et al. (ICML 2019), an acceleration framework of ParVI methods based on manifold optimization; Taghvaei and Mehta (ICML 2019), the accelerated flow from an optimal control perspective; Carrillo et al. (CMP 2019), who study the compressible Euler equation; Cheng et al. and Ma et al., underdamped Langevin dynamics.
  13. Gradient flows. The methods for optimization problems in the Euclidean space have been well studied:
    $\min_{x \in \mathbb{R}^n} f(x)$.
    The classical gradient descent method follows $x_{k+1} = x_k - s \nabla f(x_k)$. It can be formulated as a continuous-time gradient flow, $\dot{x} = -\nabla f(x)$.
  14. Accelerated gradient flows. Nesterov's accelerated method is widely applied to accelerate vanilla gradient descent. It corresponds to the so-called accelerated gradient flow
    $\ddot{x} + \alpha_t \dot{x} + \nabla f(x) = 0$.
    This is equivalent to a damped Hamiltonian system
    $\begin{pmatrix}\dot{x}\\ \dot{p}\end{pmatrix} + \begin{pmatrix}0\\ \alpha_t p\end{pmatrix} - \begin{pmatrix}0 & I\\ -I & 0\end{pmatrix}\begin{pmatrix}\nabla_x H^E(x,p)\\ \nabla_p H^E(x,p)\end{pmatrix} = 0, \qquad H^E(x,p) = \tfrac{1}{2}\|p\|_2^2 + f(x),$
    with initial values $x(0) = x_0$ and $p(0) = 0$. (Su et al. A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights. Journal of Machine Learning Research, 2016.)
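    A small NumPy sketch (not from the slides) that integrates this damped Hamiltonian system; the semi-implicit update with a clipped damping factor, the Hess(0)-style schedule $\alpha_t = 3/t$, and the quadratic test function are illustrative assumptions.

      import numpy as np

      def accelerated_flow(grad_f, x0, dt=1e-3, T=10.0, alpha=lambda t: 3.0 / max(t, 1e-8)):
          # Integrates x'' + alpha(t) x' + grad f(x) = 0 as the damped Hamiltonian system
          #   x' = p,  p' = -alpha(t) p - grad f(x),  with x(0) = x0, p(0) = 0.
          # The damping factor is clipped at 0 so the discrete update stays stable for small t.
          x = np.array(x0, dtype=float)
          p = np.zeros_like(x)
          t = 0.0
          while t < T:
              damp = max(0.0, 1.0 - dt * alpha(t + dt))
              p = damp * p - dt * grad_f(x)   # momentum step
              x = x + dt * p                  # position step
              t += dt
          return x

      # Example: an ill-conditioned quadratic f(x) = 0.5 * x^T A x.
      A = np.diag([1.0, 100.0])
      x_min = accelerated_flow(lambda x: A @ x, np.array([1.0, 1.0]))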
  15. A quick review on Information Metrics. Definition (Metric in the probability space): a metric tensor $G(\rho)\colon T_\rho\mathcal{P}(\Omega) \to T^*_\rho\mathcal{P}(\Omega)$ is an invertible mapping from the tangent space at $\rho$ to the cotangent space at $\rho$. This metric tensor defines the metric (inner product) on the tangent space $T_\rho\mathcal{P}(\Omega)$: for $\sigma_1, \sigma_2 \in T_\rho\mathcal{P}(\Omega)$, we define
    $g_\rho(\sigma_1, \sigma_2) = \int \sigma_1\, G(\rho)\, \sigma_2\,dx = \int \Phi_1\, G(\rho)^{-1} \Phi_2\,dx,$
    where $\Phi_i$ is the solution to $\sigma_i = G(\rho)^{-1}\Phi_i$, $i = 1, 2$.
  16. Example: Fisher-Rao metric. The Fisher-Rao metric comes from Information Geometry. The inverse of the Fisher-Rao metric tensor is defined by
    $G_F(\rho)^{-1}\Phi = \rho\big(\Phi - \int \Phi\rho\,dx\big), \qquad \Phi \in T^*_\rho\mathcal{P}(\Omega).$
    For $\sigma_1, \sigma_2 \in T_\rho\mathcal{P}(\Omega)$, the Fisher-Rao metric on the tangent space follows
    $g^F_\rho(\sigma_1, \sigma_2) = \int \Phi_1\Phi_2\,\rho\,dx - \Big(\int \Phi_1\rho\,dx\Big)\Big(\int \Phi_2\rho\,dx\Big),$
    where $\Phi_i$ is the solution to $\sigma_i = \rho\big(\Phi_i - \int \Phi_i\rho\,dx\big)$, $i = 1, 2$.
  17. Example: Wasserstein metric. The Wasserstein metric comes from Optimal Transport. The inverse of the Wasserstein metric tensor is defined by
    $G_W(\rho)^{-1}\Phi = -\nabla\cdot(\rho\nabla\Phi), \qquad \Phi \in T^*_\rho\mathcal{P}(\Omega).$
    The Wasserstein metric on the tangent space is given by
    $g^W_\rho(\sigma_1, \sigma_2) = \int \rho\,\langle\nabla\Phi_1, \nabla\Phi_2\rangle\,dx, \qquad \sigma_1, \sigma_2 \in T_\rho\mathcal{P}(\Omega),$
    where $\Phi_i$ is the solution to $\sigma_i = -\nabla\cdot(\rho\nabla\Phi_i)$, $i = 1, 2$.
  18. Gradient flows. The gradient flow for $E(\rho)$ in $(\mathcal{P}(\Omega), g_\rho)$ takes the form
    $\partial_t\rho_t = -G(\rho_t)^{-1}\frac{\delta E(\rho_t)}{\delta\rho_t}.$
    The Fisher-Rao gradient flow is given by
    $\partial_t\rho_t = -\rho_t\Big(\frac{\delta E}{\delta\rho_t} - \int \frac{\delta E}{\delta\rho_t}\,\rho_t\,dx\Big).$
    The Wasserstein gradient flow writes
    $\partial_t\rho_t = \nabla\cdot\Big(\rho_t\nabla\frac{\delta E}{\delta\rho_t}\Big).$
  19. Hamiltonian flows. The Euler-Lagrange equation in the probability space can be formulated as a system in $(\rho_t, \Phi_t)$, i.e.,
    $\partial_t\rho_t - G(\rho_t)^{-1}\Phi_t = 0, \qquad \partial_t\Phi_t + \frac{1}{2}\frac{\delta}{\delta\rho_t}\Big(\int \Phi_t\,G(\rho_t)^{-1}\Phi_t\,dx\Big) + \frac{\delta E}{\delta\rho_t} = 0,$
    where $\Phi_t$ is defined up to a spatially constant shift. The above PDE system is the Hamiltonian flow
    $\partial_t\begin{pmatrix}\rho_t\\ \Phi_t\end{pmatrix} - \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\begin{pmatrix}\delta_{\rho_t}\mathcal{H}(\rho_t,\Phi_t)\\ \delta_{\Phi_t}\mathcal{H}(\rho_t,\Phi_t)\end{pmatrix} = 0$
    with respect to the Hamiltonian $\mathcal{H}(\rho_t,\Phi_t) = \frac{1}{2}\int \Phi_t\,G(\rho_t)^{-1}\Phi_t\,dx + E(\rho_t)$.
  20. Example: Fisher-Rao Hamiltonian flow. The Fisher-Rao Hamiltonian flow follows
    $\partial_t\rho_t - \Phi_t\rho_t + \Big(\int \Phi_t\rho_t\,dx\Big)\rho_t = 0, \qquad \partial_t\Phi_t + \frac{1}{2}\Phi_t^2 - \Big(\int \rho_t\Phi_t\,dx\Big)\Phi_t + \frac{\delta E}{\delta\rho_t} = 0.$
    The corresponding Hamiltonian is $\mathcal{H}_F(\rho_t,\Phi_t) = \frac{1}{2}\Big(\int \Phi_t^2\,\rho_t\,dx - \big(\int \rho_t\Phi_t\,dx\big)^2\Big) + E(\rho_t)$.
  21. Example: Wasserstein Hamiltonian flow. The Wasserstein Hamiltonian flow writes
    $\partial_t\rho_t + \nabla\cdot(\rho_t\nabla\Phi_t) = 0, \qquad \partial_t\Phi_t + \frac{1}{2}\|\nabla\Phi_t\|^2 + \frac{\delta E}{\delta\rho_t} = 0.$
    The corresponding Hamiltonian forms $\mathcal{H}_W(\rho_t,\Phi_t) = \frac{1}{2}\int \|\nabla\Phi_t\|^2\,\rho_t\,dx + E(\rho_t)$. This is identical to the dynamics in optimal transport and mean field games.
  22. Accelerated Information Gradient flow. Let $\alpha_t \ge 0$ be a scalar function of $t$. We add a damping term $\alpha_t\Phi_t$ to the Hamiltonian flow:
    $\partial_t\begin{pmatrix}\rho_t\\ \Phi_t\end{pmatrix} + \begin{pmatrix}0\\ \alpha_t\Phi_t\end{pmatrix} - \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\begin{pmatrix}\delta_{\rho_t}\mathcal{H}(\rho_t,\Phi_t)\\ \delta_{\Phi_t}\mathcal{H}(\rho_t,\Phi_t)\end{pmatrix} = 0.$
    This renders the Accelerated Information Gradient (AIG) flow
    $\partial_t\rho_t - G(\rho_t)^{-1}\Phi_t = 0, \qquad \partial_t\Phi_t + \alpha_t\Phi_t + \frac{1}{2}\frac{\delta}{\delta\rho_t}\Big(\int \Phi_t\,G(\rho_t)^{-1}\Phi_t\,dx\Big) + \frac{\delta E}{\delta\rho_t} = 0, \qquad$ (AIG)
    with initial values $\rho(0) = \rho_0$ and $\Phi(0) = 0$.
  23. The choice of $\alpha_t$. The choice of $\alpha_t$ depends on the geodesic convexity of $E(\rho)$, which has an equivalent definition as follows. Definition: for a functional $E(\rho)$ defined on the probability space, we say that $E(\rho)$ has a $\lambda$-positive Hessian (in short, Hess($\lambda$)) w.r.t. the metric $g_\rho$ if there exists a constant $\lambda \ge 0$ such that for any $\rho \in \mathcal{P}(\Omega)$ and any $\sigma \in T_\rho\mathcal{P}(\Omega)$, we have
    $g_\rho(\mathrm{Hess}\,E(\rho)\,\sigma, \sigma) \ge \lambda\, g_\rho(\sigma, \sigma).$
    Here Hess is the Hessian operator w.r.t. $g_\rho$. If $E(\rho)$ is Hess($\lambda$) for $\lambda > 0$, then $\alpha_t = 2\sqrt{\lambda}$. If $E(\rho)$ is Hess(0), then $\alpha_t = 3/t$.
  24. Fisher-Rao AIG flow. The Fisher-Rao AIG flow writes
    $\partial_t\rho_t - \Phi_t\rho_t + \Big(\int \Phi_t\rho_t\,dx\Big)\rho_t = 0, \qquad \partial_t\Phi_t + \alpha_t\Phi_t + \frac{1}{2}\Phi_t^2 - \Big(\int \rho_t\Phi_t\,dx\Big)\Phi_t + \frac{\delta E}{\delta\rho_t} = 0. \qquad$ (F-AIG)
  25. Wasserstein AIG flow. The Wasserstein AIG flow in the Eulerian formulation follows
    $\partial_t\rho_t + \nabla\cdot(\rho_t\nabla\Phi_t) = 0, \qquad \partial_t\Phi_t + \alpha_t\Phi_t + \frac{1}{2}\|\nabla\Phi_t\|^2 + \frac{\delta E}{\delta\rho_t} = 0. \qquad$ (W-AIG)
    The Wasserstein AIG flow in the Lagrangian formulation leads to
    $dX_t = V_t\,dt, \qquad dV_t = -\alpha_t V_t\,dt - \nabla\Big(\frac{\delta E}{\delta\rho_t}\Big)(X_t)\,dt.$
    This is a particular type of initial-value damped mean-field game system, known in fluid dynamics (Carillo, 2020; Amir, et al. 2019). Many other generalizations of AIG flows are provided in (Wang, Li, 2019).
  26. Examples in Gaussian families. Theorem: Suppose that $\rho_0, \rho^* \in \mathcal{N}^n_0$ (mean-zero Gaussians) with covariance matrices $\Sigma_0$ and $\Sigma^*$, and let $E(\Sigma)$ evaluate the KL divergence from $\rho$ to $\rho^*$. Then $(\Sigma_t, S_t)$ satisfies
    $\dot{\Sigma}_t - (S_t\Sigma_t + \Sigma_t S_t) = 0, \qquad \dot{S}_t + \alpha_t S_t + S_t^2 + 2\nabla_{\Sigma_t}E(\Sigma_t) = 0, \qquad$ (W-AIG-G)
    with initial values $\Sigma_0$ and $S_0 = 0$. In particular, for any $t \ge 0$, $\Sigma_t$ is well defined and stays positive definite.
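    A minimal NumPy sketch (not from the slides) integrating the (W-AIG-G) system by forward Euler. The explicit Gaussian-KL gradient $\nabla_\Sigma E(\Sigma) = \tfrac{1}{2}(\Sigma_*^{-1} - \Sigma^{-1})$, the Hess(0) damping $\alpha_t = 3/t$, and the step sizes are assumptions made for concreteness.

      import numpy as np

      def waig_gaussian(Sigma0, Sigma_star, dt=1e-3, T=10.0):
          # (W-AIG-G): dSigma/dt = S Sigma + Sigma S,  dS/dt = -alpha(t) S - S S - 2 grad_Sigma E(Sigma),
          # with alpha(t) = 3/t and (assumed) grad_Sigma E(Sigma) = 0.5 * (inv(Sigma*) - inv(Sigma)).
          Sigma = np.array(Sigma0, dtype=float)
          S = np.zeros_like(Sigma)
          inv_star = np.linalg.inv(Sigma_star)
          t = dt
          while t < T:
              grad_E = 0.5 * (inv_star - np.linalg.inv(Sigma))
              damp = max(0.0, 1.0 - dt * 3.0 / t)          # clipped damping factor for small t
              S = damp * S + dt * (-(S @ S) - 2.0 * grad_E)
              Sigma = Sigma + dt * (S @ Sigma + Sigma @ S)
              t += dt
          return Sigma

      Sigma_final = waig_gaussian(np.eye(2), np.diag([1.0, 4.0]))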
  27. Convergence rate of W-AIG flows. Theorem: Suppose that $E(\rho)$ satisfies Hess($\lambda$) for $\lambda > 0$. The solution $\rho_t$ to (W-AIG) with $\alpha_t = 2\sqrt{\lambda}$ satisfies
    $E(\rho_t) \le C_0 e^{-\sqrt{\lambda}\,t} = O(e^{-\sqrt{\lambda}\,t}).$
    If $E(\rho)$ only satisfies Hess(0), then the solution $\rho_t$ to (W-AIG) with $\alpha_t = 3/t$ satisfies
    $E(\rho_t) \le C_0'\,t^{-2} = O(t^{-2}).$
    Here the constants $C_0, C_0'$ only depend on $\rho_0$.
  28. PDE convexities. Convergence rate: Hess($\lambda$) is equivalent to $\lambda$-geodesic convexity in the probability space w.r.t. $g_\rho$; for the Wasserstein metric, it is also known as $\lambda$-displacement convexity. If $E(\rho)$ is the KL divergence and the target density takes the form $\rho^* \propto \exp(-f(x))$, then a sufficient condition for Hess($\lambda$) is that $f(x)$ is $\lambda$-strongly convex. The W-AIG flow improves the $O(e^{-2\lambda t})$ convergence rate of the gradient flow to $O(e^{-\sqrt{\lambda}\,t})$, especially when $\lambda$ is close to 0. Convexities and entropy dissipation: $\lambda$-geodesic convexity = Wasserstein Hessian operator (Otto, Villani et al.) = weak form of Gamma calculus (Bakry-Emery et al.); Fisher-Rao + density manifold Christoffel symbol = weak form of Gamma calculus (Li, 2018).
  29. Sketch of the proof I In the proof of previous

    two propositions, we consider ut = @t(Tt) 1 Tt with properties r · (⇢t(ut r t)) = 0, @tTt = rTtut. I We show the following relationship Z hr t, @tTt + rTt r t i ⇢tdx = Z hr t ut, rTt(r t ut)i ⇢tdx 0. This is based on the Hodge decomposition. 32
  30. W-AIG flows at the particle level. Proposition: Suppose that $X_t \sim \rho_t$ and $V_t = \nabla\Phi_t(X_t)$ are the position and the velocity of a particle at time $t$. Then the differential equation of the particle system corresponding to (W-AIG) writes
    $dX_t = V_t\,dt, \qquad dV_t = -\alpha_t V_t\,dt - \nabla f(X_t)\,dt - \nabla\log\rho_t(X_t)\,dt.$
  31. Discrete-time algorithm for the W-AIG flow. For $i = 1, 2, \dots, N$, the update rule follows
    $V^i_{k+1} = \alpha_k V^i_k - \sqrt{\tau}\big(\nabla f(X^i_k) + \xi_k(X^i_k)\big), \qquad X^i_{k+1} = X^i_k + \sqrt{\tau}\,V^i_{k+1}.$
    If $E(\rho)$ is Hess($\lambda$), then $\alpha_k = \frac{1 - \sqrt{\lambda\tau}}{1 + \sqrt{\lambda\tau}}$; if $E(\rho)$ is Hess(0) or $\lambda$ is unknown, then $\alpha_k = \frac{k-1}{k+2}$. Here $\xi_k(x)$ is an approximation of $\nabla\log\rho_t(x)$. Difficulties: the approximation of $\nabla\log\rho_t(x)$, and the stiffness of AIG flows.
  32. Reviews on the choices of $\xi_k$.
    - If $X^i_k$ follows a Gaussian distribution, then $\xi_k(x) = -\Sigma_k^{-1}(x - m_k)$, where $m_k$ and $\Sigma_k$ are the mean and the covariance of $\{X^i_k\}_{i=1}^N$.
    - For the non-Gaussian case, we use kernel density estimation (KDE; Radhey S. Singh. Improvement on some known nonparametric uniformly consistent estimators of derivatives of a density. The Annals of Statistics, pp. 394-399, 1977). $\xi_k$ writes
    $\xi_k(x) = \nabla\log\tilde{\rho}_k(x) = \frac{\sum_{i=1}^N \nabla_x K(x, X^i_k)}{\sum_{i=1}^N K(x, X^i_k)}.$
    A common choice of $K(x, y)$ is the Gaussian kernel with bandwidth $h$, $K_G(x, y; h) = (2\pi h)^{-n/2}\exp\big(-\|x - y\|^2/(2h)\big)$.
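    A minimal NumPy sketch (not from the slides) of this KDE score estimator with the Gaussian kernel above; the function name and the vectorized layout are illustrative assumptions.

      import numpy as np

      def kde_score(x, particles, h):
          # Estimate grad log rho at query points x with a Gaussian kernel of bandwidth h.
          # x: (M, d) query points; particles: (N, d) samples.
          diff = x[:, None, :] - particles[None, :, :]     # (M, N, d) pairwise differences
          sq = np.sum(diff**2, axis=-1)                    # squared distances (M, N)
          w = np.exp(-sq / (2.0 * h))                       # unnormalized kernel weights
          # grad_x K(x, y) = -(x - y)/h * K(x, y), so the score is a weighted average of -(x - y)/h.
          num = -np.einsum('mn,mnd->md', w, diff) / h       # sum_i grad_x K(x, X_i)
          den = np.sum(w, axis=1, keepdims=True)            # sum_i K(x, X_i)
          return num / den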
  33. Selection of the bandwidth.
    - SVGD (Qiang Liu and Dilin Wang. Stein variational gradient descent: A general purpose Bayesian inference algorithm. In Advances in Neural Information Processing Systems, pp. 2378-2386, 2016) uses a median (MED) method to choose the bandwidth, i.e., $h_k = \frac{1}{2\log(N+1)}\,\mathrm{median}\big(\{\|X^i_k - X^j_k\|^2\}_{i,j=1}^N\big)$.
    - Liu et al. (Chang Liu, Jingwei Zhuo, Pengyu Cheng, Ruiyi Zhang, Jun Zhu, and Lawrence Carin. Accelerated first-order methods on the Wasserstein space for Bayesian inference. arXiv preprint arXiv:1807.01750, 2018) propose a Heat Equation (HE) method to adaptively adjust the bandwidth.
    - Motivated by the HE method, we introduce the Brownian motion (BM) method to adaptively learn the bandwidth.
  34. The BM method. Given the bandwidth $h$, particles $\{X^i_k\}_{i=1}^N$, and a step size $s$, we can compute two particle systems
    $Y^i_k(h) = X^i_k - s\,\xi_k(X^i_k; h), \qquad Z^i_k = X^i_k + \sqrt{2s}\,B^i, \qquad i = 1, \dots, N,$
    where $B^i$ is the standard Brownian motion. Minimize $\mathrm{MMD}(\hat{\rho}_Y, \hat{\rho}_Z)$, the maximum mean discrepancy between the empirical distributions $\hat{\rho}_Y(x) = \frac{1}{N}\sum_{i=1}^N \delta(x - Y^i_k(h))$ and $\hat{\rho}_Z(x) = \frac{1}{N}\sum_{i=1}^N \delta(x - Z^i_k)$. Optimize over $h$, using the bandwidth $h_{k-1}$ from the last iteration as the initialization.
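    A small NumPy sketch (not from the slides) of this BM bandwidth step, using a Gaussian-kernel MMD between the two particle systems. The kernel width inside the MMD and the coarse local search over $h$ (the slide only says "optimize over $h$") are illustrative assumptions; `score_fn` is the KDE estimator above.

      import numpy as np

      def mmd2(Y, Z, sigma=1.0):
          # Squared MMD between empirical measures of Y and Z with kernel exp(-||a-b||^2 / (2 sigma^2)).
          def k(A, B):
              d2 = np.sum((A[:, None, :] - B[None, :, :])**2, axis=-1)
              return np.exp(-d2 / (2.0 * sigma**2))
          return k(Y, Y).mean() + k(Z, Z).mean() - 2.0 * k(Y, Z).mean()

      def bm_bandwidth(X, score_fn, s, h_prev, seed=0):
          # Pick h so that one deterministic step along the KDE score matches a Brownian step in MMD.
          rng = np.random.default_rng(seed)
          Z = X + np.sqrt(2.0 * s) * rng.standard_normal(X.shape)
          candidates = h_prev * np.array([0.5, 0.75, 1.0, 1.5, 2.0])   # local search around h_prev
          losses = [mmd2(X - s * score_fn(X, h), Z) for h in candidates]
          return candidates[int(np.argmin(losses))]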
  35. Adaptive restart. The second equation in (W-AIG) is a Hamilton-Jacobi equation, which is usually stiff. We propose an adaptive restart technique to deal with this problem. Consider
    $\varphi_k = \sum_{i=1}^N \big\langle V^i_{k+1},\ \nabla f(X^i_k) + \xi_k(X^i_k)\big\rangle,$
    which can be viewed as a discrete-time approximation of $-\partial_t E(\rho_t)$. If $\varphi_k < 0$, then we restart the algorithm with initial values $X^i_0 = X^i_k$ and $V^i_0 = 0$. This essentially keeps $\partial_t E(\rho_t)$ negative along the trajectory. The numerical results show that the adaptive restart accelerates and stabilizes the discrete-time algorithm.
  36. Overall algorithm. Algorithm 1: Particle implementation of the Wasserstein AIG flow.
    Require: initial positions $\{X^i_0\}_{i=1}^N$, step size $\tau$, number of iterations $L$.
    1: Set $k = 0$, $V^i_0 = 0$ for $i = 1, \dots, N$. Set the bandwidth $h_0$ by MED.
    2: for $l = 1, 2, \dots, L$ do
    3:   Compute $h_l$ based on BM: $h_l = \mathrm{BM}(h_{l-1}, \{X^i_k\}_{i=1}^N, \sqrt{\tau})$.
    4:   Calculate $\xi_k(X^i_k)$ with the bandwidth $h_l$.
    5:   Set $\alpha_k$ based on whether $E(\rho)$ is Hess($\lambda$) or Hess(0). Update $V^i_{k+1} = \alpha_k V^i_k - \sqrt{\tau}\big(\nabla f(X^i_k) + \xi_k(X^i_k)\big)$, $X^i_{k+1} = X^i_k + \sqrt{\tau}\,V^i_{k+1}$.
    6:   if RESTART then
    7:     Compute $\varphi_k = \sum_{i=1}^N \langle V^i_{k+1}, \nabla f(X^i_k) + \xi_k(X^i_k)\rangle$.
    8:     If $\varphi_k < 0$, set $X^i_0 = X^i_k$, $V^i_0 = 0$, and $k = 0$; otherwise set $k = k + 1$.
    9:   else
    10:    Set $k = k + 1$.
    11:  end if
    12: end for
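    A compact NumPy sketch (not from the slides) of Algorithm 1 with the adaptive restart test from slide 35. The bandwidth handling is abstracted into a user-supplied score estimator `score_fn`, and the function names and default schedule are illustrative assumptions.

      import numpy as np

      def waig_particles(grad_f, X0, tau, n_iters, score_fn, alpha_fn=lambda k: (k - 1) / (k + 2), restart=True):
          # grad_f: gradient of the potential f (target rho* proportional to exp(-f)).
          # score_fn: estimate of grad log rho at the particles (e.g. the KDE score above).
          # alpha_fn: damping schedule; (k-1)/(k+2) corresponds to the Hess(0) case.
          X = np.array(X0, dtype=float)
          V = np.zeros_like(X)
          k = 0
          for _ in range(n_iters):
              drift = grad_f(X) + score_fn(X)                   # grad f(X_k) + xi_k(X_k)
              V = alpha_fn(k) * V - np.sqrt(tau) * drift        # momentum update V_{k+1}
              X = X + np.sqrt(tau) * V                          # position update X_{k+1}
              if restart and np.sum(V * drift) < 0:             # phi_k < 0: restart
                  V, k = np.zeros_like(V), 0                    # keep positions, reset momentum and counter
              else:
                  k += 1
          return X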
  37. Toy example. The target density $\rho^*$ is a toy bimodal distribution (Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770, 2015). We compare two types of particle implementations of the Wasserstein gradient flow of the KL divergence:
    $X^i_{k+1} = X^i_k - \tau\nabla f(X^i_k) + \sqrt{2\tau}\,B^i_k, \qquad X^i_{k+1} = X^i_k - \tau\nabla f(X^i_k) - \tau\,\xi_k(X^i_k).$
    Here $B^i_k \sim N(0, I)$ is standard Gaussian noise and $\xi_k$ is computed by KDE. The first method is known as the MCMC method. The second method is called the ParVI method, whose bandwidth $h$ is selected by MED/HE/BM, respectively.
  38. Gaussian examples: ODE level. The initial value is set to $\Sigma_0 = I$. The following figure presents the convergence of the KL divergence on two target distributions.
    Figure: The acceleration of the W-AIG flow and the effect of adaptive restart (ODE level). The target density is a Gaussian distribution with zero mean on $\mathbb{R}^{100}$; the left and right panels correspond to two target settings with parameters approximately $3.8 \times 10^3$ and $4.0 \times 10^3$.
  39. Gaussian examples: particle level. The initial distribution of samples follows $N(0, I)$ and the number of samples is $N = 600$. For a particle system $\{X^i_k\}_{i=1}^N$, we record the KL divergence $E(\hat{\Sigma}_k)$ using the empirical covariance matrix $\hat{\Sigma}_k$.
    Figure: The acceleration of the W-AIG flow and the effect of adaptive restart (particle level). The setting of the target density is identical to the previous figure.
  40. Bayesian logistic regression. Figure: Comparison of different methods on Bayesian logistic regression, averaged over 10 independent trials. The shaded areas show the variance over 10 trials. Top: BM; Bottom: MED. Left: test accuracy; Right: test log-likelihood.
  41. Topic II: Optimal control in probability space.
    - Computational mean-field information dynamics associated with reaction-diffusion equations.
    - Controlling conservation laws I, II: entropy-entropy flux.
    - Mean field games.
    - Master equations for finite state mean field games with nonlinear activations.
  42. Reaction diffusion. Consider a scalar nonlinear reaction-diffusion equation
    $\partial_t u(t, x) = \Delta F(u(t, x)) + R(u(t, x)),$
    where $x \in \Omega$, $\Omega \subset \mathbb{R}^d$ is a compact convex spatial domain, $u \in \mathcal{M}(\Omega) = \{u \in C^\infty(\Omega)\colon u \ge 0\}$, and $\Delta$ is the Euclidean Laplacian operator.
  43. Reaction diffusions and generalized optimal transport. In this talk, we first review reaction diffusions with optimal transport type metric spaces. Using this connection, we further design mean field control problems for reaction-diffusion equations. This also provides variational schemes for solving classical reaction-diffusion equations in both the time and spatial domains.
  44. Literature. The optimal transport problem was first introduced by Monge in 1781 and relaxed by Kantorovich (Nobel prize) in 1940. It defines a distance on the space of probability distributions, named optimal transport, Wasserstein distance, or Earth Mover's distance.
    - Mapping/Monge-Ampère equation: Gangbo, Brenier, et al.
    - Lyapunov methods and gradient flows: Otto, Villani, Ambrosio, Gigli, Savare, Carillo, Mielke, et al.
    - Hamiltonian flows (compressible Euler equations, potential mean field games, Schrodinger bridge problems, Schrodinger equations): Benamou, Brenier, Larsy, Lions, Carmona, Souganidis, Georgiou, Nelson, Lafferty, et al.
    - Numerical OT and MFG: Benamou, Nurbekyan, Oberman, Osher, Achdou, Caines, Tamer, Lai, Lauriere, et al.
  45. Lyapunov methods. Consider a Lyapunov functional
    $\mathcal{G}(u) = \int G(u(x))\,dx,$
    where $G\colon \mathbb{R} \to \mathbb{R}$ is a given convex function with $G''(u) > 0$. In fact, how to choose the entropy $G$ can be read off from the entropy-entropy flux-metric condition. We have seen it in Prof. Osher's talk.
  46. Metrics. Along the solution of the reaction-diffusion equation, one observes that
    $\frac{d}{dt}\mathcal{G}(u(t,\cdot)) = \int G'(u)\,\partial_t u\,dx = \int G'(u)\big(\Delta F(u) + R(u)\big)dx$
    $= -\int \big(\nabla G'(u), \nabla F(u)\big)dx + \int G'(u)R(u)\,dx = -\int \big(\nabla G'(u), \nabla u\big)F'(u)\,dx + \int G'(u)R(u)\,dx$
    $= -\int \big(\nabla G'(u), \nabla u\big)G''(u)\,\frac{F'(u)}{G''(u)}\,dx + \int G'(u)^2\,\frac{R(u)}{G'(u)}\,dx = -\int \big(\nabla G'(u), \nabla G'(u)\big)\frac{F'(u)}{G''(u)}\,dx + \int G'(u)^2\,\frac{R(u)}{G'(u)}\,dx,$
    where we apply integration by parts in the second equality and $\nabla G'(u) = G''(u)\nabla u$ in the last equality.
  47. Lyapunov methods induced metrics I. Denote the smooth positive probability space by $\mathcal{M} = \{u \in C^\infty(\Omega)\colon u > 0\}$. Given $F, G\colon \mathbb{R} \to \mathbb{R}$ satisfying $F'(u) > 0$ if $u > 0$, and $G''(u) > 0$, denote
    $V_1(u) = \frac{F'(u)}{G''(u)}, \qquad V_2(u) = -\frac{R(u)}{G'(u)}.$
    Denote the tangent space of $\mathcal{M}$ at $u \in \mathcal{M}$ by $T_u\mathcal{M} = \{\sigma \in C^\infty(\Omega)\}$.
  48. Lyapunov methods induced metrics II. Definition (Mean-field information metric): the inner product $g(u)\colon T_u\mathcal{M}\times T_u\mathcal{M} \to \mathbb{R}$ is given below. For any $\sigma_1, \sigma_2 \in T_u\mathcal{M}$, define
    $g(u)(\sigma_1, \sigma_2) = \int_\Omega (\nabla\Phi_1, \nabla\Phi_2)\,V_1(u)\,dx + \int_\Omega \Phi_1\Phi_2\,V_2(u)\,dx,$
    where $\sigma_i = -\nabla\cdot\big(V_1(u)\nabla\Phi_i\big) + V_2(u)\Phi_i$, $i = 1, 2$.
  49. Reaction-diffusions in metric spaces. Given an energy functional $\mathcal{F}\colon \mathcal{M} \to \mathbb{R}$, the gradient flow of $\mathcal{F}$ in $(\mathcal{M}(\Omega), g)$ satisfies
    $\partial_t u(t, x) = \nabla\cdot\big(V_1(u)\nabla \delta_u\mathcal{F}(u)\big)(t, x) - V_2(u)\,\delta_u\mathcal{F}(u)(t, x).$
    In particular, if $\mathcal{F}(u) = \mathcal{G}(u) = \int G(u)\,dx$, then
    $\partial_t u = \nabla\cdot\big(V_1(u)\nabla G'(u)\big) - V_2(u)G'(u) = \nabla\cdot\Big(\frac{F'(u)}{G''(u)}\,G''(u)\nabla u\Big) + \frac{R(u)}{G'(u)}\,G'(u) = \nabla\cdot\big(F'(u)\nabla u\big) + R(u) = \Delta F(u) + R(u).$
  50. Controlling reaction diffusions. Denote a given energy functional by $\mathcal{F}\colon \mathcal{M}(\Omega) \to \mathbb{R}$. Consider the variational problem
    $\inf_{v_1, v_2, u} \int_0^1 \Big[\int_\Omega \frac{1}{2}\|v_1\|^2 V_1(u) + \frac{1}{2}|v_2|^2 V_2(u)\,dx - \mathcal{F}(u)\Big]dt,$
    where the infimum is taken among all density functions $u\colon [0,1]\times\Omega \to \mathbb{R}$, vector fields $v_1\colon [0,1]\times\Omega \to \mathbb{R}^d$, and reaction rate functions $v_2\colon [0,1]\times\Omega \to \mathbb{R}$, such that
    $\partial_t u(t,x) + \nabla\cdot\big(V_1(u(t,x))\,v_1(t,x)\big) = v_2(t,x)\,V_2(u(t,x)),$
    with fixed initial and terminal density functions $u_0, u_1 \in \mathcal{M}(\Omega)$.
  51. Mean field information dynamics. The critical point of the variational problem satisfies $v_1(t,x) = \nabla\Phi(t,x)$, $v_2(t,x) = \Phi(t,x)$, with
    $\partial_t u(t,x) + \nabla\cdot\big(V_1(u(t,x))\nabla\Phi(t,x)\big) = V_2(u(t,x))\,\Phi(t,x),$
    $\partial_t\Phi(t,x) + \frac{1}{2}\|\nabla\Phi(t,x)\|^2\, V_1'(u(t,x)) + \frac{1}{2}|\Phi(t,x)|^2\, V_2'(u(t,x)) + \delta_u\mathcal{F}(u)(t,x) = 0.$
  52. Mean field information Hamiltonian. There exists a Hamiltonian formulation for the control problem:
    $\partial_t u = \delta_\Phi\mathcal{H}(u, \Phi), \qquad \partial_t\Phi = -\delta_u\mathcal{H}(u, \Phi),$
    where $\mathcal{H}$ is the Hamiltonian functional defined by
    $\mathcal{H}(u, \Phi) = \frac{1}{2}\int_\Omega \|\nabla\Phi\|^2\, V_1(u)\,dx + \frac{1}{2}\int_\Omega |\Phi|^2\, V_2(u)\,dx + \mathcal{F}(u).$
  53. Example I (Wasserstein metric, heat flow, and density Hamiltonian flows). Let $G(u) = u\log u - u$, $F(u) = u$, $R(u) = 0$; thus
    $V_1(u) = \frac{F'(u)}{G''(u)} = u, \qquad V_2(u) = -\frac{R(u)}{G'(u)} = 0.$
    Then the mean-field information metric forms $g(u)(\sigma_1, \sigma_2) = \int_\Omega \big(\nabla\Phi_1(x), \nabla\Phi_2(x)\big)\,u(x)\,dx$. Here the Hamiltonian flow in $(\mathcal{M}(\Omega), g)$ satisfies
    $\partial_t u + \nabla\cdot(u\nabla\Phi) = 0, \qquad \partial_t\Phi + \frac{1}{2}\|\nabla\Phi\|^2 + \delta_u\mathcal{F}(u) = 0.$
  54. Example II (Fisher-Rao metric, birth-death equation, and density Hamiltonian flows). Consider
    $V_1(u) = \frac{F'(u)}{G''(u)} = 0, \qquad V_2(u) = -\frac{R(u)}{G'(u)} = u.$
    Then the mean-field information metric satisfies $g(u)(\sigma_1, \sigma_2) = \int_\Omega \Phi_1(x)\Phi_2(x)\,u(x)\,dx$. Here the Hamiltonian flow in $(\mathcal{M}(\Omega), g)$ satisfies
    $\partial_t u - u\Phi = 0, \qquad \partial_t\Phi + \frac{1}{2}|\Phi|^2 + \delta_u\mathcal{F}(u) = 0.$
  55. Example III (Fisher-KPP metric, Fisher-KPP equation, and density Hamiltonian flows). Consider $F(u) = u$, $G(u) = u\log u - u$, $R(u) = u(1 - u)$. Thus
    $V_1(u) = \frac{F'(u)}{G''(u)} = u, \qquad V_2(u) = -\frac{R(u)}{G'(u)} = \frac{u(u-1)}{\log u}.$
    Here the Hamiltonian flow in $(\mathcal{M}(\Omega), g)$ satisfies
    $\partial_t u + \nabla\cdot(u\nabla\Phi) - \frac{u(u-1)}{\log u}\,\Phi = 0, \qquad \partial_t\Phi + \frac{1}{2}\|\nabla\Phi\|^2 + \frac{|\Phi|^2}{2}\,\frac{(2u-1)\log u + 1 - u}{(\log u)^2} + \delta_u\mathcal{F}(u) = 0.$
  56. Primal-dual hybrid gradient algorithms. Consider the following convex optimization problem:
    $\min_z\ f(Az) + g(z),$
    where $z$ is the variable to be minimized, $f$ and $g$ are convex functions, and $A$ is a continuous linear operator. The algorithm solves the problem by iterating
    $z^{(k+1)} = \arg\min_z\ L(z, \bar{p}) + \frac{1}{2\tau}\|z - z^{(k)}\|^2_{L^2}, \qquad p^{(k+1)} = \arg\max_p\ L(z^{(k+1)}, p) - \frac{1}{2\sigma}\|p - p^{(k)}\|^2_{L^2}, \qquad \bar{p} = 2p^{(k+1)} - p^{(k)}, \qquad$ (1)
    where $p$ is a dual variable, $L$ is the Lagrangian functional $L(z, p) = g(z) + \langle Az, p\rangle - f^*(p)$, and $f^*(p) = \sup_z\ \langle z, p\rangle - f(z)$ is the Legendre transform of $f$.
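    A generic NumPy sketch (not from the slides) of the PDHG iteration written with proximal maps, which is equivalent to the argmin/argmax form above; the function names are illustrative, and the user supplies the proximal operators of $g$ and of the conjugate $f^*$ (for an equality constraint $Az = b$, for instance, $f^*(p) = \langle p, b\rangle$ and its prox is a simple shift).

      import numpy as np

      def pdhg(A, At, prox_g, prox_fstar, z0, p0, tau, sigma, n_iters=500):
          # PDHG for min_z f(A z) + g(z):
          #   z_{k+1} = prox_{tau g}(z_k - tau * A^T p_bar)
          #   p_{k+1} = prox_{sigma f*}(p_k + sigma * A z_{k+1})
          #   p_bar   = 2 p_{k+1} - p_k
          # A, At: callables applying the linear operator and its adjoint.
          z, p = np.array(z0, dtype=float), np.array(p0, dtype=float)
          p_bar = p.copy()
          for _ in range(n_iters):
              z = prox_g(z - tau * At(p_bar), tau)
              p_new = prox_fstar(p + sigma * A(z), sigma)
              p_bar = 2.0 * p_new - p
              p = p_new
          return z, p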
  57. Primal-dual hybrid gradient methods. The scheme converges if the step sizes $\tau$ and $\sigma$ satisfy
    $\tau\sigma\,\|A^T A\|_{L^2} < 1,$
    where $\|\cdot\|_{L^2}$ is an operator norm in $L^2$. These step sizes depend on the operator $A$. In this talk, we use G-prox PDHG, a variation of the primal-dual algorithm. G-prox PDHG provides an appropriate choice of norms that allows the algorithm to converge faster than the vanilla PDHG algorithm.
  58. G-prox primal-dual hybrid gradient methods. G-prox PDHG finds the saddle point by iterating
    $z^{(k+1)} = \arg\min_z\ L(z, \bar{p}) + \frac{1}{2\tau}\|z - z^{(k)}\|^2_{L^2}, \qquad p^{(k+1)} = \arg\max_p\ L(z^{(k+1)}, p) - \frac{1}{2\sigma}\|p - p^{(k)}\|^2_{\mathcal{H}}, \qquad \bar{p} = 2p^{(k+1)} - p^{(k)},$
    where the norm $\|\cdot\|_{\mathcal{H}}$ is defined as $\|z\|^2_{\mathcal{H}} = \|A^T z\|^2_{L^2}$. By choosing this proper norm, the step sizes only need to satisfy $\tau\sigma < 1$, which is independent of the operator $A$.
  59. Implementation of the algorithm. We set the linear operator $A$ as
    $A(m_1, m_2, u) = \partial_t u(t,x) + \nabla\cdot m_1(t,x) - m_2(t,x).$
    We set the convex functionals $f$ and $g$ as
    $g(m_1, m_2, u) = \int_0^1 \Big(\int_\Omega \frac{\|m_1(t,x)\|^2}{2V_1(u(t,x))} + \frac{|m_2(t,x)|^2}{2V_2(u(t,x))}\,dx - \mathcal{F}(u)\Big)dt,$
    $f\big(A(m_1, m_2, u)\big) = \begin{cases} 0 & \text{if } A(m_1, m_2, u) = 0 \\ \infty & \text{otherwise,} \end{cases} \qquad f^*(\Phi) = 0.$
    Thus, we have the Lagrangian functional
    $L(m_1, m_2, u, \Phi) = \int_0^1 \Big(\int_\Omega \Big\{\frac{\|m_1\|^2}{2V_1(u)} + \frac{|m_2|^2}{2V_2(u)} + \Phi\,\big(\partial_t u + \nabla\cdot m_1 - m_2\big)\Big\}dx - \mathcal{F}(u)\Big)dt.$
  60. Implementation of the algorithm. We find the saddle point $(m_1^*, m_2^*, u^*, \Phi^*)$ by iterating the following scheme:
    $u^{(k+1)} = \arg\min_u\ L(m_1^{(k)}, m_2^{(k)}, u, \bar{\Phi}) + \frac{1}{2\tau}\|u - u^{(k)}\|^2_{L^2},$
    $m_1^{(k+1)} = \arg\min_m\ L(m, m_2^{(k)}, u^{(k)}, \bar{\Phi}) + \frac{1}{2\tau}\|m - m_1^{(k)}\|^2_{L^2},$
    $m_2^{(k+1)} = \arg\min_m\ L(m_1^{(k)}, m, u^{(k)}, \bar{\Phi}) + \frac{1}{2\tau}\|m - m_2^{(k)}\|^2_{L^2},$
    $\Phi^{(k+1)} = \arg\max_\Phi\ L(m_1^{(k+1)}, m_2^{(k+1)}, u^{(k+1)}, \Phi) - \frac{1}{2\sigma}\|\Phi - \Phi^{(k)}\|^2_{\mathcal{H}},$
    $\bar{\Phi} = 2\Phi^{(k+1)} - \Phi^{(k)}.$
    Here the $L^2$ and $\mathcal{H}$ norms are defined as $\|u\|^2_{L^2} = (u, u)_{L^2} = \int_0^1\int_\Omega \|u\|^2\,dx\,dt$ and $\|u\|^2_{\mathcal{H}} = \|A^T u\|^2_{L^2}$ for any $u\colon [0,1]\times\Omega \to \mathbb{R}^n$.
  61. Implementation of the algorithm. We can find explicit formulas for $u^{(k+1)}, m_1^{(k+1)}, m_2^{(k+1)}, \Phi^{(k+1)}$ using the first-order optimality conditions. The optimality conditions of $m_1^{(k+1)}, m_2^{(k+1)}, \Phi^{(k+1)}$ result in first-order equations which can be solved easily. Thus, we have
    $m_1^{(k+1)} = \frac{V_1(u^{(k)})}{\tau + V_1(u^{(k)})}\big(m_1^{(k)} + \tau\nabla\Phi^{(k)}\big), \qquad m_2^{(k+1)} = \frac{V_2(u^{(k)})}{\tau + V_2(u^{(k)})}\big(m_2^{(k)} + \tau\Phi^{(k)}\big),$
    $\Phi^{(k+\frac{1}{2})} = \Phi^{(k)} + \sigma\,(AA^T)^{-1}\big(\partial_t u^{(k+1)} + \nabla\cdot m_1^{(k+1)} - m_2^{(k+1)}\big).$
  62. Implementation of the algorithm The optimality condition of u (k+1)

    involves the nonlinear equation, which we compute using Newton’s method. For j = 1, 2, · · · a = km1k2 2V 2 1 (u j ) V 0 1 (uj) |m2|2 2V 2 2 (u j ) V 0 2 (uj) @ t + c log uj + 1 ⌧ (uj u(k)) b = km1k2 V 3 1 (u j ) (V 0 1 (uj))2 km1k2 2V 2 1 (u j ) V 00 1 (uj) + |m2|2 V 3 2 (u j ) (V 0 2 (uj))2 |m2|2 2V 2 2 (u j ) V 00 2 (uj) + c u j + 1 ⌧ uj+1 = max(0, uj a b ) 28
  63. Algorithm. Algorithm: G-prox PDHG for the mean-field information variational problem.
    Input: initial density $u_0$ and terminal density $u_1$.
    Output: $u, \Phi, m_2\colon [0,T]\times\Omega \to \mathbb{R}$, $m_1\colon [0,T]\times\Omega \to \mathbb{R}^d$.
    For $k = 1, 2, \dots$:
      Compute $u^{(k+1)}$ using Newton's method.
      $m_1^{(k+1)} = \frac{V_1(u^{(k)})}{\tau + V_1(u^{(k)})}\big(m_1^{(k)} + \tau\nabla\Phi^{(k)}\big)$
      $m_2^{(k+1)} = \frac{V_2(u^{(k)})}{\tau + V_2(u^{(k)})}\big(m_2^{(k)} + \tau\Phi^{(k)}\big)$
      $\Phi^{(k+\frac{1}{2})} = \Phi^{(k)} + \sigma\,(AA^T)^{-1}\big(\partial_t u^{(k+1)} + \nabla\cdot m_1^{(k+1)} - m_2^{(k+1)}\big)$
      $\Phi^{(k+1)} = 2\Phi^{(k+\frac{1}{2})} - \Phi^{(k)}$
  64. Example. We present numerical experiments using the algorithm with different $V_1, V_2$ functions. We wrote C++ code to run the numerical experiments. For all the experiments, we used the discretization sizes $N_{x_1} = N_{x_2} = 128$, $N_t = 30$, and
    $V_1(u) = u, \qquad V_2(u) = u^\alpha, \quad \alpha > 0.$
    We show how varying the exponent $\alpha$ affects the evolution of the densities. Below are the initial and terminal densities used in the simulations:
    $u_0(x) = 10\exp\big(-60\big((x - 0.3)^2 + (y - 0.7)^2\big)\big) + 1, \qquad u_1(x) = 20\exp\big(-60\big((x - 0.7)^2 + (y - 0.3)^2\big)\big) + 1.$
  65. Figure: Example: snapshots of the densities at different times $t$. The first, second, third and fourth rows represent $V_2(u) = u$, $V_2(u) = u^2$, $V_2(u) = u^3$, $V_2(u) = u^4$, respectively.
  66. Figure: Example: cross sections along the line $x + y = 1$ for different times $t$. Each color represents $V_2(u) = u$, $V_2(u) = u^2$, $V_2(u) = u^4$.
  67. Topic II: Optimal control in probability space.
    - Computational mean-field information dynamics associated with reaction-diffusion equations.
    - Controlling conservation laws I, II: entropy-entropy flux.
    - Mean field games.
    - Master equations for finite state mean field games with nonlinear activations.
  68. Regularized scalar conservation laws. Consider
    $\partial_t u + \nabla\cdot f(u) = \epsilon\,\nabla\cdot\big(A(u)\nabla u\big),$
    where the unknown variable is $u = u(t,x)$, $(t,x) \in \mathbb{R}^+\times\mathbb{R}^d$, $f\colon \mathbb{R} \to \mathbb{R}^d$ is a flux function, $A(u) \in \mathbb{R}^{d\times d}$ is a positive semi-definite matrix function, and $\epsilon > 0$ is a viscosity constant. This PDE is known as a regularized scalar conservation law.
  69. Conservation laws and generalized optimal transport. In the literature, there are active joint studies of entropy, Fisher information, and transportation. Nowadays, these connections have applications in mean field games, information theory, AI, quantum computing, computational graphics, computational physics, and Bayesian inverse problems. In this talk, we connect Lax's entropy-entropy flux in conservation laws with optimal transport type metric spaces. Following this connection, we further design variational discretizations for conservation laws and mean field control of conservation laws. This includes a new and efficient method for developing unconditionally stable-in-time implicit approximations to regularized conservation laws and many other initial value problems.
  70. Heat equation and entropy dissipation. Consider the heat equation in $\mathbb{R}^d$:
    $\frac{\partial u(t,x)}{\partial t} = \nabla\cdot\big(\nabla u(t,x)\big) = \Delta u(t,x).$
    Consider the negative Boltzmann-Shannon entropy given by
    $\mathcal{H}(u) = \int_{\mathbb{R}^d} u(x)\big(\log u(x) - 1\big)dx.$
    Along the time evolution of the heat equation, the following dissipation relation holds:
    $\frac{d}{dt}\mathcal{H}(u(t,\cdot)) = -\int_{\mathbb{R}^d} \|\nabla_x\log u(t,x)\|^2\,u(t,x)\,dx = -\mathcal{I}(u(t,\cdot)),$
    where $\mathcal{I}(u)$ is the Fisher information functional. There is a formulation behind this relation, namely: the heat flow is the gradient descent flow of the entropy in the optimal transport metric.
  71. Optimal transport. What is the optimal way to move or transport a mountain with shape $X$ and density $u_0(x)$ to another shape $Y$ with density $u_1(y)$?
  72. Literature. The optimal transport problem was first introduced by Monge in 1781 and relaxed by Kantorovich (Nobel prize) in 1940. It defines a distance on the space of probability distributions, named optimal transport, Wasserstein distance, or Earth Mover's distance.
    - Mapping/Monge-Ampère equation: Gangbo, Brenier, et al.
    - Gradient flows: Otto, Villani, Ambrosio, Gigli, Savare, Carillo, Mielke, et al.
    - Hamiltonian flows (compressible Euler equations, potential mean field games, Schrodinger bridge problems, Schrodinger equations): Benamou, Brenier, Larsy, Lions, Georgiou, Tannenbaum, Nelson, Lafferty, Liu, Gao, et al.
    - Numerical OT and mean field control/MFG: Benamou, Oberman, Osher, Achdou, Yang, Justin, Fu, et al.
    - Natural gradient, mean field control/MFG and machine learning: Amari, Li, Chen, Montufar, Yang, Liu, Nurbekyan, Chen, Osher, Lars, Zhang, Katsoulakis, Zhou, et al.
  73. Entropy dissipation = Lyapunov methods. The gradient flow of the negative entropy
    $\mathcal{H}(u) = \int_{\mathbb{R}^d} u(x)\log u(x)\,dx$
    w.r.t. the optimal transport metric distance satisfies
    $\frac{\partial u}{\partial t} = \nabla\cdot(u\nabla\log u) = \Delta u.$
    Here the major trick is that $u\nabla\log u = \nabla u$. In this way, one can study the entropy dissipation by
    $\frac{d}{dt}\mathcal{H}(u) = \int_{\mathbb{R}^d} \log u\,\nabla\cdot(u\nabla\log u)\,dx = -\int_{\mathbb{R}^d} \|\nabla\log u\|^2\,u\,dx.$
  74. Lyapunov method induced calculus. Informally speaking, the Wasserstein-Otto metric refers to the following bilinear form:
    $\langle \dot{u}_1, G(u)\dot{u}_2\rangle = \int \big(\dot{u}_1, (-\Delta_u)^{-1}\dot{u}_2\big)dx.$
    In other words, denote $\Delta_u := \nabla\cdot(u\nabla)$ and $\dot{u}_i = -\Delta_u\Phi_i = -\nabla\cdot(u\nabla\Phi_i)$, $i = 1, 2$; then
    $\langle \dot{u}_1, G(u)\dot{u}_2\rangle = \langle \Phi_1, -\nabla\cdot(u\nabla\Phi_2)\rangle = \int (\nabla\Phi_1, \nabla\Phi_2)\,u\,dx,$
    where $u \in \mathcal{P}(\Omega)$, $\dot{u}_i$ is a tangent vector in $\mathcal{P}(\Omega)$ with $\int \dot{u}_i\,dx = 0$, and $\Phi_i \in C^\infty(\Omega)$ are cotangent vectors in $\mathcal{P}(\Omega)$ at the point $u$. Here $\nabla\cdot$ and $\nabla$ are the standard divergence and gradient operators in $\Omega$.
  75. Gradient flows. The Wasserstein gradient flow of an energy functional $\mathcal{F}(u)$ leads to
    $\partial_t u = -G(u)^{-1}\delta_u\mathcal{F}(u) = \nabla\cdot\big(u\nabla\delta_u\mathcal{F}(u)\big).$
    If $\mathcal{F}(u) = \int F(x)u(x)\,dx$, then $\partial_t u = \nabla\cdot(u\nabla F(x))$. If $\mathcal{F}(u) = \int u(x)\log u(x)\,dx$, then $\partial_t u = \nabla\cdot(u\nabla\log u) = \Delta u$.
  76. Hamiltonian flows. Consider the Lagrangian functional
    $\mathcal{L}(u, \partial_t u) = \frac{1}{2}\int \big(\partial_t u, (-\nabla\cdot(u\nabla))^{-1}\partial_t u\big)dx - \mathcal{F}(u).$
    By the Legendre transform,
    $\mathcal{H}(u, \Phi) = \sup_{\partial_t u}\ \int \partial_t u\,\Phi\,dx - \mathcal{L}(u, \partial_t u).$
    And the Hamiltonian system follows
    $\partial_t u = \delta_\Phi\mathcal{H}(u, \Phi), \qquad \partial_t\Phi = -\delta_u\mathcal{H}(u, \Phi),$
    where $\delta_u, \delta_\Phi$ are $L^2$ first-variation operators w.r.t. $u, \Phi$, respectively, and the density Hamiltonian forms
    $\mathcal{H}(u, \Phi) = \frac{1}{2}\int \|\nabla\Phi\|^2\,u\,dx + \mathcal{F}(u).$
    Here $u$ is the "density" state variable and $\Phi$ is the "density" momentum variable.
  77. Examples: Optimal transport and mean field games. Thus, the Hamiltonian flow satisfies
    $\partial_t u + \nabla\cdot(u\nabla\Phi) = 0, \qquad \partial_t\Phi + \frac{1}{2}\|\nabla\Phi\|^2 = -\delta_u\mathcal{F}(u).$
    This is a well-known dynamic, which is studied in potential mean field games, optimal transport, and PDE.
  78. Optimal transport type formalisms. We are currently working on the "universe" of generalized optimal transport and mean field games: AI theory, inference, and computations; quantum computing; algorithms of mean-field MCMC algorithms and numerical PDEs. In this lecture, we point out that there are optimal transport type formalisms to study the control of conservation laws and to design variational numerical schemes. Related works: Godunov (1970); Lax-Friedrichs (1971); Brenier (2018); Osher, Shu, Harten, van Leer: monotone schemes, TVD schemes, ENO, WENO, et al.
  79. Entropy-entropy flux pairs. Consider
    $\partial_t u(t,x) + \nabla_x\cdot f(u) = 0,$
    where $u\colon \mathbb{R}^+\times\mathbb{R}^n \to \mathbb{R}^d$ and $f$ is the flux function. Definition (Entropy-entropy flux pair for systems (Lax)): we call $(G, \Psi)$ an entropy-entropy flux pair for the above conservation law system if there exists a convex function $G\colon \mathbb{R}^d \to \mathbb{R}$ and $\Psi\colon \mathbb{R}^d \to \mathbb{R}$ such that
    $\Psi'(u) = G'(u)\,f'(u).$
    An entropy solution satisfies $\partial_t G(u) + \nabla_x\cdot\Psi(u) \le 0$. This is trivial for scalar conservation laws, where $d = 1$.
  80. Lyapunov methods: entropy-entropy flux. Denote $\mathcal{G}(u) = \int_\Omega G(u)\,dx$. In this case,
    $\int_\Omega G'(u)\,\nabla\cdot f(u)\,dx = \int_\Omega G'(u)\,\big(f'(u), \nabla u\big)\,dx = \int_\Omega \big(\Psi'(u), \nabla u\big)\,dx = \int_\Omega \nabla\cdot\Psi(u)\,dx = 0,$
    where we apply the facts that $f'(u)G'(u) = \Psi'(u)$ and $\int_\Omega \nabla\cdot\Psi(u)\,dx = 0$.
  81. Lyapunov methods: viscosity. In addition,
    $\int_\Omega G'(u)\,\nabla\cdot\big(A(u)\nabla u\big)\,dx = -\int_\Omega \big\langle\nabla G'(u), A(u)\nabla u\big\rangle dx = -\int_\Omega \big\langle\nabla G'(u), A(u)G''(u)^{-1}\nabla G'(u)\big\rangle dx,$
    where we apply $\nabla G'(u) = G''(u)\nabla u$ in the last equality. We require that $A(u)G''(u)^{-1} \succeq 0$; $A$ is nonnegative definite and $G''(u) > 0$.
  82. Entropy-entropy flux-Fisher information dissipation. Hence we know that
    $\partial_t\mathcal{G}(u) = \int_\Omega G'(u)\,\partial_t u\,dx = -\epsilon\int_\Omega \big\langle\nabla G'(u), A(u)G''(u)^{-1}\nabla G'(u)\big\rangle dx \le 0.$
    This implies that $\mathcal{G}(u)$ is a Lyapunov functional for regularized conservation laws.
  83. Entropy-entropy flux-transport metrics. Definition (Entropy-entropy flux-metric condition): we call $(G, \Psi)$ an entropy-entropy flux pair-metric for conservation laws if there exists a convex function $G\colon \mathbb{R}^d \to \mathbb{R}$ and $\Psi\colon \mathbb{R}^d \to \mathbb{R}$ such that $\Psi'(u) = G'(u)f'(u)$, and $A(u)G''(u)^{-1}$ is symmetric positive semi-definite.
  84. Lyapunov methods induced transport metrics. Under the entropy-entropy flux-metric condition, there is a metric operator for regularized conservation laws. Define the space of functions $u$ by
    $\mathcal{M} = \Big\{u \in C^\infty(\Omega)\colon \int_\Omega u(x)\,dx = \text{constant}\Big\}.$
    The tangent space of $\mathcal{M}$ at a point $u$ satisfies
    $T_u\mathcal{M} = \Big\{\sigma \in C^\infty(\Omega)\colon \int_\Omega \sigma(x)\,dx = 0\Big\}.$
    Denote the elliptic operator $L_C\colon C^\infty(\Omega) \to C^\infty(\Omega)$ by
    $L_C(u)\Phi = -\nabla\cdot\big(A(u)G''(u)^{-1}\nabla\Phi\big).$
  85. Lyapunov methods induced transport metrics. Definition (Metric): the inner product $g(u)\colon T_u\mathcal{M}\times T_u\mathcal{M} \to \mathbb{R}$ is given below:
    $g(u)(\sigma_1, \sigma_2) = \int_\Omega \big(\Phi_1, L_C(u)\Phi_2\big)dx = -\int_\Omega \Phi_1\,\nabla\cdot\big(A(u)G''(u)^{-1}\nabla\Phi_2\big)dx = \int_\Omega \big(\nabla\Phi_1, A(u)G''(u)^{-1}\nabla\Phi_2\big)dx = \int_\Omega \sigma_1\Phi_2\,dx = \int_\Omega \sigma_2\Phi_1\,dx,$
    where $\Phi_i \in C^\infty(\Omega)$ satisfies $\sigma_i = -\nabla\cdot\big(A(u)G''(u)^{-1}\nabla\Phi_i\big)$, $i = 1, 2$.
  86. Entropy induced gradient flows. Proposition (Gradient flow): given an energy functional $\mathcal{E}\colon \mathcal{M} \to \mathbb{R}$, the gradient flow of $\mathcal{E}$ in $(\mathcal{M}, g)$ satisfies
    $\partial_t u = \nabla\cdot\big(A(u)G''(u)^{-1}\nabla\delta_u\mathcal{E}(u)\big).$
    If $\mathcal{E}(u) = \mathcal{G}(u) = \int_\Omega G(u)\,dx$, then the above gradient flow satisfies $\partial_t u = \nabla\cdot(A(u)\nabla u)$.
  87. Flux-gradient flows. Definition (Flux-gradient flow): given an energy functional $\mathcal{E}\colon \mathcal{M} \to \mathbb{R}$, consider the class of PDEs
    $\partial_t u + \nabla_x\cdot f_1(x, u) = \nabla_x\cdot\big(A(u)G''(u)^{-1}\nabla_x\delta_u\mathcal{E}(u)\big), \qquad$ (1)
    where $f_1\colon \Omega\times\mathbb{R}^1 \to \mathbb{R}^n$ is a flux function satisfying
    $\int_\Omega f_1(x, u)\cdot\nabla_x\delta_u\mathcal{E}(u)\,dx = 0.$
    Remark: this is the generalized optimal transport view of GENERIC (nonequilibrium reversible-irreversible coupling), Grmela and Ottinger, 1997.
  88. Example. If $\mathcal{E}(u) = \mathcal{G}(u) = \int_\Omega G(u)\,dx$ and $f_1(x, u) = f(u)$, then equation (1) forms a regularized conservation law:
    $\partial_t u + \nabla_x\cdot f(u) = \nabla_x\cdot\big(A(u)G''(u)^{-1}\nabla_x\delta_u\mathcal{G}(u)\big) = \nabla_x\cdot\big(A(u)G''(u)^{-1}\nabla_x G'(u)\big) = \nabla_x\cdot\big(A(u)G''(u)^{-1}G''(u)\nabla_x u\big) = \nabla_x\cdot\big(A(u)\nabla u\big),$
    and
    $\int_\Omega \nabla_x\delta_u\mathcal{G}(u)\cdot f(u)\,dx = \int_\Omega \nabla_x G'(u)\cdot f(u)\,dx = -\int_\Omega G'(u)\,\big(\nabla_x\cdot f(u)\big)dx = -\int_\Omega \nabla\cdot\Psi(u)\,dx = 0.$
  89. Controlling flux-gradient flows. Given smooth functionals $\mathcal{F}, \mathcal{H}\colon \mathcal{M} \to \mathbb{R}$, consider the variational problem
    $\inf_{u, v, u_1}\ \int_0^1 \Big[\int_\Omega \frac{1}{2}\big\langle v, A(u)G''(u)^{-1}v\big\rangle dx - \mathcal{F}(u)\Big]dt + \mathcal{H}(u_1),$
    where the infimum is taken among variables $v\colon [0,1]\times\Omega \to \mathbb{R}^n$, $u\colon [0,1]\times\Omega \to \mathbb{R}$, and $u_1\colon \Omega \to \mathbb{R}$ satisfying
    $\partial_t u + \nabla\cdot f(u) + \nabla\cdot\big(A(u)G''(u)^{-1}v\big) = \epsilon\,\nabla\cdot\big(A(u)\nabla u\big),$
    with $u(0, x) = u_0(x)$.
  90. Primal-dual conservation laws. Proposition (Hamiltonian flows of conservation laws): the critical point system of the variational problem is given below. There exists a function $\Phi\colon [0,1]\times\Omega \to \mathbb{R}$ such that $v(t,x) = \nabla\Phi(t,x)$, and
    $\partial_t u + \nabla\cdot f(u) + \nabla\cdot\big(A(u)G''(u)^{-1}\nabla\Phi\big) = \epsilon\,\nabla\cdot\big(A(u)\nabla u\big),$
    $\partial_t\Phi + \big(\nabla\Phi, f'(u)\big) + \frac{1}{2}\big(\nabla\Phi, (A(u)G''(u)^{-1})'\nabla\Phi\big) + \delta_u\mathcal{F}(u) = -\epsilon\,\nabla\cdot\big(A(u)\nabla\Phi\big) + \epsilon\big(\nabla\Phi, A'(u)\nabla u\big). \qquad$ (2)
    Here $'$ represents the derivative w.r.t. the variable $u$. The initial and terminal time conditions satisfy
    $u(0, x) = u_0(x), \qquad \delta_{u_1}\mathcal{H}(u_1) + \Phi(1, x) = 0.$
  91. Hamiltonian flows of conservation laws. Proposition: the primal-dual conservation law system (2) has the following Hamiltonian flow formulation:
    $\partial_t u = \delta_\Phi\mathcal{H}_G(u, \Phi), \qquad \partial_t\Phi = -\delta_u\mathcal{H}_G(u, \Phi),$
    where we define the Hamiltonian functional $\mathcal{H}_G\colon \mathcal{M}\times C^\infty(\Omega) \to \mathbb{R}$ by
    $\mathcal{H}_G(u, \Phi) = \int_\Omega \Big[\frac{1}{2}\big(\nabla\Phi, A(u)G''(u)^{-1}\nabla\Phi\big) + \big(\nabla\Phi, f(u)\big) - \epsilon\big(\nabla\Phi, A(u)\nabla u\big)\Big]dx + \mathcal{F}(u).$
    In other words, the Hamiltonian functional $\mathcal{H}_G(u, \Phi)$ is conserved along the Hamiltonian dynamics: $\frac{d}{dt}\mathcal{H}_G(u, \Phi) = 0$.
  92. Modeling: Controlling traffic flows. Position control. The unknown variable $u$ in traffic flows represents the density function of cars (particles) in a given spatial domain. Here, the background dynamics of $u$ is the classical traffic flow. The control variable is the velocity for enforcing each car's velocity in addition to its background traffic flow dynamics. The goal is to control the "total enforced kinetic energy" of all cars, in which individual cars can determine their velocities through both noise and traffic flow interactions.
  93. Controlling traffic flow problems. Consider
    $\inf_{u, v, u_1}\ \int_0^1 \Big[\int_\Omega \frac{1}{2}\|v(t,x)\|^2\,u(t,x)\,dx - \mathcal{F}(u)\Big]dt + \mathcal{H}(u_1), \qquad$ (3a)
    such that $0 \le u(t,x) \le 1$ for all $t \in [0,1]$, and
    $\partial_t u(t,x) + \nabla\cdot\big(u(t,x)(1 - u(t,x))\big) + \nabla\cdot\big(u(t,x)v(t,x)\big) = \epsilon\,\Delta u(t,x), \qquad$ (3b)
    with $u(0, x) = u_0(x)$.
  94. Controlling traffic flow dynamics. There exists a scalar function $\Phi$ such that $v(t,x) = \nabla\Phi(t,x)$, and
    $\partial_t u + \nabla\cdot\big(u(1 - u)\big) + \nabla\cdot(u\nabla\Phi) = \epsilon\,\Delta u, \qquad \partial_t\Phi + \big(1 - 2u, \nabla\Phi\big) + \frac{1}{2}\|\nabla\Phi\|^2 + \delta_u\mathcal{F}(u) = -\epsilon\,\Delta\Phi.$
  95. The primal-dual algorithms. The classical primal-dual hybrid gradient algorithm (PDHG) solves the saddle point problem
    $\min_z\max_p\ \langle Kz, p\rangle_{L^2} + g(z) - h^*(p).$
    When the operator $K$ is nonlinear, we apply its extension with $K(z) \approx K(\bar{z}) + \nabla K(\bar{z})(z - \bar{z})$. Here, the extension of the PDHG scheme is as follows:
    $z^{n+1} = \arg\min_z\ \langle z, [\nabla K(z^n)]^T\bar{p}^n\rangle_{L^2} + g(z) + \frac{1}{2\tau}\|z - z^n\|^2_{L^2},$
    $p^{n+1} = \arg\max_p\ \langle K(z^{n+1}), p\rangle_{L^2} - h^*(p) - \frac{1}{2\sigma}\|p - p^n\|^2_{L^2},$
    $\bar{p}^{n+1} = 2p^{n+1} - p^n.$
    Set $z = (u, m)$, $p = \Phi$, and
    $K((u, m)) = \partial_t u + \nabla\cdot f(u) + \nabla\cdot m - \epsilon\,\Delta u, \qquad g((u, m)) = \int_0^1 \Big(\int_\Omega \frac{\|m\|^2}{2u} + \iota_{[0,1]}(u)\,dx - \mathcal{F}(u)\Big)dt, \qquad h(Kz) = \begin{cases} 0 & \text{if } Kz = 0 \\ +\infty & \text{else.} \end{cases}$
  96. Solve the control problem with primal-dual algorithms. We rewrite the variational problem as follows:
    $\inf_{u, m}\ \sup_\Phi\ L(u, m, \Phi), \qquad \text{with } u(0, x) = u_0(x), \quad \Phi(1, x) = -\delta_{u(1,x)}\mathcal{H}(u),$
    $L(u, m, \Phi) = \int_0^1 \Big(\int_\Omega \frac{\|m\|^2}{2u} + \iota_{[0,1]}(u)\,dx - \mathcal{F}(u)\Big)dt + \int_0^1\int_\Omega \Phi\,\big(\partial_t u + \nabla\cdot f(u) + \nabla\cdot m - \epsilon\,\Delta u\big)dx\,dt,$
    and $f(u) = u(1 - u)$ is the traffic flux function. We denote the indicator function by $\iota_A(x) = +\infty$ if $x \notin A$ and $0$ if $x \in A$.
  97. Algorithm. Algorithm: PDHG for the conservation law control system.
    While $k <$ maximal number of iterations:
      $(u^{(k+1)}, m^{(k+1)}) = \arg\min_{u, m}\ L(u, m, \bar{\Phi}^{(k)}) + \frac{1}{2\tau}\|(u, m) - (u^{(k)}, m^{(k)})\|^2_{L^2};$
      $\Phi^{(k+1)} = \arg\max_\Phi\ L(u^{(k+1)}, m^{(k+1)}, \Phi) - \frac{1}{2\sigma}\|\Phi - \Phi^{(k)}\|^2_{\mathcal{H}};$
      $\bar{\Phi}^{(k+1)} = 2\Phi^{(k+1)} - \Phi^{(k)};$
  98. Monotone schemes. Definition: for $p, q \in \mathbb{N}$, a scheme $u^{k+1}_j = G(u^k_{j-p-1}, \dots, u^k_{j+q})$ is called a monotone scheme if $G$ is a monotonically nondecreasing function of each argument.
    Lax-Friedrichs scheme. For discretization sizes $\Delta t, \Delta x$ in time and space, denote $u^k_j = u(k\Delta t, j\Delta x)$; then the Lax-Friedrichs scheme is
    $u^{k+1}_j = u^k_j - \frac{\Delta t}{2\Delta x}\big(f(u^k_{j+1}) - f(u^k_{j-1})\big) + (\epsilon + c\Delta x)\,\frac{\Delta t}{(\Delta x)^2}\big(u^k_{j+1} - 2u^k_j + u^k_{j-1}\big).$
    To guarantee that the above scheme is monotone, we need
    $1 - 2(\epsilon + c\Delta x)\frac{\Delta t}{(\Delta x)^2} \ge 0, \qquad -\frac{\Delta t}{2\Delta x}|f'(u)| + (\epsilon + c\Delta x)\frac{\Delta t}{(\Delta x)^2} \ge 0.$
    Since we want the scheme to work as $\epsilon \to 0$, the restrictions on $c$ and the space-time step sizes can be simplified as
    $c \ge \frac{1}{2}|f'(u)|, \qquad (\Delta x)^2 \ge 2(\epsilon + c\Delta x)\Delta t.$
    The first inequality suggests the artificial viscosity we need to add. The second one imposes a strong restriction on the time step size when $\epsilon > 0$.
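    A NumPy sketch (not from the slides) of this Lax-Friedrichs scheme with periodic boundary conditions, choosing the artificial viscosity $c = \tfrac{1}{2}\max|f'(u)|$ as the first inequality suggests; the Riemann-type initial datum mirrors Example 1 below, and the step sizes are illustrative assumptions.

      import numpy as np

      def lax_friedrichs(u0, f, df, dx, dt, n_steps, eps=0.0):
          # Monotone Lax-Friedrichs scheme (periodic in space) for u_t + f(u)_x = eps u_xx,
          # with added artificial viscosity c = 0.5 * max|f'(u)| as suggested on the slide.
          u = np.array(u0, dtype=float)
          for _ in range(n_steps):
              c = 0.5 * np.max(np.abs(df(u)))            # artificial viscosity coefficient
              nu = (eps + c * dx) * dt / dx**2
              assert 2.0 * nu <= 1.0, "time step too large for monotonicity"
              up, um = np.roll(u, -1), np.roll(u, 1)     # u_{j+1}, u_{j-1} (periodic BC)
              u = u - dt / (2.0 * dx) * (f(up) - f(um)) + nu * (up - 2.0 * u + um)
          return u

      # Example: traffic flux f(u) = 0.5 * u * (1 - u) with a Riemann-type initial datum.
      x = np.linspace(0.0, 4.0, 400, endpoint=False)
      u = lax_friedrichs(0.8 * ((x >= 1) & (x <= 2)),
                         f=lambda u: 0.5 * u * (1 - u), df=lambda u: 0.5 - u,
                         dx=x[1] - x[0], dt=2e-3, n_steps=500)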
  99. Discretization of the control problem. We consider the control problem of the scalar conservation law defined on $[0, b]\times[0, 1]$ with periodic boundary conditions in the spatial domain. Given $N_x, N_t > 0$, set $\Delta x = b/N_x$, $\Delta t = 1/N_t$. For $x_i = i\Delta x$, $t_l = l\Delta t$, define
    $u^l_i = u(t_l, x_i)$ for $1 \le i \le N_x$, $0 \le l \le N_t$;
    $m^l_{1,i} = \big(m_{x_1}(t_l, x_i)\big)_+$ and $m^l_{2,i} = \big(m_{x_1}(t_l, x_i)\big)_-$ for $1 \le i \le N_x$, $0 \le l \le N_t - 1$;
    $\Phi^l_i = \Phi(t_l, x_i)$ for $1 \le i \le N_x$, $0 \le l \le N_t$; and $\Phi^{N_t}_i = -\delta_{u(1, x_i)}\mathcal{H}(u^{N_t}_i)$ for $1 \le i \le N_x$,
    where $u_+ := \max(u, 0)$ and $u_- := u - u_+$. Note that $m^l_{1,i} \in \mathbb{R}_+$ and $m^l_{2,i} \in \mathbb{R}_-$. Denote
    $(Du)_i := \frac{u_{i+1} - u_i}{\Delta x}, \qquad [Du]_i := \big((Du)_i, (Du)_{i-1}\big), \qquad [\widehat{D}u]_i := \big((Du)^+_i, (Du)^-_{i-1}\big), \qquad \mathrm{Lap}(u)_i := \frac{u_{i+1} - 2u_i + u_{i-1}}{(\Delta x)^2}.$
  100. Discretization of the control problem. The first conservation law equation, adapted from the Lax-Friedrichs scheme, satisfies
    $\frac{u^{l+1}_i - u^l_i}{\Delta t} + \frac{f(u^{l+1}_{i+1}) - f(u^{l+1}_{i-1})}{2\Delta x} + (Dm)^l_{1,i-1} + (Dm)^l_{2,i} = (\epsilon + c\Delta x)\,\mathrm{Lap}(u)^{l+1}_i.$
    Following the discretization of the conservation law, the discrete saddle point problem has the form $\min_{u, m}\max_\Phi\ L(u, m, \Phi)$, where
    $L(u, m, \Phi) = \Delta x\,\Delta t\sum_{i,l}\Big[\frac{(m^{l-1}_{1,i})^2 + (m^{l-1}_{2,i})^2}{2u^l_i} + \iota_{[0,1]}(u^l_i)\Big] - \Delta t\sum_l \mathcal{F}(u^l) + \Delta x\sum_i \mathcal{H}(u^{N_t}_i)$
    $\quad + \Delta x\,\Delta t\sum_{i,l}\Phi^l_i\Big(\frac{u^{l+1}_i - u^l_i}{\Delta t} + \frac{f(u^{l+1}_{i+1}) - f(u^{l+1}_{i-1})}{2\Delta x} + (Dm)^l_{1,i-1} + (Dm)^l_{2,i} - (\epsilon + c\Delta x)\,\mathrm{Lap}(u)^{l+1}_i\Big).$
  101. Discretization of the control problem. By taking the first-order derivative with respect to $u^l_i$, we automatically obtain the implicit finite difference scheme for the dual equation of $\Phi$, which is backward in time:
    $\frac{\Phi^{l+1}_i - \Phi^l_i}{\Delta t} + \frac{\Phi^l_{i+1} - \Phi^l_{i-1}}{2\Delta x}\,f'(u^l_i) + \frac{1}{2}\big\|[\widehat{D}\Phi]^l_i\big\|^2 + \frac{\delta\mathcal{F}(u^l_i)}{\delta u} = -(\epsilon + c\Delta x)\,\mathrm{Lap}(\Phi)^l_i.$
    The discrete form of the Hamiltonian functional at $t = t_l$ takes the form
    $\mathcal{H}_G(u, \Phi) = \sum_i\Big(\frac{1}{2}\big\|[\widehat{D}\Phi]^l_i\big\|^2\,u^l_i + \frac{\Phi^l_{i+1} - \Phi^l_{i-1}}{2\Delta x}\,f(u^l_i) + (\epsilon + c\Delta x)\,u^l_i\,\mathrm{Lap}(\Phi)^l_i\Big) - \mathcal{F}(u^l).$
    - When $f(u) = 0$, the above discretization reduces to the finite difference scheme for the mean-field game system.
    - When $\mathcal{F} = 0$ and $\mathcal{H} = c$ for some constant $c$, the variational problem becomes the classical conservation law with initial data; no control is enforced on the density function $u$.
    - Our algorithm provides an alternative way to solve the nonlinear conservation law with an implicit discretization in time.
  102. Example 1: Traffic flows. We consider the traffic flow equation
    $\partial_t u + \partial_x f(u) = 0, \qquad u(0, x) = \begin{cases} 0.8 & 1 \le x \le 2 \\ 0 & \text{else,} \end{cases}$
    where $f(u) = \frac{1}{2}u(1 - u)$, $\mathcal{F} = 0$, $\mathcal{H} = 0$, $\epsilon = 0$, $c = 0.5$.
    Figure: Left: a comparison with the exact entropy solution at $t = 1$; middle: the numerical solution via solving the control problem; right: the numerical solution to the conservation law using the Lax-Friedrichs scheme.
  103. Example 2: Traffic flow transport problems. We consider the traffic flow equation with $f(u) = u(1 - u)$, $\epsilon = 0.1$, $\mathcal{F} = 0$, $c = 0.5$. The final cost functional is
    $\mathcal{H}(u(1, \cdot)) = \mu\int_\Omega u(1, x)\log\Big(\frac{u(1, x)}{u_1}\Big)dx, \qquad \mu = 1.$
    We set $u_0 = 0.001 + 0.9e^{-10(x-2)^2}$ and $u_1 = 0.001 + 0.45e^{-10(x-1)^2} + 0.45e^{-10(x-3)^2}$. We also compare the result from the control of the conservation law with a mean-field game problem, i.e., $f = 0$.
  104. Example 2: Traffic flow transport problems.
    Figure: From left to right: initial configurations of $u_0$, $u_1$; solution $u(1, x)$ for the control of the conservation law; solution $u(1, x)$ for the mean-field game problem.
    Figure: Left: solution $u(t, x)$ for the problem of controlling the conservation law; right: solution $u(t, x)$ for the mean-field game problem.
  105. Example 3: Traffic control. Consider the traffic flow equation with $f(u) = u(1 - u)$, $\epsilon = 10^{-3}$, $\mathcal{F} = \alpha\int_\Omega u\log(u)\,dx$, $\alpha \ge 0$. The final cost functional is $\mathcal{H}(u(1, \cdot)) = \int_\Omega u(1, x)\,g(x)\,dx$. The initial density and final cost function are
    $u_0(x) = \begin{cases} 0.4 & 0.5 \le x \le 1.5 \\ 10^{-3} & \text{else,} \end{cases} \qquad g(x) = 0.1\sin(2\pi x).$
    Figure: Solution $u(t, x)$ for the problem of controlling the conservation law. From left to right: $\alpha = 0, 0.5, 1$.
  106. Example 3: Traffic control.
    Figure: Left: boundary conditions for the control problem, $u_0$ and $g$. Middle: solution $u(1, x)$ for the problem of controlling the conservation law. Right: the numerical Hamiltonian $\mathcal{H}_G(u, \Phi)$.
  107. Extending to systems of conservation laws.
    Figure: Control of the 1D compressible Navier-Stokes equations. Top: no control. Bottom: with control.