Slide 1

Generalized optimal transport: optimization and mean field control in probability spaces
Wuchen Li, University of South Carolina, Columbia
Oberwolfach seminar, Nov 22, 2023.

Slide 2

Generalized optimal transport
- Optimization in probability space:
  – Gradient flows;
  – Accelerated flows;
  – Newton's flows;
  – Convexity analysis.
- Optimal control in probability space:
  – Hamiltonian flows;
  – Mean field control;
  – Mean field games.
- Applications:
  – Regularized Wasserstein proximal operators;
  – Mean field modeling of pandemic propagation;
  – Scientific computing: deep JKO schemes.

Slide 3

Topic I: Optimization in probability space
- Accelerated information gradient flows.
- Information Newton's flow: a second-order optimization method in probability space.
- Interacting Langevin diffusions: gradient structure and ensemble Kalman sampler.

Slide 4

Sampling problems

Slide 5

Examples

Figure: Results from Wasserstein GAN.

Slide 6

Optimization
- Optimization problems for sampling algorithms attract increasing attention from the AI and machine learning communities.
- Examples appear in model-based formulations (finite difference, finite element, neural networks, kernels) and model-free ones: variational inference, Bayesian inference, generative adversarial networks (GANs), policy optimization, etc.

Slide 7

Metric

Slide 8

Sampling algorithm
Arguably, one of the most classical sampling algorithms is based on the over-damped Langevin dynamics
$$dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t,$$
where $X_t \in \mathbb{R}^d$ and $B_t$ is the standard Brownian motion in $\mathbb{R}^d$. Denote $X_t \sim \rho_t$. The evolution of $\rho_t$ satisfies the Fokker-Planck equation
$$\partial_t \rho_t = \nabla\cdot(\rho_t \nabla f) + \Delta \rho_t.$$
The invariant distribution satisfies
$$\rho^*(x) = \frac{1}{K} e^{-f(x)}, \qquad K = \int e^{-f(y)}\,dy.$$
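As a quick illustration (not part of the slides), the dynamics above can be simulated with a plain Euler-Maruyama discretization; the quadratic potential $f(x) = \|x\|^2/2$, the step size, and the chain count below are illustrative choices.

```python
import numpy as np

def langevin_sample(grad_f, x0, dt=1e-2, n_steps=5000, rng=None):
    """Euler-Maruyama discretization of dX_t = -grad f(X_t) dt + sqrt(2) dB_t."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - dt * grad_f(x) + np.sqrt(2.0 * dt) * noise
    return x

# Illustrative target rho*(x) ~ exp(-f(x)) with f(x) = ||x||^2 / 2 (standard Gaussian).
grad_f = lambda x: x
samples = np.stack([langevin_sample(grad_f, np.zeros(2)) for _ in range(200)])
print(samples.mean(axis=0), samples.var(axis=0))  # roughly zero mean, unit variance
```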

Slide 9

Optimization in probability space
Nowadays, sampling algorithms relate to variational problems in the probability density space
$$\mathcal{P}(\Omega) = \Big\{ \rho \in \mathcal{F}(\Omega)\colon \int_\Omega \rho\,dx = 1,\ \rho \ge 0 \Big\}.$$
Consider the optimization problem therein:
$$\min_{\rho \in \mathcal{P}(\Omega)} \mathcal{E}(\rho).$$
A typical choice of objective functional in Bayesian inference is the KL divergence:
$$\mathcal{E}(\rho) = D_{\mathrm{KL}}(\rho \,\|\, e^{-f}) = \int \rho \log\frac{\rho}{e^{-f}}\,dx.$$

Slide 10

Information metrics
Typical examples include Fisher or Wasserstein information metrics.
Information geometry
- Fisher information statistics (Amari, Ay, Zhang et al.);
- AI inference problems: f-divergence, Amari α-divergence, etc.;
- AI optimization methods: natural gradient (Amari); stochastic relaxation (Pistone, Malago), et al.
Transport geometry
- Gradient descent: (Jordan, Kinderlehrer, Otto, Villani, Liu's group, Slepcev, Carillo, Cances, Wang, Craig et al.);
- Applied math: (Osher, Rongjie, Preye, Solomon, Oberman, Benamou, Enquist, Yin, Stuart, et al.);
- Engineering: (Chen, Pavon, Georgiou, Tannenbaum, Guo et al.);
- AI inference problems: (Cuturi et al., Frogner et al. 2015, Bottou et al. 2017, Yau, Gu et al. 2019);
- Optimization (Winsolo).

Slide 11

Gradient descent
The gradient flow of $\mathcal{E}(\rho)$ in a metric space satisfies
$$\partial_t \rho_t = -G(\rho_t)^{-1} \frac{\delta \mathcal{E}(\rho_t)}{\delta \rho_t},$$
where $G$ is the metric (preconditioner) operator, which is often assumed to be positive definite.
Example
- The Fisher-Rao gradient flow of the KL divergence leads to
$$\partial_t \rho_t = -\mathrm{grad}_F D_{\mathrm{KL}}(\rho_t\,\|\,\rho^*) = -\rho_t\Big(\log \rho_t + f - \int \rho_t(\log \rho_t + f)\,dx\Big).$$
- The Wasserstein gradient flow of the KL divergence satisfies
$$\partial_t \rho_t = -\mathrm{grad}_W D_{\mathrm{KL}}(\rho_t\,\|\,\rho^*) = \nabla\cdot\big(\rho_t \nabla(f + \log\rho_t + 1)\big) = \nabla\cdot(\rho_t\nabla f) + \Delta\rho_t.$$

Slide 12

First-order algorithms
Sampling methods are first-order methods in probability space based on different choices of metrics. They are usually "mean field" dynamics.
- Stein metric and Stein variational gradient descent [1];
- Wasserstein-Kalman metric and ensemble Kalman sampling [2];
- Wasserstein Newton's flow.

[1] Liu and Wang. Stein variational gradient descent: A general purpose Bayesian inference algorithm. NIPS, 2016.
[2] Garbuno-Inigo, Hoffmann, Li, and Stuart. Interacting Langevin diffusions: Gradient structure and ensemble Kalman sampler. SIAM Journal on Applied Dynamical Systems, 2019.

Slide 13

Related works
- KL divergence + Wasserstein metric + gradient descent = Langevin dynamics (optimal transport; Jordan, Kinderlehrer, Otto, Villani et al.)
- KL divergence + Fisher-Rao metric + gradient descent = birth-death dynamics (information geometry; Amari et al.)
- KL divergence + Stein metric + gradient descent = Stein variational gradient descent (Liu et al.)
- KL divergence + Wasserstein-Kalman metric + gradient descent = ensemble Kalman sampling (Inigo, Hoffman, Li, Stuart et al.)

Divergence + metric + accelerated gradient flows = ?

Slide 14

Goal: Accelerated Langevin dynamics
Can we design systematic accelerated gradient flows for sampling, which apply the symplectic structures of information metrics?
- Given that the optimal transport gradient flow of the KL divergence is the Langevin dynamics, what are the accelerated gradient flows?
- What is the "mean-field" accelerated Langevin dynamics?

Slide 15

Related works
Finite dimensional
- Euclidean space (Su et al.);
- Variational formulation (Wibisono, Wilson and Jordan);
- Hamiltonian descent (C. Maddison, D. Paulin, Y. Teh, B. O'Donoghue, A. Doucet).
Density space
- Liu et al. (ICML 2019): acceleration framework of ParVI methods based on manifold optimization;
- Taghvaei and Mehta (ICML 2019): the accelerated flow from an optimal control perspective;
- Carrilo et al. (CMP 2019) study the compressible Euler equation;
- Cheng et al. and Ma et al.: underdamped Langevin dynamics.

Slide 16

Gradient flows
- Methods for optimization problems in the Euclidean space have been well studied:
$$\min_{x\in\mathbb{R}^n} f(x).$$
- The classical gradient descent method follows
$$x_{k+1} = x_k - s\nabla f(x_k).$$
It can be formulated as a continuous-time gradient flow,
$$\dot{x} = -\nabla f(x).$$

Slide 17

Accelerated gradient flows
- Nesterov's accelerated method is widely applied to accelerate vanilla gradient descent. It corresponds to the so-called accelerated gradient flow [1]
$$\ddot{x} + \alpha_t \dot{x} + \nabla f(x) = 0.$$
- This is equivalent to a damped Hamiltonian system
$$\begin{pmatrix}\dot{x}\\ \dot{p}\end{pmatrix} + \begin{pmatrix}0\\ \alpha_t p\end{pmatrix} - \begin{pmatrix}0 & I\\ -I & 0\end{pmatrix}\begin{pmatrix}\nabla_x H^E(x,p)\\ \nabla_p H^E(x,p)\end{pmatrix} = 0, \qquad H^E(x,p) = \frac{1}{2}\|p\|_2^2 + f(x),$$
with initial values $x(0) = x_0$ and $p(0) = 0$.

[1] Su et al. A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights. Journal of Machine Learning Research, 2016.
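A small sketch (not from the slides) of integrating the damped Hamiltonian system with a semi-implicit Euler step; the choice $\alpha_t = 3/t$, the quadratic $f$, and the step size are illustrative assumptions.

```python
import numpy as np

def accelerated_flow(grad_f, x0, dt=1e-2, n_steps=2000):
    """Integrate x' = p, p' = -alpha_t p - grad f(x) with alpha_t = 3/t (semi-implicit Euler)."""
    x = np.array(x0, dtype=float)
    p = np.zeros_like(x)
    traj = []
    for k in range(1, n_steps + 1):
        t = k * dt
        p = p - dt * ((3.0 / t) * p + grad_f(x))
        x = x + dt * p
        traj.append(x.copy())
    return np.array(traj)

grad_f = lambda x: x          # f(x) = ||x||^2 / 2
traj = accelerated_flow(grad_f, x0=np.array([5.0, -3.0]))
print(np.linalg.norm(traj[-1]))  # decays toward the minimizer at the origin
```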

Slide 18

A quick review of information metrics
Definition (Metric in the probability space)
A metric tensor $G(\rho)\colon T_\rho\mathcal{P}(\Omega) \to T^*_\rho\mathcal{P}(\Omega)$ is an invertible mapping from the tangent space at $\rho$ to the cotangent space at $\rho$. This metric tensor defines the metric (inner product) on the tangent space $T_\rho\mathcal{P}(\Omega)$. Namely, for $\sigma_1, \sigma_2 \in T_\rho\mathcal{P}(\Omega)$, we define
$$g_\rho(\sigma_1, \sigma_2) = \int \sigma_1 G(\rho)\sigma_2\,dx = \int \Phi_1 G(\rho)^{-1}\Phi_2\,dx,$$
where $\Phi_i$ is the solution to $\sigma_i = G(\rho)^{-1}\Phi_i$, $i = 1, 2$.

Slide 19

Example: Fisher-Rao metric
- The Fisher-Rao metric comes from information geometry.
- The inverse of the Fisher-Rao metric tensor is defined by
$$G^F(\rho)^{-1}\Phi = \rho\Big(\Phi - \int \Phi\rho\,dx\Big), \qquad \Phi \in T^*_\rho\mathcal{P}(\Omega).$$
- For $\sigma_1, \sigma_2 \in T_\rho\mathcal{P}(\Omega)$, the Fisher-Rao metric on the tangent space follows
$$g^F_\rho(\sigma_1, \sigma_2) = \int \Phi_1\Phi_2\,\rho\,dx - \Big(\int \Phi_1\rho\,dx\Big)\Big(\int \Phi_2\rho\,dx\Big),$$
where $\Phi_i$ is the solution to $\sigma_i = \rho\big(\Phi_i - \int \Phi_i\rho\,dx\big)$, $i = 1, 2$.

Slide 20

Example: Wasserstein metric
- The Wasserstein metric comes from optimal transport.
- The inverse of the Wasserstein metric tensor is defined by
$$G^W(\rho)^{-1}\Phi = -\nabla\cdot(\rho\nabla\Phi), \qquad \Phi \in T^*_\rho\mathcal{P}(\Omega).$$
- The Wasserstein metric on the tangent space is given by
$$g^W_\rho(\sigma_1, \sigma_2) = \int \rho\,\langle \nabla\Phi_1, \nabla\Phi_2\rangle\,dx, \qquad \sigma_1, \sigma_2 \in T_\rho\mathcal{P}(\Omega),$$
where $\Phi_i$ is the solution to $\sigma_i = -\nabla\cdot(\rho\nabla\Phi_i)$, $i = 1, 2$.

Slide 21

Gradient flows
- The gradient flow for $\mathcal{E}(\rho)$ in $(\mathcal{P}(\Omega), g_\rho)$ takes the form
$$\partial_t\rho_t = -G(\rho_t)^{-1}\frac{\delta\mathcal{E}(\rho_t)}{\delta\rho_t}.$$
- The Fisher-Rao gradient flow is given by
$$\partial_t\rho_t = -\rho_t\Big(\frac{\delta\mathcal{E}}{\delta\rho_t} - \int \frac{\delta\mathcal{E}}{\delta\rho_t}\rho_t\,dx\Big).$$
- The Wasserstein gradient flow writes
$$\partial_t\rho_t = \nabla\cdot\Big(\rho_t\nabla\frac{\delta\mathcal{E}}{\delta\rho_t}\Big).$$

Slide 22

Hamiltonian flows
- The Euler-Lagrange equation in the probability space can be formulated as a system in $(\rho_t, \Phi_t)$, i.e.,
$$\begin{cases}
\partial_t\rho_t - G(\rho_t)^{-1}\Phi_t = 0,\\[4pt]
\partial_t\Phi_t + \dfrac{1}{2}\dfrac{\delta}{\delta\rho_t}\Big(\displaystyle\int \Phi_t G(\rho_t)^{-1}\Phi_t\,dx\Big) + \dfrac{\delta\mathcal{E}}{\delta\rho_t} = 0,
\end{cases}$$
where $\Phi_t$ is determined up to a spatially constant shift.
- The above PDE system is the Hamiltonian flow
$$\partial_t\begin{pmatrix}\rho_t\\ \Phi_t\end{pmatrix} - \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\begin{pmatrix}\delta_{\rho_t}\mathcal{H}(\rho_t,\Phi_t)\\ \delta_{\Phi_t}\mathcal{H}(\rho_t,\Phi_t)\end{pmatrix} = 0,$$
with respect to the Hamiltonian
$$\mathcal{H}(\rho_t,\Phi_t) = \frac{1}{2}\int \Phi_t G(\rho_t)^{-1}\Phi_t\,dx + \mathcal{E}(\rho_t).$$

Slide 23

Example: Fisher-Rao Hamiltonian flow
- The Fisher-Rao Hamiltonian flow follows
$$\begin{cases}
\partial_t\rho_t - \Phi_t\rho_t + \Big(\displaystyle\int \Phi_t\rho_t\,dx\Big)\rho_t = 0,\\[4pt]
\partial_t\Phi_t + \dfrac{1}{2}\Phi_t^2 - \Big(\displaystyle\int \rho_t\Phi_t\,dx\Big)\Phi_t + \dfrac{\delta\mathcal{E}}{\delta\rho_t} = 0.
\end{cases}$$
- The corresponding Hamiltonian is
$$\mathcal{H}^F(\rho_t,\Phi_t) = \frac{1}{2}\left(\int \Phi_t^2\,\rho_t\,dx - \Big(\int \rho_t\Phi_t\,dx\Big)^2\right) + \mathcal{E}(\rho_t).$$

Slide 24

Example: Wasserstein Hamiltonian flow
- The Wasserstein Hamiltonian flow writes
$$\begin{cases}
\partial_t\rho_t + \nabla\cdot(\rho_t\nabla\Phi_t) = 0,\\[4pt]
\partial_t\Phi_t + \dfrac{1}{2}\|\nabla\Phi_t\|^2 + \dfrac{\delta\mathcal{E}}{\delta\rho_t} = 0.
\end{cases}$$
- The corresponding Hamiltonian forms
$$\mathcal{H}^W(\rho_t,\Phi_t) = \frac{1}{2}\int \|\nabla\Phi_t\|^2\rho_t\,dx + \mathcal{E}(\rho_t).$$
- This is identical to the dynamics in optimal transport and mean field games.

Slide 25

Accelerated information gradient flow
- Let $\alpha_t \ge 0$ be a scalar function of $t$. We add a damping term $\alpha_t\Phi_t$ to the Hamiltonian flow:
$$\partial_t\begin{pmatrix}\rho_t\\ \Phi_t\end{pmatrix} + \begin{pmatrix}0\\ \alpha_t\Phi_t\end{pmatrix} - \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\begin{pmatrix}\delta_{\rho_t}\mathcal{H}(\rho_t,\Phi_t)\\ \delta_{\Phi_t}\mathcal{H}(\rho_t,\Phi_t)\end{pmatrix} = 0.$$
- This renders the Accelerated Information Gradient (AIG) flow
$$\begin{cases}
\partial_t\rho_t - G(\rho_t)^{-1}\Phi_t = 0,\\[4pt]
\partial_t\Phi_t + \alpha_t\Phi_t + \dfrac{1}{2}\dfrac{\delta}{\delta\rho_t}\Big(\displaystyle\int \Phi_t G(\rho_t)^{-1}\Phi_t\,dx\Big) + \dfrac{\delta\mathcal{E}}{\delta\rho_t} = 0, \qquad \text{(AIG)}
\end{cases}$$
with initial values $\rho_0$ and $\Phi_0 = 0$.

Slide 26

The choice of $\alpha_t$
The choice of $\alpha_t$ depends on the geodesic convexity of $\mathcal{E}(\rho)$, which has an equivalent definition as follows.
Definition
For a functional $\mathcal{E}(\rho)$ defined on the probability space, we say that $\mathcal{E}(\rho)$ has a $\lambda$-positive Hessian (in short, $\mathrm{Hess}(\lambda)$) w.r.t. the metric $g_\rho$ if there exists a constant $\lambda \ge 0$ such that for any $\rho\in\mathcal{P}(\Omega)$ and any $\sigma\in T_\rho\mathcal{P}(\Omega)$, we have
$$g_\rho(\mathrm{Hess}\,\mathcal{E}(\rho)\,\sigma, \sigma) \ge \lambda\, g_\rho(\sigma,\sigma).$$
Here $\mathrm{Hess}$ is the Hessian operator w.r.t. $g_\rho$.
- If $\mathcal{E}(\rho)$ is $\mathrm{Hess}(\lambda)$ for $\lambda > 0$, then $\alpha_t = 2\sqrt{\lambda}$.
- If $\mathcal{E}(\rho)$ is $\mathrm{Hess}(0)$, then $\alpha_t = 3/t$.

Slide 27

Fisher-Rao AIG flow
- The Fisher-Rao AIG flow writes
$$\begin{cases}
\partial_t\rho_t - \Phi_t\rho_t + \Big(\displaystyle\int \Phi_t\rho_t\,dx\Big)\rho_t = 0,\\[4pt]
\partial_t\Phi_t + \alpha_t\Phi_t + \dfrac{1}{2}\Phi_t^2 - \Big(\displaystyle\int \rho_t\Phi_t\,dx\Big)\Phi_t + \dfrac{\delta\mathcal{E}}{\delta\rho_t} = 0. \qquad \text{(F-AIG)}
\end{cases}$$

Slide 28

Wasserstein AIG flow
- The Wasserstein AIG flow in the Eulerian formulation follows
$$\begin{cases}
\partial_t\rho_t + \nabla\cdot(\rho_t\nabla\Phi_t) = 0,\\[4pt]
\partial_t\Phi_t + \alpha_t\Phi_t + \dfrac{1}{2}\|\nabla\Phi_t\|^2 + \dfrac{\delta\mathcal{E}}{\delta\rho_t} = 0. \qquad \text{(W-AIG)}
\end{cases}$$
- The Wasserstein AIG flow in the Lagrangian formulation leads to
$$\begin{cases}
dX_t = V_t\,dt,\\[2pt]
dV_t = -\alpha_t V_t\,dt - \nabla\Big(\dfrac{\delta\mathcal{E}}{\delta\rho_t}\Big)(X_t)\,dt.
\end{cases}$$
- This is a particular type of initial-value damped mean-field game system, known in fluid dynamics (Carillo, 2020; Amir et al., 2019).
- Many other generalizations of AIG flows are provided in (Wang, Li, 2019).

Slide 29

Examples in Gaussian families
Theorem
Suppose that $\rho_0, \rho^* \in \mathcal{N}^n_0$ (zero-mean Gaussians) with covariance matrices $\Sigma_0$ and $\Sigma^*$, and that $E(\Sigma)$ evaluates the KL divergence from $\rho$ to $\rho^*$. Then $(\Sigma_t, S_t)$ satisfies
$$\begin{cases}
\dot{\Sigma}_t - (S_t\Sigma_t + \Sigma_t S_t) = 0,\\[2pt]
\dot{S}_t + \alpha_t S_t + S_t^2 + 2\nabla_{\Sigma_t}E(\Sigma_t) = 0, \qquad \text{(W-AIG-G)}
\end{cases}$$
with initial values $\Sigma_0$ and $S_0 = 0$. In particular, for any $t \ge 0$, $\Sigma_t$ is well defined and stays positive definite.
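The matrix ODE (W-AIG-G) is easy to integrate directly. Below is a hedged sketch (not from the slides); it assumes the standard formula $\nabla_\Sigma E(\Sigma) = \frac{1}{2}(\Sigma_*^{-1} - \Sigma^{-1})$ for the KL divergence between zero-mean Gaussians, and the value of $\lambda$, the step size, and the target are illustrative.

```python
import numpy as np

def w_aig_gaussian(Sigma0, Sigma_star, lam, dt=1e-3, n_steps=20000):
    """Forward-Euler integration of (W-AIG-G) with alpha_t = 2*sqrt(lam).
    For zero-mean Gaussians, E(Sigma) = KL(N(0,Sigma) || N(0,Sigma*)) and
    grad_Sigma E = 0.5 * (inv(Sigma*) - inv(Sigma))."""
    Sigma, S = Sigma0.copy(), np.zeros_like(Sigma0)
    inv_star = np.linalg.inv(Sigma_star)
    alpha = 2.0 * np.sqrt(lam)
    for _ in range(n_steps):
        grad = 0.5 * (inv_star - np.linalg.inv(Sigma))
        dSigma = S @ Sigma + Sigma @ S
        dS = -alpha * S - S @ S - 2.0 * grad
        Sigma, S = Sigma + dt * dSigma, S + dt * dS
    return Sigma

Sigma_star = np.diag([1.0, 4.0])
Sigma_T = w_aig_gaussian(np.eye(2), Sigma_star, lam=0.25)
print(Sigma_T)   # approaches Sigma_star
```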

Slide 30

Convergence rate of W-AIG flows
Theorem
Suppose that $\mathcal{E}(\rho)$ satisfies $\mathrm{Hess}(\lambda)$ for $\lambda > 0$. The solution $\rho_t$ to (W-AIG) with $\alpha_t = 2\sqrt{\lambda}$ satisfies
$$\mathcal{E}(\rho_t) \le C_0 e^{-\sqrt{\lambda}\,t} = O(e^{-\sqrt{\lambda}\,t}).$$
If $\mathcal{E}(\rho)$ only satisfies $\mathrm{Hess}(0)$, then the solution $\rho_t$ to (W-AIG) with $\alpha_t = 3/t$ satisfies
$$\mathcal{E}(\rho_t) \le C_0' t^{-2} = O(t^{-2}).$$
Here the constants $C_0, C_0'$ only depend on $\rho_0$.

Slide 31

PDE convexities
Convergence rate
- $\mathrm{Hess}(\lambda)$ is equivalent to $\lambda$-geodesic convexity in the probability space w.r.t. $g_\rho$. For the Wasserstein metric, it is also known as $\lambda$-displacement convexity.
- Suppose $\mathcal{E}(\rho)$ is the KL divergence and the target density takes the form $\rho^* \propto \exp(-f(x))$.
  – A sufficient condition for $\mathrm{Hess}(\lambda)$ is that $f(x)$ is $\lambda$-strongly convex.
  – The W-AIG flow improves the $O(e^{-2\lambda t})$ convergence rate of the gradient flow to $O(e^{-\sqrt{\lambda}\,t})$, especially when $\lambda$ is close to 0.
Convexities and entropy dissipation
- $\lambda$-geodesic convexity = Wasserstein Hessian operator (Otto, Villani et al.) = weak form of Gamma calculus (Bakry-Emery et al.)
- Fisher-Rao + density manifold Christoffel symbol = weak form of Gamma calculus (Li, 2018).

Slide 32

Sketch of the proof
- In the proof of the previous two propositions, we consider $u_t = \partial_t(T_t)^{-1}\circ T_t$ with the properties
$$\nabla\cdot\big(\rho_t(u_t - \nabla\Phi_t)\big) = 0, \qquad \partial_t T_t = -\nabla T_t\,u_t.$$
- We show the following relationship:
$$\int \big\langle \nabla\Phi_t, \partial_t T_t + \nabla T_t\nabla\Phi_t\big\rangle\,\rho_t\,dx = \int \big\langle \nabla\Phi_t - u_t, \nabla T_t(\nabla\Phi_t - u_t)\big\rangle\,\rho_t\,dx \ge 0.$$
This is based on the Hodge decomposition.

Slide 33

W-AIG flows at the particle level
Proposition
Suppose that $X_t \sim \rho_t$ and $V_t = \nabla\Phi_t(X_t)$ are the position and the velocity of a particle at time $t$. Then the differential equation of the particle system corresponding to (W-AIG) writes
$$\begin{cases}
dX_t = V_t\,dt,\\[2pt]
dV_t = -\alpha_t V_t\,dt - \nabla f(X_t)\,dt - \nabla\log\rho_t(X_t)\,dt.
\end{cases}$$

Slide 34

Discrete-time algorithm for the W-AIG flow
For $i = 1, 2, \ldots, N$, the update rule follows
$$\begin{cases}
V^i_{k+1} = \alpha_k V^i_k - \sqrt{\tau}\,\big(\nabla f(X^i_k) + \xi_k(X^i_k)\big),\\[2pt]
X^i_{k+1} = X^i_k + \sqrt{\tau}\,V^i_{k+1}.
\end{cases}$$
If $\mathcal{E}(\rho)$ is $\mathrm{Hess}(\lambda)$, then $\alpha_k = \frac{1-\sqrt{\lambda\tau}}{1+\sqrt{\lambda\tau}}$; if $\mathcal{E}(\rho)$ is $\mathrm{Hess}(0)$ or $\lambda$ is unknown, then $\alpha_k = \frac{k-1}{k+2}$. Here $\xi_k(x)$ is an approximation of $\nabla\log\rho_t(x)$.
Difficulties
- The approximation of $\nabla\log\rho_t(x)$.
- The stiffness of AIG flows.

Slide 35

Reviews on the choices of $\xi_k$
- If $X^i_k$ follows a Gaussian distribution, then $\xi_k(x) = -\Sigma_k^{-1}(x - m_k)$, where $m_k$ and $\Sigma_k$ are the mean and the covariance of $\{X^i_k\}_{i=1}^N$.
- For the non-Gaussian case, we use kernel density estimation (KDE) [1]. $\xi_k$ writes
$$\xi_k(x) = \nabla\log\tilde{\rho}_k(x) = \frac{\sum_{i=1}^N \nabla_x K(x, X^i_k)}{\sum_{i=1}^N K(x, X^i_k)}.$$
A common choice of $K(x, y)$ is a Gaussian kernel with bandwidth $h$,
$$K^G(x, y; h) = (2\pi h)^{-n/2}\exp\big(-\|x - y\|^2/(2h)\big).$$

[1] Radhey S. Singh. Improvement on some known nonparametric uniformly consistent estimators of derivatives of a density. The Annals of Statistics, pp. 394-399, 1977.
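A small sketch (not from the slides) of the KDE score estimate $\xi_k$ with the Gaussian kernel above; the bandwidth $h$ is supplied by the user (e.g., from the MED or BM rules on the next slides).

```python
import numpy as np

def kde_score(x, particles, h):
    """xi(x) = grad log rho_tilde(x) = sum_i grad_x K(x, X_i) / sum_i K(x, X_i),
    with the Gaussian kernel K(x, y; h) = (2*pi*h)^(-n/2) exp(-||x - y||^2 / (2h))."""
    diff = x[None, :] - particles                     # shape (N, n)
    w = np.exp(-np.sum(diff**2, axis=1) / (2.0 * h))  # kernel weights (common constant cancels)
    grad_K = -(diff / h) * w[:, None]                 # grad_x K(x, X_i), up to that constant
    return grad_K.sum(axis=0) / w.sum()

# Illustrative check: particles drawn from N(0, I) in 2D; the score near the origin is close to 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
print(kde_score(np.zeros(2), X, h=0.5))
```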

Slide 36

Selection of the bandwidth
- SVGD [1] uses a median (MED) method to choose the bandwidth, i.e.,
$$h_k = \frac{1}{2\log(N+1)}\,\mathrm{median}\Big(\{\|X^i_k - X^j_k\|^2\}_{i,j=1}^N\Big).$$
- Liu et al. [2] propose a heat equation (HE) method to adaptively adjust the bandwidth.
- Motivated by the HE method, we introduce the Brownian motion (BM) method to adaptively learn the bandwidth.

[1] Qiang Liu and Dilin Wang. Stein variational gradient descent: A general purpose Bayesian inference algorithm. In Advances in Neural Information Processing Systems, pp. 2378-2386, 2016.
[2] Chang Liu, Jingwei Zhuo, Pengyu Cheng, Ruiyi Zhang, Jun Zhu, and Lawrence Carin. Accelerated first-order methods on the Wasserstein space for Bayesian inference. arXiv preprint arXiv:1807.01750, 2018.

Slide 37

The BM method
- Given the bandwidth $h$, $\{X^i_k\}_{i=1}^N$ and a step size $s$, we compute two particle systems
$$Y^i_k(h) = X^i_k - s\,\xi_k(X^i_k; h), \qquad Z^i_k = X^i_k + \sqrt{2s}\,B^i, \qquad i = 1, \ldots, N,$$
where $B^i$ is the standard Brownian motion.
- Minimize $\mathrm{MMD}(\hat\rho_Y, \hat\rho_Z)$, the maximum mean discrepancy between the empirical distributions $\hat\rho_Y(x) = \frac{1}{N}\sum_{i=1}^N \delta(x - Y^i_k(h))$ and $\hat\rho_Z(x) = \frac{1}{N}\sum_{i=1}^N \delta(x - Z^i_k)$.
- Optimize over $h$, using the bandwidth $h_{k-1}$ from the last iteration as the initialization.

Slide 38

Adaptive restart
- The second equation in (W-AIG) is a Hamilton-Jacobi equation, which is usually strongly stiff. We propose an adaptive restart technique to deal with this problem. Consider
$$\varphi_k = -\sum_{i=1}^N \big\langle V^i_{k+1}, \nabla f(X^i_k) + \xi_k(X^i_k)\big\rangle,$$
which can be viewed as a discrete-time approximation of $-\partial_t\mathcal{E}(\rho_t)$.
- If $\varphi_k < 0$, then we restart the algorithm with initial values $X^i_0 = X^i_k$ and $V^i_0 = 0$.
- This essentially keeps $\partial_t\mathcal{E}(\rho_t)$ negative along the trajectory.
- The numerical results show that the adaptive restart accelerates and stabilizes the discrete-time algorithm.

Slide 39

Overall algorithm
Algorithm 1: Particle implementation of the Wasserstein AIG flow
Require: initial positions $\{X^i_0\}_{i=1}^N$, step size $\tau$, number of iterations $L$.
1: Set $k = 0$, $V^i_0 = 0$, $i = 1, \ldots, N$. Set the bandwidth $h_0$ by MED.
2: for $l = 1, 2, \ldots, L$ do
3:   Compute $h_l$ based on BM: $h_l = \mathrm{BM}(h_{l-1}, \{X^i_k\}_{i=1}^N, \sqrt{\tau})$.
4:   Calculate $\xi_k(X^i_k)$ with the bandwidth $h_l$.
5:   Set $\alpha_k$ based on whether $\mathcal{E}(\rho)$ is $\mathrm{Hess}(\lambda)$ or $\mathrm{Hess}(0)$. Update
       $V^i_{k+1} = \alpha_k V^i_k - \sqrt{\tau}\,(\nabla f(X^i_k) + \xi_k(X^i_k))$,  $X^i_{k+1} = X^i_k + \sqrt{\tau}\,V^i_{k+1}$.
6:   if RESTART then
7:     Compute $\varphi_k = -\sum_{i=1}^N \langle V^i_{k+1}, \nabla f(X^i_k) + \xi_k(X^i_k)\rangle$.
8:     If $\varphi_k < 0$, set $X^i_0 = X^i_k$, $V^i_0 = 0$ and $k = 0$; otherwise set $k = k + 1$.
9:   else
10:    Set $k = k + 1$.
11:  end if
12: end for
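A condensed sketch of Algorithm 1 (not a verbatim implementation): it reuses the `kde_score` helper from the earlier sketch, fixes the bandwidth by the MED rule at every iteration instead of the BM search, and assumes the $\mathrm{Hess}(0)$ damping $\alpha_k = (k-1)/(k+2)$ with a zeroed first step.

```python
import numpy as np

def med_bandwidth(X):
    """MED rule: h = median(||X_i - X_j||^2) / (2 log(N + 1))."""
    d2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    return np.median(d2) / (2.0 * np.log(len(X) + 1))

def wasserstein_aig(grad_f, X0, tau=1e-2, n_iter=500, restart=True):
    """Particle W-AIG flow with alpha_k = (k-1)/(k+2) and adaptive restart."""
    X, V = X0.copy(), np.zeros_like(X0)
    k = 0
    for _ in range(n_iter):
        h = med_bandwidth(X)
        drift = np.array([grad_f(x) + kde_score(x, X, h) for x in X])
        alpha_k = (k - 1.0) / (k + 2.0) if k >= 1 else 0.0
        V = alpha_k * V - np.sqrt(tau) * drift
        X = X + np.sqrt(tau) * V
        phi = -np.sum(V * drift)          # discrete surrogate for -d/dt E(rho_t)
        if restart and phi < 0:
            V, k = np.zeros_like(V), 0    # restart: keep positions, reset velocities
        else:
            k += 1
    return X

rng = np.random.default_rng(1)
X0 = rng.standard_normal((200, 2)) * 2.0
samples = wasserstein_aig(lambda x: x, X0)   # illustrative target rho* ~ exp(-||x||^2/2)
```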

Slide 40

Toy example
- The target density $\rho^*$ is a toy bimodal distribution [1].
- We compare two types of particle implementations of the Wasserstein gradient flow of the KL divergence:
$$X^i_{k+1} = X^i_k - \tau\nabla f(X^i_k) + \sqrt{2\tau}\,B^i_k, \qquad X^i_{k+1} = X^i_k - \tau\nabla f(X^i_k) - \tau\,\xi_k(X^i_k).$$
Here $B^i_k \sim N(0, 1)$ is the standard Brownian motion and $\xi_k$ is computed by KDE. The first method is known as the MCMC method. The second method is called the ParVI method, whose bandwidth $h$ is selected by MED/HE/BM, respectively.

[1] Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770, 2015.

Slide 41

Toy example

Slide 42

Gaussian examples: ODE level
- The initial value is set to be $\Sigma_0 = I$.
- The following figure presents the convergence of the KL divergence on two target distributions with small/large $L$.

Figure: The acceleration of the W-AIG flow and the effect of adaptive restart (ODE level). The target density is a Gaussian distribution with zero mean on $\mathbb{R}^{100}$. Left: $L = 1$, $\kappa \approx 3.8\times 10^3$. Right: $\lambda = 1$, $\kappa \approx 4.0\times 10^3$.

Slide 43

Gaussian examples: Particle level
- The initial distribution of samples follows $N(0, I)$ and the number of samples is $N = 600$.
- For a particle system $\{X^i_k\}_{i=1}^N$, we record the KL divergence $E(\hat\Sigma_k)$ using the empirical covariance matrix $\hat\Sigma_k$.

Figure: The acceleration of the W-AIG flow and the effect of adaptive restart (particle level). The setting of the target density is identical to the previous figure.

Slide 44

Bayesian logistic regression

Figure: Comparison of different methods on Bayesian logistic regression, averaged over 10 independent trials. The shaded areas show the variance over the 10 trials. Top: BM; bottom: MED. Left: test accuracy; right: test log-likelihood.

Slide 45

Topic II: Optimal control in probability space
- Computational mean-field information dynamics associated with reaction-diffusion equations.
- Controlling conservation laws I, II: entropy-entropy flux.
- Mean field games.
- Master equations for finite state mean field games with nonlinear activations.

Slide 46

Pandemic evolution

Slide 47

Vaccine distribution

Slide 48

Generalized unnormalized OT

Slide 49

Reaction diffusion
Consider a scalar nonlinear reaction-diffusion equation
$$\partial_t u(t, x) = \Delta F(u(t, x)) + R(u(t, x)),$$
where $x \in \Omega$, $\Omega \subset \mathbb{R}^d$ is a compact convex spatial domain, $u \in \mathcal{M}(\Omega) = \{u \in C^\infty(\Omega)\colon u \ge 0\}$, and $\Delta$ is the Euclidean Laplacian operator.

Slide 50

Reaction diffusions and generalized optimal transport
In this talk, we first review reaction-diffusion equations in optimal transport type metric spaces. Using this connection, we further design mean field control problems for reaction-diffusions. This also provides variational schemes for solving classical reaction-diffusion equations in both time and spatial domains.

Slide 51

Entropy, Information, Control

Slide 52

Literature
The optimal transport problem was first introduced by Monge in 1781 and relaxed by Kantorovich (Nobel prize) in 1940. It defines a distance in the space of probability distributions, named the optimal transport, Wasserstein, or Earth Mover's distance.
- Mapping/Monge-Ampère equation: Gangbo, Brenier, et al.;
- Lyapunov methods and gradient flows: Otto, Villani, Ambrosio, Gigli, Savare, Carillo, Mielke, et al.;
- Hamiltonian flows: compressible Euler equations, potential mean field games, Schrödinger bridge problems, Schrödinger equations: Benamou, Brenier, Larsy, Lions, Carmona, Souganidis, Georgiou, Nelson, Lafferty, et al.;
- Numerical OT and MFG: Benamou, Nurbekyan, Oberman, Osher, Achdou, Caines, Tamer, Lai, Lauriere, et al.

Slide 53

Mean field control for reaction diffusions?

Slide 54

Lyapunov methods
Consider a Lyapunov functional
$$\mathcal{G}(u) = \int G(u(x))\,dx,$$
where $G\colon \mathbb{R} \to \mathbb{R}$ is a given convex function with $G''(u) > 0$. In fact, the choice of the entropy $G$ can be obtained from the entropy-entropy flux-metric condition. We have seen it in Prof. Osher's talk.

Slide 55

Metrics
Along the solution of the reaction-diffusion equation, one observes that
$$\begin{aligned}
\frac{d}{dt}\mathcal{G}(u(t,\cdot)) &= \int G'(u)\,\partial_t u\,dx = \int G'(u)\big(\Delta F(u) + R(u)\big)\,dx\\
&= -\int \big(\nabla G'(u), \nabla F(u)\big)\,dx + \int G'(u)R(u)\,dx\\
&= -\int \big(\nabla G'(u), \nabla u\big)F'(u)\,dx + \int G'(u)R(u)\,dx\\
&= -\int \big(\nabla G'(u), \nabla u\big)G''(u)\,\frac{F'(u)}{G''(u)}\,dx + \int G'(u)^2\,\frac{R(u)}{G'(u)}\,dx\\
&= -\int \big(\nabla G'(u), \nabla G'(u)\big)\frac{F'(u)}{G''(u)}\,dx + \int G'(u)^2\,\frac{R(u)}{G'(u)}\,dx,
\end{aligned}$$
where we apply integration by parts in the third equality and $\nabla G'(u) = G''(u)\nabla u$ in the last equality.

Slide 56

Lyapunov methods induced metrics I
- Denote the smooth positive probability space by
$$\mathcal{M} = \big\{ u \in C^\infty(\Omega)\colon u > 0 \big\}.$$
Given $F, G\colon \mathbb{R} \to \mathbb{R}$ satisfying $F'(u) > 0$ if $u > 0$, and $G''(u) > 0$, denote
$$V_1(u) = \frac{F'(u)}{G''(u)}, \qquad V_2(u) = -\frac{R(u)}{G'(u)}.$$
Denote the tangent space of $\mathcal{M}$ at $u \in \mathcal{M}$ by
$$T_u\mathcal{M} = \big\{ \sigma \in C^\infty(\Omega) \big\}.$$

Slide 57

Lyapunov methods induced metrics II
Definition (Mean-field information metric)
The inner product $g(u)\colon T_u\mathcal{M} \times T_u\mathcal{M} \to \mathbb{R}$ is given below. For any $\sigma_1, \sigma_2 \in T_u\mathcal{M}$, define
$$g(u)(\sigma_1, \sigma_2) = \int_\Omega (\nabla\Phi_1, \nabla\Phi_2)\,V_1(u)\,dx + \int_\Omega \Phi_1\Phi_2\,V_2(u)\,dx,$$
where $\sigma_i = -\nabla\cdot(V_1(u)\nabla\Phi_i) + V_2(u)\Phi_i$, $i = 1, 2$.

Slide 58

Reaction-diffusions in metric spaces
Given an energy functional $\mathcal{F}\colon \mathcal{M} \to \mathbb{R}$, the gradient flow of $\mathcal{F}$ in $(\mathcal{M}(\Omega), g)$ satisfies
$$\partial_t u(t, x) = \nabla\cdot\big(V_1(u)\nabla\delta_u\mathcal{F}(u)\big)(t, x) - V_2(u)\,\delta_u\mathcal{F}(u)(t, x).$$
In particular, if $\mathcal{F}(u) = \mathcal{G}(u) = \int G(u)\,dx$, then
$$\begin{aligned}
\partial_t u &= \nabla\cdot\big(V_1(u)\nabla G'(u)\big) - V_2(u)G'(u)\\
&= \nabla\cdot\Big(\frac{F'(u)}{G''(u)}\,G''(u)\nabla u\Big) + \frac{R(u)}{G'(u)}\,G'(u)\\
&= \nabla\cdot(F'(u)\nabla u) + R(u) = \Delta F(u) + R(u).
\end{aligned}$$

Slide 59

Controlling reaction diffusions

Slide 60

Controlling reaction diffusions
Denote a given energy functional by $\mathcal{F}\colon \mathcal{M}(\Omega) \to \mathbb{R}$. Consider a variational problem
$$\inf_{v_1, v_2, u} \int_0^1 \Big[\int_\Omega \frac{1}{2}\|v_1\|^2 V_1(u) + \frac{1}{2}|v_2|^2 V_2(u)\,dx - \mathcal{F}(u)\Big]\,dt,$$
where the infimum is taken among all density functions $u\colon [0, 1]\times\Omega \to \mathbb{R}$, vector fields $v_1\colon [0, 1]\times\Omega \to \mathbb{R}^d$, and reaction rate functions $v_2\colon [0, 1]\times\Omega \to \mathbb{R}$, such that
$$\partial_t u(t, x) + \nabla\cdot\big(V_1(u(t, x))v_1(t, x)\big) = v_2(t, x)\,V_2(u(t, x)),$$
with fixed initial and terminal density functions $u_0, u_1 \in \mathcal{M}(\Omega)$.

Slide 61

Mean field information dynamics
The critical point of the variational problem satisfies $v_1(t, x) = \nabla\Phi(t, x)$, $v_2(t, x) = \Phi(t, x)$, with
$$\begin{cases}
\partial_t u(t, x) + \nabla\cdot\big(V_1(u(t, x))\nabla\Phi(t, x)\big) = V_2(u(t, x))\,\Phi(t, x),\\[4pt]
\partial_t\Phi(t, x) + \dfrac{1}{2}\|\nabla\Phi(t, x)\|^2 V_1'(u(t, x)) + \dfrac{1}{2}|\Phi(t, x)|^2 V_2'(u(t, x)) + \delta_u\mathcal{F}(u)(t, x) = 0.
\end{cases}$$

Slide 62

Mean field information Hamiltonian
There exists a Hamiltonian formulation for the control problem:
$$\begin{cases}
\partial_t u = \delta_\Phi\mathcal{H}(u, \Phi),\\[2pt]
\partial_t\Phi = -\delta_u\mathcal{H}(u, \Phi),
\end{cases}$$
where $\mathcal{H}$ is the Hamiltonian functional defined by
$$\mathcal{H}(u, \Phi) = \frac{1}{2}\int_\Omega \|\nabla\Phi\|^2 V_1(u)\,dx + \frac{1}{2}\int_\Omega |\Phi|^2 V_2(u)\,dx + \mathcal{F}(u).$$

Slide 63

Example I
Example (Wasserstein metric, heat flow and density Hamiltonian flows)
Let $G(u) = u\log u - u$, $F(u) = u$, $R(u) = 0$; thus
$$V_1(u) = \frac{F'(u)}{G''(u)} = u, \qquad V_2(u) = -\frac{R(u)}{G'(u)} = 0.$$
Then the mean-field information metric forms
$$g(u)(\sigma_1, \sigma_2) = \int_\Omega (\nabla\Phi_1(x), \nabla\Phi_2(x))\,u(x)\,dx.$$
Here the Hamiltonian flow in $(\mathcal{M}(\Omega), g)$ satisfies
$$\begin{cases}
\partial_t u + \nabla\cdot(u\nabla\Phi) = 0,\\[2pt]
\partial_t\Phi + \dfrac{1}{2}\|\nabla\Phi\|^2 + \delta_u\mathcal{F}(u) = 0.
\end{cases}$$

Slide 64

Example II
Example (Fisher-Rao metric, birth-death equation and density Hamiltonian flows)
Consider
$$V_1(u) = \frac{F'(u)}{G''(u)} = 0, \qquad V_2(u) = -\frac{R(u)}{G'(u)} = u.$$
Then the mean-field information metric satisfies
$$g(u)(\sigma_1, \sigma_2) = \int_\Omega \Phi_1(x)\Phi_2(x)\,u(x)\,dx.$$
Here the Hamiltonian flow in $(\mathcal{M}(\Omega), g)$ satisfies
$$\begin{cases}
\partial_t u - u\Phi = 0,\\[2pt]
\partial_t\Phi + \dfrac{1}{2}|\Phi|^2 + \delta_u\mathcal{F}(u) = 0.
\end{cases}$$

Slide 65

Example III
Example (Fisher-KPP metric, Fisher-KPP equation and density Hamiltonian flows)
Consider $F(u) = u$, $G(u) = u\log u - u$, $R(u) = u(1 - u)$. Thus
$$V_1(u) = \frac{F'(u)}{G''(u)} = u, \qquad V_2(u) = -\frac{R(u)}{G'(u)} = \frac{u(u-1)}{\log u}.$$
Here the Hamiltonian flow in $(\mathcal{M}(\Omega), g)$ satisfies
$$\begin{cases}
\partial_t u + \nabla\cdot(u\nabla\Phi) - \dfrac{u(u-1)}{\log u}\,\Phi = 0,\\[6pt]
\partial_t\Phi + \dfrac{1}{2}\|\nabla\Phi\|^2 + \dfrac{|\Phi|^2}{2}\,\dfrac{(2u-1)\log u + 1 - u}{(\log u)^2} + \delta_u\mathcal{F}(u) = 0.
\end{cases}$$

Slide 66

Primal-dual hybrid gradient algorithms
Consider the following convex optimization problem:
$$\min_z f(Az) + g(z),$$
where $z$ is a variable to be minimized, $f$ and $g$ are convex functions, and $A$ is a continuous linear operator. The algorithm solves the problem by iterating
$$\begin{aligned}
z^{(k+1)} &= \operatorname*{argmin}_z\ L(z, \bar{p}) + \frac{1}{2\tau}\|z - z^{(k)}\|^2_{L^2},\\
p^{(k+1)} &= \operatorname*{argmax}_p\ L(z^{(k+1)}, p) - \frac{1}{2\sigma}\|p - p^{(k)}\|^2_{L^2},\\
\bar{p} &= 2p^{(k+1)} - p^{(k)},
\end{aligned} \qquad (1)$$
where $p$ is a dual variable, $L$ is the Lagrangian functional $L(z, p) = g(z) + \langle Az, p\rangle - f^*(p)$, and $f^*(p) = \sup_z\ \langle z, p\rangle - f(z)$ is the Legendre transform of $f$.
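A minimal sketch of iteration (1) on a toy problem that is not from the slides: 1D total-variation denoising $\min_z \frac12\|z-b\|^2 + \lambda\|Dz\|_1$, with $A = D$ the forward-difference operator, $f(w) = \lambda\|w\|_1$ and $g(z) = \frac12\|z-b\|^2$; the signal, $\lambda$, and step sizes are illustrative. Here the $p$-step is a clipping because $f^*$ is the indicator of $\{|p|\le\lambda\}$, and $\tau\sigma\|A^TA\| < 1$ holds since $\|D\|^2 \le 4$.

```python
import numpy as np

def pdhg_tv_denoise(b, lam=1.0, tau=0.4, sigma=0.4, n_iter=500):
    """PDHG iteration (1) for min_z 0.5*||z - b||^2 + lam*||D z||_1."""
    n = len(b)
    D  = lambda z: np.diff(z)                                        # A z
    Dt = lambda p: np.concatenate(([-p[0]], -np.diff(p), [p[-1]]))   # A^T p
    z, p = b.copy(), np.zeros(n - 1)
    p_bar = p.copy()
    for _ in range(n_iter):
        z = (z + tau * (b - Dt(p_bar))) / (1.0 + tau)    # z-step: prox of g with linear coupling
        p_new = np.clip(p + sigma * D(z), -lam, lam)     # p-step: prox of f* = projection
        p_bar = 2.0 * p_new - p                          # extrapolation
        p = p_new
    return z

# Illustrative noisy step signal.
rng = np.random.default_rng(0)
b = np.concatenate([np.zeros(50), np.ones(50)]) + 0.1 * rng.standard_normal(100)
z = pdhg_tv_denoise(b, lam=0.5)
```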

Slide 67

Primal-dual hybrid gradient methods
The scheme converges if the step sizes $\tau$ and $\sigma$ satisfy
$$\tau\sigma\,\|A^T A\|_{L^2} < 1,$$
where $\|\cdot\|_{L^2}$ is an operator norm in $L^2$. These step sizes depend on the operator $A$. In this talk, we use G-prox PDHG, a variation of the primal-dual algorithm. G-prox PDHG provides an appropriate choice of norms that allows the algorithm to converge faster than the vanilla PDHG algorithm.

Slide 68

G-prox primal-dual hybrid gradient methods
G-prox PDHG finds the saddle point by iterating
$$\begin{aligned}
z^{(k+1)} &= \operatorname*{argmin}_z\ L(z, \bar{p}) + \frac{1}{2\tau}\|z - z^{(k)}\|^2_{L^2},\\
p^{(k+1)} &= \operatorname*{argmax}_p\ L(z^{(k+1)}, p) - \frac{1}{2\sigma}\|p - p^{(k)}\|^2_{\mathcal{H}},\\
\bar{p} &= 2p^{(k+1)} - p^{(k)},
\end{aligned}$$
where the norm $\|\cdot\|_{\mathcal{H}}$ is defined as $\|z\|^2_{\mathcal{H}} = \|A^T z\|^2_{L^2}$. By choosing this norm, the step sizes only need to satisfy $\tau\sigma < 1$, which is independent of the operator $A$.

Slide 69

Implementation of the algorithm
We set the linear operator $A$ as
$$A(m_1, m_2, u) = \partial_t u(t, x) + \nabla\cdot m_1(t, x) - m_2(t, x).$$
We set the convex functionals $f$ and $g$ as
$$g(m_1, m_2, u) = \int_0^1 \Big(\int_\Omega \frac{\|m_1(t, x)\|^2}{2V_1(u(t, x))} + \frac{|m_2(t, x)|^2}{2V_2(u(t, x))}\,dx - \mathcal{F}(u)\Big)\,dt,$$
$$f\big(A(m_1, m_2, u)\big) = \begin{cases} 0 & \text{if } A(m_1, m_2, u) = 0,\\ \infty & \text{otherwise}, \end{cases} \qquad f^*(\Phi) = 0.$$
Thus, we have the Lagrangian functional
$$L(m_1, m_2, u, \Phi) = \int_0^1 \Big(\int_\Omega \Big\{\frac{\|m_1(t, x)\|^2}{2V_1(u(t, x))} + \frac{|m_2(t, x)|^2}{2V_2(u(t, x))} + \Phi(t, x)\big(\partial_t u(t, x) + \nabla\cdot m_1(t, x) - m_2(t, x)\big)\Big\}\,dx - \mathcal{F}(u)\Big)\,dt.$$

Slide 70

Implementation of the algorithm
We find the saddle point $(m_1^*, m_2^*, u^*, \Phi^*)$ by iterating the following schemes:
$$\begin{aligned}
u^{(k+1)} &= \operatorname*{argmin}_u\ L(m_1^{(k)}, m_2^{(k)}, u, \bar\Phi) + \frac{1}{2\tau}\|u - u^{(k)}\|^2_{L^2},\\
m_1^{(k+1)} &= \operatorname*{argmin}_m\ L(m, m_2^{(k)}, u^{(k)}, \bar\Phi) + \frac{1}{2\tau}\|m - m_1^{(k)}\|^2_{L^2},\\
m_2^{(k+1)} &= \operatorname*{argmin}_m\ L(m_1^{(k)}, m, u^{(k)}, \bar\Phi) + \frac{1}{2\tau}\|m - m_2^{(k)}\|^2_{L^2},\\
\Phi^{(k+1)} &= \operatorname*{argmax}_\Phi\ L(m_1^{(k+1)}, m_2^{(k+1)}, u^{(k+1)}, \Phi) - \frac{1}{2\sigma}\|\Phi - \Phi^{(k)}\|^2_{\mathcal{H}},\\
\bar\Phi &= 2\Phi^{(k+1)} - \Phi^{(k)}.
\end{aligned}$$
Here, the $L^2$ and $\mathcal{H}$ norms are defined as
$$\|u\|^2_{L^2} = (u, u)_{L^2} = \int_0^1\int_\Omega \|u\|^2\,dx\,dt, \qquad \|u\|^2_{\mathcal{H}} = \|A^T u\|^2_{L^2}$$
for any $u\colon [0, 1]\times\Omega \to \mathbb{R}^n$.

Slide 71

Implementation of the algorithm
We can find the explicit formulas for $u^{(k+1)}, m_1^{(k+1)}, m_2^{(k+1)}, \Phi^{(k+1)}$ using the first-order optimality conditions. The optimality conditions of $m_1^{(k+1)}, m_2^{(k+1)}, \Phi^{(k+1)}$ result in first-order equations which can be solved easily. Thus, we have
$$\begin{aligned}
m_1^{(k+1)} &= \frac{V_1(u^{(k)})}{\tau + V_1(u^{(k)})}\Big(m_1^{(k)} + \tau\nabla\Phi^{(k)}\Big),\\
m_2^{(k+1)} &= \frac{V_2(u^{(k)})}{\tau + V_2(u^{(k)})}\Big(m_2^{(k)} + \tau\Phi^{(k)}\Big),\\
\Phi^{(k+\frac12)} &= \Phi^{(k)} + \sigma(AA^T)^{-1}\Big(\partial_t u^{(k+1)} + \nabla\cdot m_1^{(k+1)} - m_2^{(k+1)}\Big).
\end{aligned}$$

Slide 72

Implementation of the algorithm
The optimality condition of $u^{(k+1)}$ involves a nonlinear equation, which we solve using Newton's method. For $j = 1, 2, \cdots$
$$\begin{aligned}
a &= -\frac{\|m_1\|^2}{2V_1^2(u^j)}V_1'(u^j) - \frac{|m_2|^2}{2V_2^2(u^j)}V_2'(u^j) - \partial_t\Phi + c\log u^j + \frac{1}{\tau}(u^j - u^{(k)}),\\
b &= \frac{\|m_1\|^2}{V_1^3(u^j)}(V_1'(u^j))^2 - \frac{\|m_1\|^2}{2V_1^2(u^j)}V_1''(u^j) + \frac{|m_2|^2}{V_2^3(u^j)}(V_2'(u^j))^2 - \frac{|m_2|^2}{2V_2^2(u^j)}V_2''(u^j) + \frac{c}{u^j} + \frac{1}{\tau},\\
u^{j+1} &= \max\Big(0,\ u^j - \frac{a}{b}\Big).
\end{aligned}$$

Slide 73

Algorithm
Algorithm: G-prox PDHG for the mean-field information variational problem
Input: initial density $u_0$ and terminal density $u_1$.
Output: $u, \Phi, m_2\colon [0, T]\times\Omega \to \mathbb{R}$, $m_1\colon [0, T]\times\Omega \to \mathbb{R}^d$.
For $k = 1, 2, \cdots$
  Compute $u^{(k+1)}$ using Newton's method.
  $m_1^{(k+1)} = \frac{V_1(u^{(k)})}{\tau + V_1(u^{(k)})}\big(m_1^{(k)} + \tau\nabla\Phi^{(k)}\big)$
  $m_2^{(k+1)} = \frac{V_2(u^{(k)})}{\tau + V_2(u^{(k)})}\big(m_2^{(k)} + \tau\Phi^{(k)}\big)$
  $\Phi^{(k+\frac12)} = \Phi^{(k)} + \sigma(AA^T)^{-1}\big(\partial_t u^{(k+1)} + \nabla\cdot m_1^{(k+1)} - m_2^{(k+1)}\big)$
  $\Phi^{(k+1)} = 2\Phi^{(k+\frac12)} - \Phi^{(k)}$

Slide 74

Example
We present a numerical experiment using the algorithm with different $V_1, V_2$ functions. We wrote C++ codes to run the numerical experiments. For all the experiments, we used the discretization sizes $N_{x_1} = N_{x_2} = 128$, $N_t = 30$, and
$$V_1(u) = u, \qquad V_2(u) = u^\alpha, \quad \alpha > 0.$$
We show how varying the exponent $\alpha$ affects the evolution of the densities. Below are the initial and terminal densities used in the simulations:
$$u_0(x, y) = 10\exp\big(-60\,((x - 0.3)^2 + (y - 0.7)^2)\big) + 1,$$
$$u_1(x, y) = 20\exp\big(-60\,((x - 0.7)^2 + (y - 0.3)^2)\big) + 1.$$
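A small sketch (not from the paper's C++ code) that assembles the stated initial and terminal densities on the $128\times128$ grid; the unit-square, cell-centered grid is an assumption.

```python
import numpy as np

Nx = 128
x = (np.arange(Nx) + 0.5) / Nx            # assumed cell-centered grid on [0, 1]^2
X, Y = np.meshgrid(x, x, indexing="ij")

u0 = 10.0 * np.exp(-60.0 * ((X - 0.3)**2 + (Y - 0.7)**2)) + 1.0
u1 = 20.0 * np.exp(-60.0 * ((X - 0.7)**2 + (Y - 0.3)**2)) + 1.0

# V_1(u) = u and V_2(u) = u**alpha as in the experiments; alpha is a run parameter.
V1 = lambda u: u
V2 = lambda u, alpha: u**alpha
print(u0.shape, u0.max(), u1.max())
```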

Slide 75

Figure: Example: snapshots of the densities at different times $t$. The first, second, third and fourth rows represent $V_2(u) = u$, $V_2(u) = u^2$, $V_2(u) = u^3$, $V_2(u) = u^4$, respectively.

Slide 76

Figure: Example: cross sections along the line $x + y = 1$ at different times $t$. Each color represents $V_2(u) = u$, $V_2(u) = u^2$, $V_2(u) = u^4$.

Slide 77

Topic II: Optimal control in probability space
- Computational mean-field information dynamics associated with reaction-diffusion equations.
- Controlling conservation laws I, II: entropy-entropy flux.
- Mean field games.
- Master equations for finite state mean field games with nonlinear activations.

Slide 78

Traffic in LA

Slide 79

Mean-field PDE models

Slide 80

Regularized scalar conservation laws
Consider
$$\partial_t u + \nabla\cdot f(u) = \beta\,\nabla\cdot(A(u)\nabla u),$$
where the unknown variable is $u = u(t, x)$, $(t, x) \in \mathbb{R}^+\times\mathbb{R}^d$, $f\colon \mathbb{R} \to \mathbb{R}^d$ is a flux function, $A(u) \in \mathbb{R}^{d\times d}$ is a positive semi-definite matrix function, and $\beta > 0$ is a viscosity constant. This PDE is known as a regularized scalar conservation law.

Slide 81

Conservation laws and generalized optimal transport
In the literature, there are active joint studies of entropy, Fisher information, and transportation. Nowadays, these connections have applications in mean field games, information theory, AI, quantum computing, computational graphics, computational physics, and Bayesian inverse problems.
In this talk, we connect Lax's entropy-entropy flux in conservation laws with optimal transport type metric spaces. Following this connection, we further design variational discretizations for conservation laws and for the mean field control of conservation laws. This includes a new and efficient method for developing unconditionally stable in time, implicit approximations to regularized conservation laws and many other initial value problems.

Slide 82

Entropy, Information, Transportation

Slide 83

Heat equation and entropy dissipation
Consider the heat equation in $\mathbb{R}^d$:
$$\frac{\partial u(t, x)}{\partial t} = \nabla\cdot(\nabla u(t, x)) = \Delta u(t, x).$$
Consider the negative Boltzmann-Shannon entropy given by
$$\mathcal{H}(u) = \int_{\mathbb{R}^d} u(x)\big(\log u(x) - 1\big)\,dx.$$
Along the time evolution of the heat equation, the following dissipation relation holds:
$$\frac{d}{dt}\mathcal{H}(u(t,\cdot)) = -\int_{\mathbb{R}^d} \|\nabla_x\log u(t, x)\|^2 u(t, x)\,dx = -\mathcal{I}(u(t,\cdot)),$$
where $\mathcal{I}(u)$ is the Fisher information functional. There is a formulation behind this relation, namely:
- The heat flow is the gradient descent flow of the entropy in the optimal transport metric.
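A quick numerical check of this dissipation relation (not from the slides): evolve a 1D periodic heat equation with explicit finite differences and compare $\frac{d}{dt}\mathcal{H}$ with $-\mathcal{I}(u)$; the grid, initial data, and step sizes are illustrative.

```python
import numpy as np

n, L, dt = 256, 2.0 * np.pi, 1e-4
dx = L / n
x = np.arange(n) * dx
u = 1.0 + 0.5 * np.cos(x)                      # positive density on the periodic domain
u /= (u.sum() * dx)

def entropy(u):          # H(u) = int u (log u - 1) dx
    return np.sum(u * (np.log(u) - 1.0)) * dx

def fisher_info(u):      # I(u) = int |d/dx log u|^2 u dx
    dlogu = (np.roll(np.log(u), -1) - np.roll(np.log(u), 1)) / (2.0 * dx)
    return np.sum(dlogu**2 * u) * dx

for step in range(2000):
    H_old = entropy(u)
    u = u + dt * (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2   # explicit heat step
    if step % 500 == 0:
        print((entropy(u) - H_old) / dt, -fisher_info(u))   # the two columns nearly agree
```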

Slide 84

Optimal transport
What is the optimal way to move or transport a mountain with shape $X$ and density $u_0(x)$ to another shape $Y$ with density $u_1(y)$?

Slide 85

Literature
The optimal transport problem was first introduced by Monge in 1781 and relaxed by Kantorovich (Nobel prize) in 1940. It defines a distance in the space of probability distributions, named the optimal transport, Wasserstein, or Earth Mover's distance.
- Mapping/Monge-Ampère equation: Gangbo, Brenier, et al.;
- Gradient flows: Otto, Villani, Ambrosio, Gigli, Savare, Carillo, Mielke, et al.;
- Hamiltonian flows: compressible Euler equations, potential mean field games, Schrödinger bridge problems, Schrödinger equations: Benamou, Brenier, Larsy, Lions, Georgiou, Tannenbaum, Nelson, Lafferty, Liu, Gao, et al.;
- Numerical OT and mean field control/MFG: Benamou, Oberman, Osher, Achdou, Yang, Justin, Fu, et al.;
- Natural gradient, mean field control/MFG and machine learning: Amari, Li, Chen, Montufar, Yang, Liu, Nurbekyan, Chen, Osher, Lars, Zhang, Katsoulakis, Zhou, et al.

Slide 86

Entropy dissipation = Lyapunov methods
The gradient flow of the negative entropy
$$\mathcal{H}(u) = \int_{\mathbb{R}^d} u(x)\log u(x)\,dx,$$
w.r.t. the optimal transport metric satisfies
$$\frac{\partial u}{\partial t} = \nabla\cdot(u\nabla\log u) = \Delta u.$$
Here the major trick is that $u\nabla\log u = \nabla u$. In this way, one can study the entropy dissipation by
$$\frac{d}{dt}\mathcal{H}(u) = \int_{\mathbb{R}^d} \log u\,\nabla\cdot(u\nabla\log u)\,dx = -\int_{\mathbb{R}^d} \|\nabla\log u\|^2 u\,dx.$$

Slide 87

Lyapunov method induced calculus
Informally speaking, the Wasserstein-Otto metric refers to the following bilinear form:
$$\langle \dot{u}_1, G(u)\dot{u}_2\rangle = \int \big(\dot{u}_1, (-\Delta_u)^{-1}\dot{u}_2\big)\,dx.$$
In other words, denote $\Delta_u = \nabla\cdot(u\nabla)$ and $\dot{u}_i = -\Delta_u\Phi_i = -\nabla\cdot(u\nabla\Phi_i)$, $i = 1, 2$; then
$$\langle \Phi_1, G(u)^{-1}\Phi_2\rangle = \langle \Phi_1, -\nabla\cdot(u\nabla)\Phi_2\rangle = \int (\nabla\Phi_1, \nabla\Phi_2)\,u\,dx,$$
where $u \in \mathcal{P}(\Omega)$, $\dot{u}_i$ is a tangent vector in $\mathcal{P}(\Omega)$ with $\int \dot{u}_i\,dx = 0$, and $\Phi_i \in C^\infty(\Omega)$ are cotangent vectors in $\mathcal{P}(\Omega)$ at the point $u$. Here $\nabla\cdot$ and $\nabla$ are the standard divergence and gradient operators in $\Omega$.

Slide 88

Gradient flows
The Wasserstein gradient flow of an energy functional $\mathcal{F}(u)$ leads to
$$\partial_t u = -G(u)^{-1}\delta_u\mathcal{F}(u) = \nabla\cdot\big(u\nabla\delta_u\mathcal{F}(u)\big).$$
- If $\mathcal{F}(u) = \int F(x)u(x)\,dx$, then $\partial_t u = \nabla\cdot(u\nabla F(x))$.
- If $\mathcal{F}(u) = \int u(x)\log u(x)\,dx$, then $\partial_t u = \nabla\cdot(u\nabla\log u) = \Delta u$.

Slide 89

Hamiltonian flows
Consider the Lagrangian functional
$$\mathcal{L}(u, \partial_t u) = \frac{1}{2}\int \big(\partial_t u, (-\nabla\cdot(u\nabla))^{-1}\partial_t u\big)\,dx - \mathcal{F}(u).$$
By the Legendre transform,
$$\mathcal{H}(u, \Phi) = \sup_{\partial_t u}\ \int \Phi\,\partial_t u\,dx - \mathcal{L}(u, \partial_t u).$$
The Hamiltonian system follows
$$\partial_t u = \delta_\Phi\mathcal{H}(u, \Phi), \qquad \partial_t\Phi = -\delta_u\mathcal{H}(u, \Phi),$$
where $\delta_u, \delta_\Phi$ are $L^2$ first-variation operators w.r.t. $u, \Phi$, respectively, and the density Hamiltonian forms
$$\mathcal{H}(u, \Phi) = \frac{1}{2}\int \|\nabla\Phi\|^2 u\,dx + \mathcal{F}(u).$$
Here $u$ is the "density" state variable and $\Phi$ is the "density" momentum variable.

Slide 90

Examples: Optimal transport and mean field games
Thus, the Hamiltonian flow satisfies
$$\begin{cases}
\partial_t u + \nabla\cdot(u\nabla\Phi) = 0,\\[2pt]
\partial_t\Phi + \dfrac{1}{2}\|\nabla\Phi\|^2 = -\delta_u\mathcal{F}(u).
\end{cases}$$
This is a well-known dynamic, studied in potential mean field games, optimal transport, and PDEs.

Slide 91

Optimal transport type formalisms
We are currently working on the "universe" of generalized optimal transport and mean field games:
- AI theory, inference, and computations;
- Quantum computing;
- Mean-field MCMC algorithms and numerical PDEs.
In this lecture, we point out that there are optimal transport type formalisms to study the control of conservation laws and to design variational numerical schemes.
Related works
- Godunov (1970);
- Lax-Friedrichs (1971);
- Brenier (2018);
- Osher, Shu, Harten, van Leer: monotone schemes, TVD schemes, ENO, WENO, et al.

Slide 92

Main topic: Controlling conservation laws

Slide 93

Entropy-entropy flux pairs
Consider
$$\partial_t u(t, x) + \nabla_x\cdot f(u) = 0,$$
where $u\colon \mathbb{R}^+\times\mathbb{R}^n \to \mathbb{R}^d$ and $f\colon \mathbb{R}^d \to \mathbb{R}^d$.
Definition (Entropy-entropy flux pair for systems (Lax))
We call $(G, \Psi)$ an entropy-entropy flux pair for the above conservation law system if there exists a convex function $G\colon \mathbb{R}^d \to \mathbb{R}$ and $\Psi\colon \mathbb{R}^d \to \mathbb{R}$ such that
$$\Psi'(u) = G'(u)f'(u).$$
An entropy solution satisfies
$$\partial_t G(u) + \nabla_x\cdot\Psi(u) \le 0.$$
This is trivial for scalar conservation laws, where $d = 1$.

Slide 94

Lyapunov methods: Entropy-entropy flux
Denote
$$\mathcal{G}(u) = \int_\Omega G(u)\,dx.$$
In this case,
$$\int_\Omega G'(u)\,\nabla\cdot f(u)\,dx = \int_\Omega G'(u)\,(f'(u), \nabla u)\,dx = \int_\Omega (\Psi'(u), \nabla u)\,dx = \int_\Omega \nabla\cdot\Psi(u)\,dx = 0,$$
where we apply the fact that $f'(u)G'(u) = \Psi'(u)$ and $\int_\Omega \nabla\cdot\Psi(u)\,dx = 0$.

Slide 95

Lyapunov methods: Viscosity
In addition,
$$\int_\Omega G'(u)\,\nabla\cdot(A(u)\nabla u)\,dx = -\int_\Omega \big\langle \nabla G'(u), A(u)\nabla u\big\rangle\,dx = -\int_\Omega \big\langle \nabla G'(u), A(u)G''(u)^{-1}\nabla G'(u)\big\rangle\,dx,$$
where we apply $\nabla G'(u) = G''(u)\nabla u$ in the last equality. We require that $A(u)G''(u)^{-1} \succeq 0$, $A$ is nonnegative definite, and $G''(u) > 0$.

Slide 96

Entropy-entropy flux-Fisher information dissipation
Hence we know that
$$\partial_t\mathcal{G}(u) = \int_\Omega G'(u)\,\partial_t u\,dx = -\beta\int_\Omega \big\langle \nabla G'(u), A(u)G''(u)^{-1}\nabla G'(u)\big\rangle\,dx \le 0.$$
This implies that $\mathcal{G}(u)$ is a Lyapunov functional for regularized conservation laws.

Slide 97

Entropy-entropy flux-transport metrics
Definition (Entropy-entropy flux-metric condition)
We call $(G, \Psi)$ an entropy-entropy flux pair-metric for conservation laws if there exists a convex function $G\colon \mathbb{R}^d \to \mathbb{R}$ and $\Psi\colon \mathbb{R}^d \to \mathbb{R}$ such that
$$\Psi'(u) = G'(u)f'(u),$$
and $A(u)G''(u)^{-1}$ is symmetric positive semi-definite.

Slide 98

Lyapunov methods induced transport metrics
Under the entropy-entropy flux-metric condition, there is a metric operator for regularized conservation laws. Define the space of functions $u$ by
$$\mathcal{M} = \Big\{ u \in C^\infty(\Omega)\colon \int_\Omega u(x)\,dx = \text{constant} \Big\}.$$
The tangent space of $\mathcal{M}$ at a point $u$ satisfies
$$T_u\mathcal{M} = \Big\{ \sigma \in C^\infty(\Omega)\colon \int_\Omega \sigma(x)\,dx = 0 \Big\}.$$
Denote an elliptic operator $L_C\colon C^\infty(\Omega) \to C^\infty(\Omega)$ by
$$L_C(u) = -\nabla\cdot\big(A(u)G''(u)^{-1}\nabla\big).$$

Slide 99

Lyapunov methods induced transport metrics
Definition (Metric)
The inner product $g(u)\colon T_u\mathcal{M}\times T_u\mathcal{M} \to \mathbb{R}$ is given below:
$$\begin{aligned}
g(u)(\sigma_1, \sigma_2) &= \int_\Omega (\Phi_1, L_C(u)\Phi_2)\,dx = -\int_\Omega \Phi_1\,\nabla\cdot\big(A(u)G''(u)^{-1}\nabla\Phi_2\big)\,dx\\
&= \int_\Omega \big(\nabla\Phi_1, A(u)G''(u)^{-1}\nabla\Phi_2\big)\,dx = \int_\Omega \Phi_1\sigma_2\,dx = \int_\Omega \Phi_2\sigma_1\,dx,
\end{aligned}$$
where $\Phi_i \in C^\infty(\Omega)$ satisfies $\sigma_i = -\nabla\cdot\big(A(u)G''(u)^{-1}\nabla\Phi_i\big)$, $i = 1, 2$.

Slide 100

Entropy induced gradient flows
Proposition (Gradient flow)
Given an energy functional $\mathcal{E}\colon \mathcal{M} \to \mathbb{R}$, the gradient flow of $\mathcal{E}$ in $(\mathcal{M}, g)$ satisfies
$$\partial_t u = \nabla\cdot\big(A(u)G''(u)^{-1}\nabla\delta_u\mathcal{E}(u)\big).$$
If $\mathcal{E}(u) = \mathcal{G}(u) = \int_\Omega G(u)\,dx$, then the above gradient flow satisfies
$$\partial_t u = \nabla\cdot(A(u)\nabla u).$$

Slide 101

Flux-gradient flows
Definition (Flux-gradient flow)
Given an energy functional $\mathcal{E}\colon \mathcal{M} \to \mathbb{R}$, consider a class of PDEs
$$\partial_t u + \nabla_x\cdot f_1(x, u) = \nabla_x\cdot\big(A(u)G''(u)^{-1}\nabla_x\delta_u\mathcal{E}(u)\big), \qquad (1)$$
where $f_1\colon \Omega\times\mathbb{R}^1 \to \mathbb{R}^n$ is a flux function satisfying
$$\int_\Omega f_1(x, u)\cdot\nabla_x\delta_{u(x)}\mathcal{E}(u)\,dx = 0.$$
Remark
This is the generalized optimal transport formulation of GENERIC (nonequilibrium reversible-irreversible coupling), Grmela and Ottinger, 1997.

Slide 102

Example
If $\mathcal{E}(u) = \mathcal{G}(u) = \int_\Omega G(u)\,dx$ and $f_1(x, u) = f(u)$, then equation (1) forms a regularized conservation law:
$$\begin{aligned}
\partial_t u + \nabla_x\cdot f(u) &= \nabla_x\cdot\big(A(u)G''(u)^{-1}\nabla_x\delta_u\mathcal{G}(u)\big) = \nabla_x\cdot\big(A(u)G''(u)^{-1}\nabla_x G'(u)\big)\\
&= \nabla_x\cdot\big(A(u)G''(u)^{-1}G''(u)\nabla_x u\big) = \nabla_x\cdot(A(u)\nabla u),
\end{aligned}$$
and
$$\int_\Omega \nabla_x\delta_u\mathcal{G}(u)\cdot f(u)\,dx = \int_\Omega \nabla_x G'(u)\cdot f(u)\,dx = -\int_\Omega G'(u)\big(\nabla_x\cdot f(u)\big)\,dx = -\int_\Omega \nabla\cdot\Psi(u)\,dx = 0.$$

Slide 103

Controlling flux-gradient flows
Given smooth functionals $\mathcal{F}, \mathcal{H}\colon \mathcal{M} \to \mathbb{R}$, consider a variational problem
$$\inf_{u, v, u_1} \int_0^1 \Big[\int_\Omega \frac{1}{2}\big\langle v, A(u)G''(u)^{-1}v\big\rangle\,dx - \mathcal{F}(u)\Big]\,dt + \mathcal{H}(u_1),$$
where the infimum is taken among variables $v\colon [0, 1]\times\Omega \to \mathbb{R}^n$, $u\colon [0, 1]\times\Omega \to \mathbb{R}$, and $u_1\colon \Omega \to \mathbb{R}$ satisfying
$$\partial_t u + \nabla\cdot f(u) + \nabla\cdot\big(A(u)G''(u)^{-1}v\big) = \beta\,\nabla\cdot(A(u)\nabla u),$$
and $u(0, x) = u_0(x)$.

Slide 104

Primal-dual conservation laws
Proposition (Hamiltonian flows of conservation laws)
The critical point system of the variational problem is given below. There exists a function $\Phi\colon [0, 1]\times\Omega \to \mathbb{R}$ such that $v(t, x) = \nabla\Phi(t, x)$ and
$$\begin{cases}
\partial_t u + \nabla\cdot f(u) + \nabla\cdot\big(A(u)G''(u)^{-1}\nabla\Phi\big) = \beta\,\nabla\cdot(A(u)\nabla u),\\[4pt]
\partial_t\Phi + (\nabla\Phi, f'(u)) + \dfrac{1}{2}\big(\nabla\Phi, (A(u)G''(u)^{-1})'\nabla\Phi\big) + \delta_u\mathcal{F}(u) = -\beta\,\nabla\cdot(A(u)\nabla\Phi) + \beta\,(\nabla\Phi, A'(u)\nabla u).
\end{cases} \qquad (2)$$
Here $'$ represents the derivative w.r.t. the variable $u$. The initial and terminal time conditions satisfy
$$u(0, x) = u_0(x), \qquad \delta_{u_1}\mathcal{H}(u_1) + \Phi(1, x) = 0.$$

Slide 105

Hamiltonian flows of conservation laws
Proposition
The primal-dual conservation law system (2) has the following Hamiltonian flow formulation:
$$\partial_t u = \delta_\Phi\mathcal{H}_G(u, \Phi), \qquad \partial_t\Phi = -\delta_u\mathcal{H}_G(u, \Phi),$$
where we define the Hamiltonian functional $\mathcal{H}_G\colon \mathcal{M}\times C^\infty(\Omega) \to \mathbb{R}$ by
$$\mathcal{H}_G(u, \Phi) = \int_\Omega \Big[\frac{1}{2}\big(\nabla\Phi, A(u)G''(u)^{-1}\nabla\Phi\big) + (\nabla\Phi, f(u)) - \beta\,(\nabla\Phi, A(u)\nabla u)\Big]\,dx + \mathcal{F}(u).$$
In other words, the Hamiltonian functional $\mathcal{H}_G(u, \Phi)$ is conserved along the Hamiltonian dynamics:
$$\frac{d}{dt}\mathcal{H}_G(u, \Phi) = 0.$$

Slide 106

Modeling: Controlling traffic flows
Position control. The unknown variable $u$ in traffic flows represents the density function of cars (particles) in a given spatial domain. Here, the background dynamics of $u$ is the classical traffic flow. The control variable is the velocity enforced on each car in addition to its background traffic flow dynamics. The goal is to control the "total enforced kinetic energy" of all cars, in which individual cars can determine their velocities through both noise and traffic flow interactions.

Slide 107

Controlling traffic flow problems
Consider
$$\inf_{u, v, u_1} \int_0^1 \Big[\int_\Omega \frac{1}{2}\|v(t, x)\|^2 u(t, x)\,dx - \mathcal{F}(u)\Big]\,dt + \mathcal{H}(u_1), \qquad \text{(3a)}$$
such that $0 \le u(t, x) \le 1$ for all $t \in [0, 1]$, and
$$\partial_t u(t, x) + \nabla\cdot\big(u(t, x)(1 - u(t, x))\big) + \nabla\cdot\big(u(t, x)v(t, x)\big) = \beta\,\Delta u(t, x), \qquad \text{(3b)}$$
with $u(0, x) = u_0(x)$.

Slide 108

Controlling traffic flow dynamics
There exists a scalar function $\Phi$ such that $v(t, x) = \nabla\Phi(t, x)$ and
$$\begin{cases}
\partial_t u + \nabla\cdot\big(u(1 - u)\big) + \nabla\cdot(u\nabla\Phi) = \beta\,\Delta u,\\[4pt]
\partial_t\Phi + \big\langle (1 - 2u), \nabla\Phi\big\rangle + \dfrac{1}{2}\|\nabla\Phi\|^2 + \delta_u\mathcal{F}(u) = -\beta\,\Delta\Phi.
\end{cases}$$

Slide 109

The primal-dual algorithms
The classical primal-dual hybrid gradient algorithm (PDHG) solves the following saddle point problem:
$$\min_z \max_p\ \langle Kz, p\rangle_{L^2} + g(z) - h^*(p).$$
When the operator $K$ is nonlinear, we apply its extension with $K(z) \approx K(\bar{z}) + \nabla K(\bar{z})(z - \bar{z})$. The extension of the PDHG scheme is as follows:
$$\begin{aligned}
z^{n+1} &= \operatorname*{argmin}_z\ \langle z, [\nabla K(z^n)]^T\bar{p}^n\rangle_{L^2} + g(z) + \frac{1}{2\tau}\|z - z^n\|^2_{L^2},\\
p^{n+1} &= \operatorname*{argmax}_p\ \langle K(z^{n+1}), p\rangle_{L^2} - h^*(p) - \frac{1}{2\sigma}\|p - p^n\|^2_{L^2},\\
\bar{p}^{n+1} &= 2p^{n+1} - p^n.
\end{aligned}$$
Set $z = (u, m)$, $p = \Phi$,
$$K((u, m)) = \partial_t u + \nabla\cdot f(u) + \nabla\cdot m - \beta\Delta u,$$
$$g((u, m)) = \int_0^1 \Big(\int_\Omega \frac{\|m\|^2}{2u} + \mathbf{1}_{[0,1]}(u)\,dx - \mathcal{F}(u)\Big)\,dt, \qquad h(Kz) = \begin{cases} 0 & \text{if } Kz = 0,\\ +\infty & \text{else}. \end{cases}$$

Slide 110

Solve the control problem with primal-dual algorithms
We rewrite the variational problem as follows:
$$\inf_{u, m}\ \sup_\Phi\ L(u, m, \Phi), \qquad \text{with } u(0, x) = u_0(x),\quad \Phi(1, x) = -\delta_{u(1,x)}\mathcal{H}(u),$$
$$L(u, m, \Phi) = \int_0^1 \Big(\int_\Omega \frac{\|m\|^2}{2u} + \mathbf{1}_{[0,1]}(u)\,dx - \mathcal{F}(u)\Big)\,dt + \int_0^1\int_\Omega \Phi\,\big(\partial_t u + \nabla\cdot f(u) + \nabla\cdot m - \beta\Delta u\big)\,dx\,dt,$$
and $f(u) = u(1 - u)$ is the traffic flux function. We denote the indicator function by
$$\mathbf{1}_A(x) = \begin{cases} +\infty & x \notin A,\\ 0 & x \in A. \end{cases}$$

Slide 111

Algorithm
Algorithm: PDHG for the conservation law control system
While $k <$ maximal number of iterations:
  $(u^{(k+1)}, m^{(k+1)}) = \operatorname*{argmin}_{u, m}\ L(u, m, \bar\Phi^{(k)}) + \frac{1}{2\tau}\|(u, m) - (u^{(k)}, m^{(k)})\|^2_{L^2}$;
  $\Phi^{(k+1)} = \operatorname*{argmax}_\Phi\ L(u^{(k+1)}, m^{(k+1)}, \Phi) - \frac{1}{2\sigma}\|\Phi - \Phi^{(k)}\|^2_{H^1}$;
  $\bar\Phi^{(k+1)} = 2\Phi^{(k+1)} - \Phi^{(k)}$;

Slide 112

Monotone schemes
Definition
For $p, q \in \mathbb{N}$, a scheme $u^{k+1}_j = G(u^k_{j-p}, \ldots, u^k_{j+q})$ is called a monotone scheme if $G$ is a monotonically nondecreasing function of each argument.
Lax-Friedrichs scheme. For discretizations $\Delta t, \Delta x$ in time and space, denote $u^k_j = u(k\Delta t, j\Delta x)$; then the Lax-Friedrichs scheme is as follows:
$$u^{k+1}_j = u^k_j - \frac{\Delta t}{2\Delta x}\Big(f(u^k_{j+1}) - f(u^k_{j-1})\Big) + (\beta + c\Delta x)\frac{\Delta t}{(\Delta x)^2}\Big(u^k_{j+1} - 2u^k_j + u^k_{j-1}\Big).$$
To guarantee that the above scheme is monotone, we need
$$1 - 2(\beta + c\Delta x)\frac{\Delta t}{(\Delta x)^2} \ge 0, \qquad -\frac{\Delta t}{2\Delta x}|f'(u)| + (\beta + c\Delta x)\frac{\Delta t}{(\Delta x)^2} \ge 0.$$
As we want the scheme to work when $\beta \to 0$, the restriction on $c$ and the space-time step sizes can be simplified as follows:
$$c \ge \frac{1}{2}|f'(u)|, \qquad (\Delta x)^2 \ge 2(\beta + c\Delta x)\Delta t.$$
The first inequality suggests the artificial viscosity we need to add. The second one imposes a strong restriction on the time step size when $\beta > 0$.
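A direct sketch of the Lax-Friedrichs update above for a 1D periodic scalar conservation law (not the paper's implementation); the traffic flux $f(u) = \frac{1}{2}u(1-u)$ from Example 1 below, $\beta = 0$, and $c = \frac{1}{2}\max|f'(u)|$ are illustrative choices.

```python
import numpy as np

def lax_friedrichs(u0, f, df, T=1.0, b=4.0, beta=0.0, cfl=0.4):
    """March u_t + f(u)_x = beta * u_xx with the monotone Lax-Friedrichs scheme above."""
    u = u0.copy()
    dx = b / len(u)
    c = 0.5 * np.max(np.abs(df(u)))              # artificial viscosity: c >= |f'(u)| / 2
    dt = cfl * dx**2 / (2.0 * (beta + c * dx))   # monotonicity: (dx)^2 >= 2 (beta + c dx) dt
    t = 0.0
    while t < T:
        up, um = np.roll(u, -1), np.roll(u, 1)
        u = (u - dt / (2.0 * dx) * (f(up) - f(um))
               + (beta + c * dx) * dt / dx**2 * (up - 2.0 * u + um))
        t += dt
    return u

# Example 1 setup: f(u) = 0.5*u*(1 - u), Riemann-type initial data on [0, 4].
nx = 400
x = np.linspace(0.0, 4.0, nx, endpoint=False)
u0 = np.where((x >= 1.0) & (x <= 2.0), 0.8, 0.0)
u1 = lax_friedrichs(u0, f=lambda u: 0.5 * u * (1.0 - u), df=lambda u: 0.5 - u, T=1.0)
```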

Slide 113

Discretization of the control problem
We consider the control problem of the scalar conservation law defined on $[0, b]\times[0, 1]$ with periodic boundary conditions on the spatial domain. Given $N_x, N_t > 0$, we have $\Delta x = \frac{b}{N_x}$, $\Delta t = \frac{1}{N_t}$. For $x_i = i\Delta x$, $t_l = l\Delta t$, define
$$\begin{aligned}
u^l_i &= u(t_l, x_i), && 1 \le i \le N_x,\ 0 \le l \le N_t,\\
m^l_{1,i} &= (m_{x_1}(t_l, x_i))_+, && 1 \le i \le N_x,\ 0 \le l \le N_t - 1,\\
m^l_{2,i} &= (m_{x_1}(t_l, x_i))_-, && 1 \le i \le N_x,\ 0 \le l \le N_t - 1,\\
\Phi^l_i &= \Phi(t_l, x_i), && 1 \le i \le N_x,\ 0 \le l \le N_t,\\
\Phi^{N_t}_i &= -\delta_{u(1, x_i)}\mathcal{H}(u^{N_t}_i), && 1 \le i \le N_x,
\end{aligned}$$
where $u_+ := \max(u, 0)$ and $u = u_+ + u_-$. Note that here $m^l_{1,i} \in \mathbb{R}_+$, $m^l_{2,i} \in \mathbb{R}_-$. Denote
$$(Du)_i := \frac{u_{i+1} - u_i}{\Delta x}, \qquad [Du]_i := \big((Du)_i, (Du)_{i-1}\big), \qquad \widehat{[Du]}_i := \big((Du)^+_i, (Du)^-_{i-1}\big), \qquad \mathrm{Lap}(u)_i = \frac{u_{i+1} - 2u_i + u_{i-1}}{(\Delta x)^2}.$$

Slide 114

Discretization of the control problem
The first conservation law equation, adapted from the Lax-Friedrichs scheme, satisfies
$$\frac{u^{l+1}_i - u^l_i}{\Delta t} + \frac{f(u^{l+1}_{i+1}) - f(u^{l+1}_{i-1})}{2\Delta x} + (Dm)^l_{1,i-1} + (Dm)^l_{2,i} = (\beta + c\Delta x)\,\mathrm{Lap}(u)^{l+1}_i.$$
Following the discretization of the conservation law, the discrete saddle point problem has the following form:
$$\min_{u, m}\max_\Phi\ L(u, m, \Phi),$$
where
$$\begin{aligned}
L(u, m, \Phi) ={}& \Delta x\,\Delta t \sum_{i,l}\Big(\frac{(m^{l-1}_{1,i})^2 + (m^{l-1}_{2,i})^2}{2u^l_i} + \mathbf{1}_{[0,1]}(u^l_i)\Big) - \Delta t\sum_l \mathcal{F}(u^l) + \Delta x\sum_i \mathcal{H}(u^{N_t}_i)\\
&+ \Delta x\,\Delta t \sum_{i,l}\Phi^l_i\Big(\frac{u^{l+1}_i - u^l_i}{\Delta t} + \frac{f(u^{l+1}_{i+1}) - f(u^{l+1}_{i-1})}{2\Delta x} + (Dm)^l_{1,i-1} + (Dm)^l_{2,i} - (\beta + c\Delta x)\,\mathrm{Lap}(u)^{l+1}_i\Big).
\end{aligned}$$

Slide 115

Discretization of the control problem
By taking the first-order derivative with respect to $u^l_i$, we automatically get the implicit finite difference scheme for the dual equation of $\Phi$, which is backward in time:
$$\frac{1}{\Delta t}\big(\Phi^{l+1}_i - \Phi^l_i\big) + \frac{\Phi^l_{i+1} - \Phi^l_{i-1}}{2\Delta x}f'(u^l_i) + \frac{1}{2}\big\|\widehat{[D\Phi]}^l_i\big\|^2 + \frac{\delta\mathcal{F}(u^l_i)}{\delta u} = -(\beta + c\Delta x)\,\mathrm{Lap}(\Phi)^l_i.$$
The discrete form of the Hamiltonian functional at $t = t_l$ takes the form
$$H_G(u, \Phi) = \sum_i \Big(\frac{1}{2}\big\|\widehat{[D\Phi]}^l_i\big\|^2 u^l_i + \frac{\Phi^l_{i+1} - \Phi^l_{i-1}}{2\Delta x}f(u^l_i) + (\beta + c\Delta x)\,u^l_i\,\mathrm{Lap}(\Phi)^l_i\Big) + \mathcal{F}(u^l).$$
- When $f(u) = 0$, the above discretization reduces to the finite difference scheme for the mean-field game system.
- When $\mathcal{F} = 0$ and $\mathcal{H} = c$ for some constant $c$, the variational problem becomes a classical conservation law with initial data; no control is enforced on the density function $u$.
- Our algorithm provides an alternative way to solve the nonlinear conservation law with an implicit discretization in time.

Slide 116

Example 1: Traffic flows
We consider the traffic flow equation
$$\partial_t u + \partial_x f(u) = 0, \qquad u(0, x) = \begin{cases} 0.8, & 1 \le x \le 2,\\ 0, & \text{else}, \end{cases}$$
where $f(u) = \frac{1}{2}u(1 - u)$, $\mathcal{F} = 0$, $\mathcal{H} = 0$, $\beta = 0$, $c = 0.5$.

Figure: Left: a comparison with the exact entropy solution at $t = 1$; middle: the numerical solution via solving the control problem; right: the numerical solution to the conservation law using the Lax-Friedrichs scheme.

Slide 117

Example 2: Traffic flow transport problems
We consider the traffic flow equation with $f(u) = u(1 - u)$, $\beta = 0.1$, $\mathcal{F} = 0$, $c = 0.5$. The final cost functional is
$$\mathcal{H}(u(1, \cdot)) = \mu\int_\Omega u(1, x)\log\Big(\frac{u(1, x)}{u_1}\Big)\,dx, \qquad \mu = 1.$$
We set
$$u_0 = 0.001 + 0.9e^{-10(x-2)^2}, \qquad u_1 = 0.001 + 0.45e^{-10(x-1)^2} + 0.45e^{-10(x-3)^2}.$$
We also compare the result from the control of the conservation law with a mean-field game problem, i.e., $f = 0$.

Slide 118

Example 2: Traffic flow transport problems

Figure: From left to right: initial configurations of $u_0$, $u_1$; solution $u(1, x)$ for the control of the conservation law; solution $u(1, x)$ for the mean-field game problem.

Figure: Left: solution $u(t, x)$ for the problem of controlling the conservation law; right: solution $u(t, x)$ for the mean-field game problem.

Slide 119

Example 3: Traffic control
Consider the traffic flow equation with $f(u) = u(1 - u)$, $\beta = 10^{-3}$, $\mathcal{F} = \alpha\int_\Omega u\log(u)\,dx$, $\alpha \ge 0$. The final cost functional is $\mathcal{H}(u(1, \cdot)) = \int_\Omega u(1, x)g(x)\,dx$. The initial density and final cost function are as follows:
$$u_0(x) = \begin{cases} 0.4, & 0.5 \le x \le 1.5,\\ 10^{-3}, & \text{else}, \end{cases} \qquad g(x) = 0.1\sin(2\pi x).$$

Figure: Solution $u(t, x)$ for the problem of controlling the conservation law. From left to right: $\alpha = 0, 0.5, 1$.

Slide 120

Example 3: Traffic control

Figure: Left: boundary conditions for the control problem, $u_0$ and $g$. Middle: solution $u(1, x)$ for the problem of controlling the conservation law, for $\alpha = 0, 0.5, 1$. Right: the numerical Hamiltonian $H_G(u, \Phi)$, for $\alpha = 0, 0.5, 1$.

Figure: Solution $u(t, x)$ for the problem of controlling the conservation law. From left to right: $\alpha = 0, 0.5, 1$.

Slide 121

Extending to systems of conservation laws

Figure: Controlling the 1D compressible Navier-Stokes equations. Top: no control. Bottom: with control.