Slide 1

Mean Field Game MURI Report
Stanley Osher (UCLA), August 2023
AFOSR MURI FA9550-18-1-0502

Slide 2


Slide 3

MURI’s Product

Slide 4

Team strength

• UCLA
  • Optimal transport (analysis and computation)
  • Control + differential games + Hamilton-Jacobi equations
  • Inverse problems + optimization
  • AI + scientific computing
  • Fast algorithms with applications, e.g., in the COVID-19 pandemic and social network modeling

Slide 5

Team strength

• Princeton and UH
  • 5G communication
  • Communication networking
  • UAV networks
  • Machine learning / social networks
• Maryland
  • Evolutionary game theory
  • Cultural modelling
  • Reinforcement learning

Slide 6

Mean field game

Slide 7

Mean field system

A typical time-dependent MFG system (derived from (9)) has the form
\[
\begin{cases}
-\phi_t - \nu \Delta \phi + H(x, D_x\phi(x,t), \rho(x,t), t) = 0,\\
\rho_t - \nu \Delta \rho - \operatorname{div}_x\!\big(\rho\, D_p H(x, D_x\phi(x,t), \rho(x,t), t)\big) = 0,\\
\rho \ge 0, \quad \int_{\mathbb{T}^d} \rho(x,t)\, dx = 1, \quad t \in [0,T],\\
\rho(x,T) = \rho_T(x), \quad \phi(x,0) = \phi_0(x, \rho(x,0)), \quad x \in \mathbb{T}^d.
\end{cases}
\tag{10}
\]

Given: \(H : \mathbb{T}^d \times \mathbb{R}^d \times X \times \mathbb{R} \to \mathbb{R}\) is a Hamiltonian, periodic in space and convex in momentum, where \(X = \mathbb{R}_+\) or \(X = L^1(\mathbb{T}^d; \mathbb{R}_+)\) or \(X = \mathbb{R}_+ \times L^1(\mathbb{T}^d; \mathbb{R}_+)\); \(H = L^*\) is the Legendre transform of a Lagrangian \(L\); \(\nu \ge 0\) is a diffusion parameter; \(\phi_0 : \mathbb{T}^d \times X \to \mathbb{R}\) and \(\rho_T : \mathbb{T}^d \to \mathbb{R}\) are given initial-terminal conditions.

Unknowns: \(\phi, \rho : \mathbb{T}^d \times [0,T] \to \mathbb{R}\).
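As a concrete instance (a standard special case written out here for illustration, not taken from the report): with the quadratic-in-momentum Hamiltonian \(H(x, p, \rho, t) = \frac{1}{2}\|p\|^2 + f(x,\rho)\), we have \(D_p H = p\), and system (10) becomes the coupled Hamilton-Jacobi-Bellman / Fokker-Planck pair
\[
-\phi_t - \nu \Delta \phi + \tfrac{1}{2}\|D_x\phi\|^2 + f(x,\rho) = 0,
\qquad
\rho_t - \nu \Delta \rho - \operatorname{div}_x\!\big(\rho\, D_x\phi\big) = 0,
\]
in which each agent moves with velocity \(-D_x\phi\) and interacts with the population only through the coupling \(f(x,\rho)\).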

Slide 8

Challenges addressed

• Restrictive assumptions (violated in the real world) in existing MFGs
• Discrete strategy sets (needed for social-network applications)
• Lack of fast numerical solvers for high-dimensional MFGs
• Lack of MFG inverse problems (observations → model)
• How to design policies that achieve a goal for a large population

Slide 9

Proposed work: foundations and models

• Player strategies on a discrete / graph / manifold set
• Non-physical interactions, multiple scales, and multiple populations
• Social networks, social and cultural norm dynamics
• Socio-economic dynamics in energy consumption
• Election modeling, and psycho-/socio-mathematics of rational and irrational agents
• Long-time population behaviors, non-equilibrium behavior, uncontrollable disasters (COVID-19)
• Engineering applications such as communications, power grids, networking, and UAVs

Slide 10

Proposed work: fast, efficient, and scalable numerics

• Address the curse of dimensionality
• Address nonconvexity and nonsmoothness
• Parallel and distributed algorithms
• Real-time solution of the forward problem
• Nearly real-time policy making and MFG inverse problems
• Time-implicit schemes for mean field PDEs and beyond

Slide 11

Selected products in 2022-2023

• PDHG + time-implicit schemes + nonlinear PDEs: generalized primal-dual hybrid gradient methods for time-implicit schemes of reaction-diffusion equations, conservation laws, Hamilton-Jacobi equations, etc.
• PDHG + optimization: primal-dual damping algorithms for optimization.
• PDHG + HPC + OT + MFC: high-performance computation for generalized optimal transport, mean field control problems, and time-implicit updates of initial value mean field PDEs.
• Mean field control problems with AI computations.

Slide 12

Computational results
Stanley Osher, Wuchen Li, et al. (UCLA and University of South Carolina)

Slide 13

A primal-dual approach for solving conservation laws with implicit-in-time approximations
Siting Liu (UCLA), Wuchen Li (University of South Carolina), Chi-Wang Shu (Brown University), Stanley Osher (UCLA)
Published in Journal of Computational Physics 472 (2023): 111654.

Slide 14

Introduction

We consider the following initial value problem (IVP) for a scalar conservation law on the domain \(\Omega \times [0,T]\):
\[
\partial_t u(x,t) + \partial_x f(u(x,t)) - \partial_x\big(\sigma(x)\,\partial_x u(x,t)\big) = 0,
\qquad u(x,0) = u_0(x).
\]

Computational methods:
• High-order accurate schemes in space: discontinuous Galerkin (DG) methods, ENO, WENO, etc.
• Explicit high-order schemes in time: the forward Euler method, Runge-Kutta methods, etc. Their time step length is restricted by Courant-Friedrichs-Lewy (CFL) conditions.
• Implicit methods are suitable for stiff problems and can use larger time steps, but the solvers typically require extra computation and storage for the Jacobian matrix, a good initial guess, etc.

Slide 15

A new framework for solving conservation laws

We connect these computational methods with a primal-dual optimization approach.

• Cast the IVP as a saddle point problem (a min-max problem).
  – Treat the continuity equation as a constraint.
  – Obtain a system of dual equations.
• Apply an implicit-in-time scheme to the min-max problem.
  – Accommodates many high-order schemes in time and space.
  – Achieves implicit schemes for the system of dual equations.
• Use a primal-dual method to find the saddle point.
  – A first-order optimization approach: no computation or storage of the Jacobian matrix, no requirement on the initial guess.
  – Optimization iterations independent of mesh size.
  – Geometric convergence for linear equations; no theoretical guarantee in other cases, but convergence in practice.
  – Highly parallelizable and easy to implement. No nonlinear inversions are required!

Slide 16

A saddle point formulation of the IVP

We recast the IVP as the optimization problem
\[
\min_u \; \mathbf{1}_{u \in \mathcal{U}},
\qquad
\mathcal{U} = \Big\{ u : u(x,0) = u_0(x),\;
\partial_t u + \partial_x f(u) - \partial_x\big(\sigma(x)\,\partial_x u\big) = 0
\ \text{for all } (x,t) \in \Omega \times [0,T] \Big\}.
\]
Here the indicator function is defined as
\[
\mathbf{1}_{a \in A} =
\begin{cases}
0 & \text{if } a \in A,\\
+\infty & \text{if } a \notin A.
\end{cases}
\]
Introducing a Lagrange multiplier \(\phi : \Omega \times [0,T] \to \mathbb{R}\) removes the constraint and leads to a min-max problem:
\[
\min_{u,\, u(\cdot,0) = u_0} \; \max_{\phi} \; L(u, \phi), \tag{1}
\]
\[
L(u, \phi) = \int_0^T\!\!\int_\Omega \phi(x,t)\,\Big( \partial_t u(x,t) + \partial_x f(u(x,t)) - \partial_x\big(\sigma(x)\,\partial_x u(x,t)\big) \Big)\, dx\, dt.
\]

Slide 17

The system of dual equations at the saddle point

For a variable-coefficient heat equation,
\[
L(u, \phi) = \int_0^T\!\!\int_\Omega \phi\,\big( u_t - \partial_x(\sigma(x)\,\partial_x u) \big)\, dx\, dt
= -\int_0^T\!\!\int_\Omega u(x,t)\,\big( \partial_t \phi + \partial_x(\sigma(x)\,\partial_x \phi) \big)\, dx\, dt
+ \int_\Omega u(x,T)\,\phi(x,T) - u(x,0)\,\phi(x,0)\, dx.
\]
Taking the first-order optimality conditions of \(L(u, \phi)\), we obtain a system in which \(u\) runs forward and \(\phi\) runs backward in time:
\[
\begin{cases}
u_t - \partial_x\big(\sigma(x)\,\partial_x u\big) = 0, & u(x,0) = u_0(x),\\
\phi_t + \partial_x\big(\sigma(x)\,\partial_x \phi\big) = 0, & \phi(x,T) = 0.
\end{cases}
\]

Slide 18

Finite difference scheme for the variable-coefficient heat equation

The discretized min-max problem (1) takes the form
\[
\min_{u = \{u^l_j\}} \; \max_{\phi = \{\phi^l_j\}} \; L(u, \phi),
\qquad
L(u, \phi) = h_t h_x \sum_{0 \le l \le N_t - 1,\; 1 \le j \le N_x} \phi^l_j
\left( \frac{u^{l+1}_j - u^l_j}{h_t} - \mathrm{Lap}(u^{l+1})_j \right).
\]
The discretized saddle point of the min-max problem satisfies
\[
\begin{cases}
\dfrac{u^{l+1}_j - u^l_j}{h_t} - \mathrm{Lap}(u^{l+1})_j = 0, & 0 \le l \le N_t - 1,\; 1 \le j \le N_x,\\
u^0_j = u_0(x_j), & 1 \le j \le N_x,\\
\dfrac{\phi^l_j - \phi^{l-1}_j}{h_t} + \mathrm{Lap}(\phi^{l-1})_j = 0, & 1 \le l \le N_t,\; 1 \le j \le N_x,\\
\phi^{N_t}_j = 0, & 1 \le j \le N_x.
\end{cases}
\]
Both equations are discretized with the implicit backward Euler method.

Slide 19

A primal-dual algorithm

We adapt the primal-dual hybrid gradient (PDHG) algorithm for convex optimization, integrating G-prox PDHG and nonlinear PDHG. The \(n\)-th iteration reads
\[
\begin{cases}
u^n = \operatorname{argmin}_u \; \langle u, A^\top \tilde\phi^{n-1} \rangle + \frac{1}{2\tau_u} \| u - u^{n-1} \|_{L^2}^2,\\[2pt]
\phi^n = \operatorname{argmax}_\phi \; \langle A u^n, \phi \rangle - \frac{1}{2\tau_\phi} \| \phi - \phi^{n-1} \|_H^2,\\[2pt]
\tilde\phi^n = 2\phi^n - \phi^{n-1}.
\end{cases}
\]
Here \(A\) denotes the discrete approximation of the differential operator:
\[
A u \approx \partial_t u - \partial_x\big(\sigma(x)\,\partial_x u\big),
\qquad
A^\top \phi \approx -\partial_t \phi - \partial_x\big(\sigma(x)\,\partial_x \phi\big).
\]
For a linearized flux with advection speed \(\alpha\), each update can be written explicitly:
\[
u^n = u^{n-1} + \tau_u \Big( (\partial_t + \alpha\,\partial_x)\tilde\phi^{n-1} + \partial_x\big(\sigma\,\partial_x \tilde\phi^{n-1}\big) \Big),
\]
\[
\phi^n = \phi^{n-1} + \tau_\phi \big( -\partial_{tt} - \alpha^2 \partial_{xx} + \hat\sigma^2 \partial_{xxxx} \big)^{-1} \Big( (\partial_t + \alpha\,\partial_x) u^n - \partial_x\big(\sigma\,\partial_x u^n\big) \Big).
\]
The computation for \(\phi^n\) can be done using the Fast Fourier Transform (FFT).
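For concreteness, here is a minimal dense-matrix sketch of this iteration for the backward-Euler heat equation (our illustration: the grid sizes, step sizes, and \(\sigma \equiv 1\) are assumptions, and a dense solve stands in for the paper's FFT inversion of \(AA^\top\)):

```python
import numpy as np

# Minimal dense-matrix sketch of the G-prox PDHG iteration above for the
# backward-Euler heat equation u_t = sigma * u_xx on [0, 1), periodic in x.
# The space-time residuals are stacked as A u - b, and the dual update is
# preconditioned with G = A A^T. All sizes and step sizes are illustrative;
# the paper applies (A A^T)^{-1} with FFT rather than a dense solve.

Nx, Nt, T, sigma = 32, 16, 0.1, 1.0
hx, ht = 1.0 / Nx, T / Nt
x = np.arange(Nx) * hx
u0 = np.exp(-64 * (x - 0.5) ** 2)               # initial condition from Example 1

# Periodic second-difference (Laplacian) matrix
I = np.eye(Nx)
Lap = (np.roll(I, 1, axis=0) - 2 * I + np.roll(I, -1, axis=0)) / hx**2

# Stack u^1..u^Nt; the residual of step l is (u^{l+1} - u^l)/ht - sigma*Lap u^{l+1}
N = Nx * Nt
A, b = np.zeros((N, N)), np.zeros(N)
B = I / ht - sigma * Lap                         # acts on u^{l+1}
for l in range(Nt):
    A[l*Nx:(l+1)*Nx, l*Nx:(l+1)*Nx] = B
    if l == 0:
        b[:Nx] = u0 / ht                         # u^0 = u0 enters as data
    else:
        A[l*Nx:(l+1)*Nx, (l-1)*Nx:l*Nx] = -I / ht

G = A @ A.T                                      # G-prox preconditioner
u, phi = np.tile(u0, Nt), np.zeros(N)
phi_tilde = phi.copy()
tau_u, tau_p = 0.9, 0.9
for n in range(300):
    u = u - tau_u * (A.T @ phi_tilde)            # primal (u) update
    phi_new = phi + tau_p * np.linalg.solve(G, A @ u - b)  # preconditioned ascent
    phi_tilde = 2 * phi_new - phi                # extrapolation
    phi = phi_new

print("final residual:", np.linalg.norm(A @ u - b))
```

Note that only first-order information is used: each iteration applies \(A\), \(A^\top\), and one preconditioned solve, with no Jacobian assembly.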

Slide 20

Example 1: variable-coefficient heat equation

\[
\partial_t u = \partial_x\big(\sigma\,\partial_x u\big), \quad (x,t) \in [0,1] \times [0, 0.1],
\qquad
u(x,0) = \exp\!\big(-64\,(x - 0.5)^2\big), \quad x \in [0,1].
\]

Number of iterations needed to achieve error \(\epsilon\) with the primal-dual approach on different meshes:

| \(N_x \times N_t\) | 64 × 16 | 128 × 32 | 256 × 64 | 512 × 128 | 1024 × 256 |
| \(\epsilon = 10^{-6}\) | 90 | 91 | 92 | 94 | 102 |
| \(\epsilon = 10^{-10}\) | 144 | 146 | 147 | 149 | 163 |

• An explicit scheme must satisfy the restrictive CFL condition \(\frac{h_t}{h_x^2} < \frac{1}{2 \max_x \sigma(x)}\); in the examples above, however, \(\frac{h_t}{h_x^2} = \frac{128}{5}, \frac{256}{5}, \ldots, \frac{2048}{5}\).
• The convergence of the algorithm for this linear equation is independent of the grid size.
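As a check of the first ratio (our arithmetic): on the \(64 \times 16\) mesh, \(h_x = 1/64\) and \(h_t = 0.1/16\), so
\[
\frac{h_t}{h_x^2} = \frac{0.1}{16} \cdot 64^2 = 25.6 = \frac{128}{5},
\]
about 50 times larger than the explicit-scheme bound \(\frac{1}{2\max_x \sigma(x)}\) when, say, \(\max_x \sigma(x) = 1\).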

Slide 21

Example 2: quadratic conservation law

Consider a quadratic conservation law of the form
\[
\partial_t u + \partial_x\big(\alpha u^2 + \beta u\big) = 0, \quad (x,t) \in [0,2] \times [0,1],
\qquad
u(x,0) =
\begin{cases}
0.1 & \text{if } 1 \le x \le 2,\\
0.25 & \text{otherwise}.
\end{cases}
\]
For the numerical scheme, we use backward Euler in time and the discontinuous Galerkin method in space.

Figure: With \(\alpha = -1\), \(\beta = 1\) we obtain the traffic equation. Left: numerical solution of the traffic equation with discontinuous initial data, \(N \times N_t = 256 \times 32\). Right: numerical solution at \(t = 0, 0.5, 1\), showing shock propagation (left) and the development of a rarefaction wave (right).

Slide 22

Example 3: barotropic compressible Navier-Stokes equations (BNS)

Consider a BNS system of the form
\[
\begin{cases}
\partial_t \rho + \partial_x m = 0,\\[2pt]
\partial_t m + \partial_x\!\Big( \dfrac{m^2}{\rho} \Big) + \partial_x P(\rho) = \partial_x\!\Big( \mu(\rho)\, \partial_x \dfrac{m}{\rho} \Big).
\end{cases}
\]
Here \(\rho\) is the density, \(m := \rho v\) the flux, and \(v\) the velocity. We set the pressure \(P(\rho) = c_1 \rho^2\) and \(\mu(\rho) = c_2\). For the numerical scheme, we use backward Euler in time and a Lax-Friedrichs-type scheme in space.

Figure: From left to right: \(\rho\), \(m\), \(v\). With positive uniform initial velocity at \(t = 0\), the density moves to the right.

Slide 23

Work in progress: min-max with an entropy regularizer

Inspired by Brenier's work [1, 2], we incorporate entropy as a convex regularizer in our min-max formulation of the IVP and apply PDHG to the new saddle point problem:
\[
\min_{u,\, u(\cdot,0) = u_0} \; \max_{\phi} \; \{ L(u, \phi) + E(u) \}.
\]
• Burgers' equation: \(E(u) = \iint u^2\, dx\, dt\)
• Euler equations of isothermal fluids: \(E(\rho, m) = \iint \big( \frac{m^2}{2\rho} + \rho \log \rho \big)\, dx\, dt\)

Figure: The residuals converge significantly faster with entropy regularizers.

[1] Variational Methods for Evolution, Workshop Report, 2017, 54.
[2] Y. Brenier. The initial value problem for the Euler equations of incompressible fluids viewed as a concave maximization problem. Communications in Mathematical Physics 364 (2018): 579-605.

Slide 24

A first-order computational algorithm for reaction-diffusion type equations via the primal-dual hybrid gradient method
Shu Liu, Siting Liu, Stanley Osher, Wuchen Li
May 2023

Slide 25

Introduction

• Goal: We propose an easy-to-implement iterative method for resolving the implicit (or semi-implicit) schemes arising in solving reaction-diffusion (RD) type equations (or systems).
• Our treatment:
  – Formulate the scheme as a min-max saddle point problem.
  – Apply the primal-dual hybrid gradient (PDHG) method.
  – Apply suitable preconditioning matrices to accelerate the convergence of the algorithm.
• The method is applicable to various discrete numerical schemes with high flexibility.
• The method can efficiently produce numerical solutions with sufficient accuracy.

Slide 26

Formulations

• Reaction-diffusion (RD) type equation:
\[
\frac{\partial u(x,t)}{\partial t} = \mathcal{L} u(x,t) + \mathcal{R} u(x,t) \quad \text{on } \Omega \subset \mathbb{R}^d, \tag{1}
\]
with initial condition \(u(\cdot, 0) = u_0\). Here \(\mathcal{L}\) is an elliptic operator, such as a harmonic (Laplace) or biharmonic operator, describing the diffusion process, and \(\mathcal{R}\) is a nonlinear operator describing the reaction process.
• The semi-discrete implicit scheme at the \(t\)-th time step is
\[
\frac{u^{t+1} - u^t}{h_t} = \mathcal{L} u^{t+1} + \mathcal{R} u^{t+1}. \tag{2}
\]
Given \(u^t\), we want to solve for \(u^{t+1}\).
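For instance (an illustrative special case of (1)-(2), not one of the talk's examples), the Allen-Cahn equation fits this template with \(\mathcal{L} u = \Delta u\) and \(\mathcal{R} u = u - u^3\), and scheme (2) becomes the nonlinear system
\[
u^{t+1} - h_t \big( \Delta u^{t+1} + u^{t+1} - (u^{t+1})^3 \big) = u^t,
\]
to be solved for \(u^{t+1}\) at every time step.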

Slide 27

Methods

Suppose \(L, R\) are the space discretizations of \(\mathcal{L}, \mathcal{R}\). The previous scheme reduces to the nonlinear equation \(F(U) = 0\), where
\[
F(U) = U - U^t - h_t \big( L U + R(U) \big),
\qquad
U \in \mathbb{R}^N,\; F : \mathbb{R}^N \to \mathbb{R}^N,
\]
and \(N\) is the number of spatial discretization points.

Idea:
• Solving \(F(U) = 0\) is equivalent to minimizing \(\iota(F(U))\), where \(\iota(x) = 0\) iff \(x = 0\) and \(\iota(x) = +\infty\) iff \(x \ne 0\).
• Notice that \(\iota(x) = \sup_{p \in \mathbb{R}^N} \{ p^\top x \}\).
• Thus \(\min\, \{ \iota(F(U)) \}\) is equivalent to solving the saddle point problem
\[
\min_{U \in \mathbb{R}^N} \; \max_{P \in \mathbb{R}^N} \; \{ P^\top F(U) \}.
\]

Slide 28

Methods

We apply the following dynamic, inspired by the G-prox primal-dual hybrid gradient (PDHG) method, to solve the saddle point problem and update the numerical solution \(U^t\) at time \(t\) to \(U^{t+1}\).

Initial conditions: \(U_0 = U^t\), \(P_0 = 0\);
\[
\begin{aligned}
P_{n+1} &= P_n + \tau_p\, G^{-1} F(U_n);\\
\widetilde{P}_{n+1} &= 2 P_{n+1} - P_n;\\
U_{n+1} &= U_n - \tau_u\, \big( \nabla_U F(U_n) \big)^{\!\top} \widetilde{P}_{n+1}.
\end{aligned}
\]
• \(\tau_p, \tau_u\) are the step sizes of our method; they usually take values in \([0.5, 1]\).
• The \(N \times N\) matrix \(G\) plays an important role here. It can be treated as a preconditioner that may dramatically improve the convergence speed of the method. In our experience, \(G\) is usually chosen as \((I - h_t L)(I - h_t L)^\top\). The inversion of \(G\) can be handled efficiently with FFT or DCT. The computational complexity of evolving this dynamic is \(O(N \log N)\) per iteration.
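Below is a minimal runnable sketch of this dynamic for a 1D periodic Allen-Cahn equation (an illustrative equation choice on our part; the talk's examples are Cahn-Hilliard-type). The preconditioner \(G = (I - h_t L)(I - h_t L)^\top\) is inverted in Fourier space; all parameter values are assumptions:

```python
import numpy as np

# G-prox PDHG sketch for one implicit (backward Euler) step of the 1D periodic
# Allen-Cahn equation u_t = a*u_xx + u - u^3 (illustrative choice). We solve
#   F(U) = U - U_prev - ht*(L U + R(U)) = 0
# via  min_U max_P P^T F(U), with G = (I - ht*L)(I - ht*L)^T inverted by FFT.

Nx, a, ht = 128, 0.1, 0.05
x = np.linspace(0, 2 * np.pi, Nx, endpoint=False)
k = np.fft.fftfreq(Nx, d=1.0 / Nx)             # integer wave numbers on [0, 2*pi)
L_sym = -a * k**2                              # Fourier symbol of L = a * d_xx
Ginv_sym = 1.0 / (1.0 - ht * L_sym) ** 2       # symbol of G^{-1}

applyL = lambda v: np.fft.ifft(L_sym * np.fft.fft(v)).real
applyGinv = lambda v: np.fft.ifft(Ginv_sym * np.fft.fft(v)).real
R = lambda v: v - v**3                         # reaction term
Rp = lambda v: 1.0 - 3.0 * v**2                # its (diagonal) Jacobian

def implicit_step(U_prev, tau_u=0.5, tau_p=0.5, iters=200):
    """Solve the backward-Euler equation for one time step with G-prox PDHG."""
    U, P = U_prev.copy(), np.zeros(Nx)
    for _ in range(iters):
        F = U - U_prev - ht * (applyL(U) + R(U))
        P_new = P + tau_p * applyGinv(F)       # preconditioned dual ascent
        P_tilde = 2.0 * P_new - P              # extrapolation
        P = P_new
        # (grad_U F)^T P_tilde = (I - ht*L - ht*diag(R'(U))) P_tilde, L symmetric
        U = U - tau_u * (P_tilde - ht * applyL(P_tilde) - ht * Rp(U) * P_tilde)
    return U

U = 0.1 * np.cos(3 * x)                        # arbitrary smooth initial data
for _ in range(20):                            # march 20 implicit time steps
    U_prev, U = U, implicit_step(U)
res = np.linalg.norm(U - U_prev - ht * (applyL(U) + R(U)))
print("residual of final implicit solve:", res)
```

Because \(G\) matches the linear part of \(\nabla_U F\), the preconditioned dual step behaves like an exact solve for the stiff diffusion term, which is what permits the mesh-independent iteration counts reported in the examples.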

Slide 29

Example I (Cahn-Hilliard equation with seven circles)

• We consider the Cahn-Hilliard equation on the periodic region \(\Omega = [0, 2\pi)^2\):
\[
\frac{\partial u(x,t)}{\partial t} = -a\, \Delta^2 u(x,t) + b\, \Delta W'(u(x,t)),
\qquad u(\cdot, 0) = u_0.
\]
Here we take \(a = 0.12\), \(b = 1\), and the double-well potential \(W(u) = \frac{1}{2}(u^2 - 1)^2\). The initial datum \(u_0\) is the seven-circle function shown in the figures on the next page.
• We solve the equation on the time interval \([0, 30]\). In our numerical implementation, we use a \(128 \times 128\) space discretization and \(6000\) time steps (\(h_t = 1/200\)). For the PDHG iteration, we set \(\tau_u = \tau_p = 0.5\).

Slide 30

Example I (Cahn-Hilliard equation with seven circles)

Figure: Numerical solution at \(t = 0.0,\ 5.0,\ 15.0,\ 25.0,\ 30.0\), together with \(\log_{10} \mathrm{Res}(U_n)\) plots at the corresponding time stages for the seven-circle example. The residual plots indicate linear convergence of our method for the nonlinear objective functions used in this example. Here \(\mathrm{Res}(U_n) = \| U_n - U^t - h_t (L U_n + R(U_n)) \|_2\).

Slide 31

Example II (a higher-order equation)

• We test the method on the following 6th-order Cahn-Hilliard-type equation, which originates from research on functionalized polymers:
\[
\frac{\partial u(x,t)}{\partial t} = \Delta \Big[ \big( \epsilon^2 \Delta - W''(u) + \epsilon^2 \big) \big( \epsilon^2 \Delta u - W'(u) \big) \Big] \quad \text{on } \Omega,
\qquad u(\cdot, 0) = u_0.
\]
• We set \(\Omega = [0, 2\pi)^2\) and \(\epsilon = 0.18\); \(W(u)\) is the same as before.
• We choose the initial condition \(u_0(x, y) = 2 e^{\sin x + \sin y - 2} + 2.2\, e^{-\sin x - \sin y - 2} - 1\).
• In the numerical implementation, we solve the equation on the time interval \([0, 20]\), with a \(128 \times 128\) space discretization and \(20000\) time steps (\(h_t = 1/1000\)).
• For the PDHG iteration, we set step sizes \(\tau_u = \tau_p = 0.58\).

Slide 32

Example II (a higher-order equation)

Figure: Numerical solution at \(t = 0.0,\ 0.1,\ 2.0,\ 20.0\).

Figure: Decay of the \(\log_{10}\) residual and plots of the residual functional \(\mathrm{Res}(U_n)\) at \(t = 2.0\) and \(t = 20.0\). Here \(\mathrm{Res}(U_n) = \| U_n - U^t - h_t (L U_n + R(U_n)) \|_2\).

Slide 33

Primal-dual hybrid gradient algorithms for computing time-implicit Hamilton-Jacobi equations
Tingwei Meng (UCLA), Wenbo Hao (UCLA), Siting Liu (UCLA), Liu Yang (UCLA), Wuchen Li (University of South Carolina), Stanley Osher (UCLA)
August 10, 2023

Slide 34

Motivation

We solve the following Hamilton-Jacobi (HJ) PDE:
\[
\begin{cases}
\dfrac{\partial \phi(x,t)}{\partial t} + H(x, t, \nabla_x \phi(x,t)) = \epsilon\, \Delta_x \phi(x,t), & x \in \Omega,\ t > 0,\\
\phi(x, 0) = g(x), & x \in \Omega.
\end{cases}
\tag{1}
\]
Grid-based methods (e.g., the Lax-Friedrichs and Engquist-Osher schemes) must satisfy the CFL condition in their discretization. In this project, we provide an optimization method for solving HJ PDEs (with a certain form of \(H\)) with implicit time discretization. This allows us to choose a larger time step size and avoid the CFL condition.

In the literature, there are optimization algorithms for solving HJ PDEs [Darbon, Dower, Osher, Yegerov, ...]. Compared to these methods, the optimization algorithm proposed in this project can handle a more general Hamiltonian that depends on \((x, t)\). A partial theoretical guarantee is provided.

Slide 35

Continuous formulation

Solving the original HJ PDE (1) is equivalent to the optimization problem
\[
\min_{\phi \text{ satisfying } (1)} \; -c \int \phi(x, T)\, dx, \tag{2}
\]
where \(c \in \mathbb{R}\) is a constant used to stabilize the method. Introducing the Lagrange multiplier \(\rho\), we get
\[
\min_{\phi:\, \phi(\cdot,0) = g} \; \max_{\rho} \; \int_0^T\!\!\int \rho(x,t) \left( \frac{\partial \phi(x,t)}{\partial t} + H(x, t, \nabla_x \phi(x,t)) - \epsilon\, \Delta_x \phi(x,t) \right) dx\, dt - c \int \phi(x, T)\, dx. \tag{3}
\]
Under certain conditions (e.g., \(H(x, t, \cdot)\) is convex for any \(x, t\)), this is equivalent to the saddle point problem
\[
\min_{\phi:\, \phi(\cdot,0) = g} \; \max_{\rho, v:\, \rho \ge 0} \; \int_0^T\!\!\int \rho(x,t) \Big( \frac{\partial \phi(x,t)}{\partial t} + \langle v(x,t), \nabla_x \phi(x,t) \rangle - H^*(x, t, v(x,t)) - \epsilon\, \Delta_x \phi(x,t) \Big)\, dx\, dt - c \int \phi(x, T)\, dx, \tag{4}
\]
where \(H^*(x, t, \cdot)\) is the Fenchel-Legendre transform of \(H(x, t, \cdot)\) for any \(x, t\). We discretize and then employ the PDHG method to solve this saddle point problem, using implicit time discretization to avoid the CFL condition.
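For orientation (a standard special case worked out here, consistent with example 1 below): when \(H(x, t, p) = \frac{1}{2}\|p\|^2\), its Fenchel-Legendre transform is \(H^*(x, t, v) = \frac{1}{2}\|v\|^2\), so the saddle problem (4) reads
\[
\min_{\phi:\, \phi(\cdot,0) = g} \; \max_{\rho \ge 0,\, v} \; \int_0^T\!\!\int \rho \Big( \partial_t \phi + \langle v, \nabla_x \phi \rangle - \tfrac{1}{2}\|v\|^2 - \epsilon\, \Delta_x \phi \Big)\, dx\, dt - c \int \phi(x, T)\, dx,
\]
and maximizing pointwise over \(v\) gives \(v = \nabla_x \phi\), recovering the viscous Burgers-type HJ PDE \(\partial_t \phi + \frac{1}{2}\|\nabla_x \phi\|^2 = \epsilon\, \Delta_x \phi\).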

Slide 36

Continuous formulation

First-order optimality conditions:
\[
\begin{cases}
\dfrac{\partial \phi(x,t)}{\partial t} + \langle v(x,t), \nabla_x \phi(x,t) \rangle - H^*(x, t, v(x,t)) = \epsilon\, \Delta_x \phi(x,t), & x \in \Omega,\ t > 0,\\[2pt]
\dfrac{\partial \rho(x,t)}{\partial t} + \nabla_x \cdot \big( \rho(x,t)\, v(x,t) \big) + \epsilon\, \Delta_x \rho(x,t) = 0, & x \in \Omega,\ t > 0,\\[2pt]
v(x,t) = \nabla_p H(x, t, \nabla_x \phi(x,t)), & x \in \Omega,\ t > 0,\\[2pt]
\phi(x, 0) = g(x), \quad \rho(x, T) = c, & x \in \Omega.
\end{cases}
\tag{5}
\]
In the PDHG method, the updates for \(\phi\) and \(\rho\) are related to the first line (linear with respect to \(\phi\)) and the second line (linear with respect to \(\rho\)), respectively. We use implicit time discretization, so we can have a larger CFL number \(\big( \frac{\Delta t}{\Delta x} \max_p \| \nabla_p H(x, t, p) \| \big)\): e.g., in example 2 the CFL number is 0.036 for an explicit scheme, but it can be about 10.8 in our method.

Slide 37

Numerical results: example 1

We solve the Burgers equation on \(\Omega = [0, 2]^n\) (\(n = 1\) or \(2\)) with periodic spatial boundary conditions and initial condition \(\phi(x, 0) = \frac{1}{2} \| x - 1 \|^2\). We observe first-order error decay in this example.

Table: Averaged absolute residual of the discretized equation and \(\ell^1\) relative error (1D).

| \(n_x \times n_t\) | 20 × 11 | 40 × 21 | 80 × 41 | 160 × 81 |
| Averaged absolute residual of HJ PDE | 9.36e-07 | 9.75e-07 | 9.86e-07 | 9.95e-07 |
| \(\ell^1\) relative error | 5.28e-02 | 2.69e-02 | 1.13e-02 | 2.71e-03 |

Table: Averaged absolute residual of the discretized equation and \(\ell^1\) relative error (2D).

| \(n_x \times n_y \times n_t\) | 20 × 20 × 11 | 40 × 40 × 21 | 80 × 80 × 41 | 160 × 160 × 81 |
| Averaged absolute residual of HJ PDE | 9.65e-07 | 9.84e-07 | 9.99e-07 | 1.00e-06 |
| \(\ell^1\) relative error | 5.21e-02 | 2.68e-02 | 1.12e-02 | 2.71e-03 |

Slide 38

Numerical results: example 2

We solve the following viscous HJ PDE, whose Hamiltonian depends on both the spatial variable \(x\) and the time variable \(t\) (for \(n = 1, 2\)):
\[
\begin{cases}
\dfrac{\partial \phi(x,t)}{\partial t} + \dfrac{1}{2} \| \nabla_x \phi(x,t) \|^2 - f(x,t) = 0.1\, \Delta_x \phi(x,t), & x \in [0,2]^n,\ t > 0,\\[2pt]
\phi(x, 0) = -\dfrac{1}{10} \| x - 1 \|^2, & x \in [0,2]^n,
\end{cases}
\tag{6}
\]
where \(f(x,t) = \frac{1}{2} \min\{ (x_1 - t - 0.5)^2,\ (x_1 - t + 1.5)^2,\ (x_1 - t - 2.5)^2 \} + \sum_{j > 1} \frac{1}{4} (x_j - 1)^2\).

One-dimensional result (\(n_t = 5\), \(n_x = 80\)): the figure shows several level sets of the solution.

Slide 39

Numerical results: example 2

Two-dimensional result (\(n_t = 5\), \(n_x = n_y = 80\)): the figure shows several level sets of the solution \(\phi(\cdot, t)\) at \(t = 0.25,\ 0.5,\ 0.75,\ 1.0\).

Slide 40

Conclusion

• In this project, we propose an optimization algorithm for solving HJ PDEs whose Hamiltonian \(H\) may depend on \((x, t)\); we require \(H(x, t, \cdot)\) to be convex for any \((x, t)\).
• We apply implicit time discretization; as a result, we can choose a large time step and avoid the CFL condition.
• Partial theoretical guarantees are provided.
• The method may be combined with higher-order schemes, e.g., used as an initialization for a more accurate scheme.
• This idea may also be generalized to solving stationary HJ PDEs.

Slide 41

Primal-dual damping for optimization
Xinzhe Zuo, Stanley Osher, Wuchen Li
June 1, 2023

Slide 42

Introduction

In convex optimization, we seek to find
\[
\operatorname*{arg\,min}_{x \in \mathbb{R}^d} f(x).
\]
If the function is convex and differentiable, this is equivalent to finding \(x^* \in \mathbb{R}^d\) such that \(\nabla f(x^*) = 0\). This can be rewritten in a variational formulation:
\[
\inf_{x \in \mathbb{R}^d} \; \sup_{p \in \mathbb{R}^d} \; \langle p, \nabla f(x) \rangle, \tag{0.1}
\]
which becomes a saddle point problem.

Slide 43

Formulations

We add a regularization term in (0.1):
\[
\inf_{x \in \mathbb{R}^d} \; \sup_{p \in \mathbb{R}^d} \; \langle \nabla f(x), p \rangle - \frac{\varepsilon}{2} \| p \|^2, \tag{0.2}
\]
where \(\varepsilon > 0\) is a constant. This regularization term further drives \(p\) to \(0\).

Proposition. Let \(f : \mathbb{R}^d \to \mathbb{R}\) be a \(C^2\) strongly convex function. Then the saddle point of (0.2) is the unique global minimum of \(f\), with \(p^* = 0\).

Slide 44

Algorithm

Algorithm: linearized primal-dual damping.
Initial guesses \(x_0, p_0 \in \mathbb{R}^d\); step sizes \(\tau > 0\), \(\sigma > 0\); parameters \(A > 0\), \(\varepsilon > 0\), \(\omega > 0\), \(C \succeq 0\).
While \(n = 1, 2, \ldots\), not converged:
\[
\begin{aligned}
p_{n+1} &= \frac{1}{1 + \sigma \varepsilon A}\, p_n + \frac{\sigma A}{1 + \sigma \varepsilon A}\, \nabla f(x_n);\\
\tilde p_{n+1} &= p_{n+1} + \omega\, (p_{n+1} - p_n);\\
x_{n+1} &= x_n - \tau\, C(x_n)\, \tilde p_{n+1}.
\end{aligned}
\]
Here \(C(x)\) can usually be chosen to have a simple structure; for example, \(C(x)\) can be the identity matrix or a diagonal matrix. This makes our algorithm a first-order algorithm.
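A minimal sketch of this iteration with \(C(x) = I\), applied to the quadratic-minus-cosine test from the experiments below (the dual step size is written `sigma` as our notation, and all parameter values are illustrative, not tuned):

```python
import numpy as np

# Linearized primal-dual damping (PDD) sketch with C(x) = I.
# "sigma" names the dual step size (our notation); tau, A, eps, omega follow
# the slide. Parameter values are illustrative assumptions.

def pdd(grad_f, x0, tau=0.1, sigma=0.1, A=1.0, eps=1.0, omega=1.0, iters=5000):
    x = np.array(x0, dtype=float)
    p = np.zeros_like(x)
    for _ in range(iters):
        p_new = (p + sigma * A * grad_f(x)) / (1.0 + sigma * eps * A)
        p_tilde = p_new + omega * (p_new - p)   # damping / extrapolation
        p = p_new
        x = x - tau * p_tilde                   # C(x) = identity
    return x

# Quadratic-minus-cosine test: f(x) = ||x||^2 - cos(c^T x) with ||c||_2 = 1.9.
rng = np.random.default_rng(0)
c = rng.standard_normal(100)
c *= 1.9 / np.linalg.norm(c)
grad_f = lambda x: 2.0 * x + np.sin(c @ x) * c
x_opt = pdd(grad_f, rng.standard_normal(100))
print("final gradient norm:", np.linalg.norm(grad_f(x_opt)))
```

Note the design choice highlighted on the slides: the iteration only ever evaluates \(\nabla f\) (it is affine in the gradient), so no Hessian or line search is needed.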

Slide 45

Convergence

Theorem (informal). Let \(f : \mathbb{R}^d \to \mathbb{R}\) be a \(C^4\) strongly convex function, let \(x^*\) be the global minimum of \(f\), and let \(p^* = 0\). Then the iterates \((x_n, p_n)\) produced by Algorithm 1 converge to the saddle point \((x^*, p^*)\) if the step sizes \(\tau, \sigma\) are small enough. Moreover,
\[
\| p_n \|^2 + \| \nabla f(x_n) \|^2 \le \big( \| p_0 \|^2 + \| \nabla f(x_0) \|^2 \big) \Big( 1 - \frac{\mu^2}{M} + \delta \Big)^{\!n},
\]
where \(\mu = \min_x \lambda_{\min}\big( \nabla^2 f(x)\, C(x) \big)\) with \(C(x) = B(x)\, \nabla^2 f(x)\), \(\delta > 0\) depends on the initial condition, and \(M > 0\) depends on \(C(x)^\top \nabla^3 f(x)\, \nabla f(x) + (\nabla^2 f(x))^2\, C(x)\), \(\tau\), \(\sigma\), \(A\), \(\varepsilon\), and \(\omega\).

Remark. Our algorithm is affine in \(\nabla f(x)\), yet the theorem indicates that \(\| \nabla f(x) \|^2\) converges to 0.

Slide 46

Strongly convex functions

Regularized log-sum-exp function:
\[
f(x) = \log\left( \sum_{i=1}^n \exp(q_i^\top x) \right) + \frac{1}{2}\, x^\top Q x,
\]
where \(n = 100\), \(Q = Q^\top \succ 0\), and \(q_i^\top\) is the \(i\)-th row of \(Q\). \(Q\) is chosen to be diagonally dominant, i.e., \(Q_{i,i} > \sum_{j \ne i} |Q_{i,j}|\). In our algorithm, \(C(x)\) can be chosen as either the identity matrix or a diagonal matrix with \(Q_{i,i}^{-1}\) on the diagonal.

Quadratic minus cosine:
\[
f(x) = \| x \|^2 - \cos(c^\top x),
\]
where \(c\) is a vector in \(\mathbb{R}^{100}\) with \(\| c \|_2 = 1.9\).

Slide 47

Strongly convex functions

Figure: Comparison of gradient descent, NAG, PDD, and IGAHD-SC ('att' is used as a shorthand for this method) on minimizing (a) the regularized log-sum-exp function and (b) the quadratic-minus-cosine function. The y-axis shows the 2-norm of the gradient of the objective function on a logarithmic scale; the x-axis shows the number of iterations on a logarithmic scale.

Slide 48

Non-convex functions

Rosenbrock function:
\[
f(x, y) = (a - x)^2 + b\,(y - x^2)^2,
\]
where a common choice of parameters is \(a = 1\), \(b = 100\).

Ackley function:
\[
f(x, y) = -20 \exp\!\Big( -0.2 \sqrt{0.5\,(x^2 + y^2)} \Big) - \exp\!\Big( 0.5 \big[ \cos(2\pi x) + \cos(2\pi y) \big] \Big) + e + 20.
\]

Slide 49

Rosenbrock

Figure: Minimizing the Rosenbrock function with gradient descent, NAG, PDD with \(C(x) = I\), and IGAHD ('att'). The left panel shows the convergence speed of each method; the right panel shows the optimization trajectories.

Slide 50

Ackley

Figure: Minimizing the Ackley function with gradient descent, NAG, PDD, and IGAHD ('att'). The left panel shows the convergence speed of each method; the right panel shows the optimization trajectories.

Slide 51

MNIST with a two-layer NN

Figure: Training a two-layer neural network on the MNIST data set using gradient descent, NAG, PDD, Adam, and IGAHD ('att'). The left panel shows the convergence speed of the training loss; the right panel shows the test accuracy of each method. The x-axis counts iterations in mini-batches.

Slide 52

CIFAR-10 with a CNN

Figure: Training loss (left panel) and test accuracy (right panel) of a convolutional neural network on the CIFAR-10 data set. The x-axis counts iterations in mini-batches.

Slide 53

High-order computation of generalized optimal transport, mean field control problems, and variational time implicit schemes
Guosheng Fu, Will Pazner, Siting Liu, Stanley Osher, Wuchen Li
arXiv:2302.02308, arXiv:2303.08950, arXiv:2306.06287 (February / March / June 2023)

Slide 54

Introduction: entropy dissipation

• Reaction-diffusion system with \(M\) species and \(R\) reactions:
\[
\partial_t \rho_i = \nabla \cdot \left( V_{1,i}(\rho_i)\, \nabla \frac{\delta \mathcal{E}_i(\rho_i)}{\delta \rho} \right)
- \sum_{p=1}^R V_{2,p}(\rho)\, \gamma_{i,p} \sum_{j=1}^M \gamma_{j,p}\, \frac{\delta \mathcal{E}_j(\rho_j)}{\delta \rho},
\]
where \(\mathcal{E}_i\) is a given energy functional.
• Energy dissipation law:
\[
\frac{d}{dt} \sum_{i=1}^M \mathcal{E}_i(\rho_i) = -\mathcal{I}(\rho) \le 0,
\]
with the generalized Fisher information functional
\[
\mathcal{I}(\rho) := \int_\Omega \Big[ \sum_{i=1}^M \Big\| \nabla \frac{\delta \mathcal{E}_i}{\delta \rho} \Big\|^2 V_{1,i}(\rho_i)
+ \sum_{p=1}^R \Big( \sum_{j=1}^M \gamma_{j,p}\, \frac{\delta \mathcal{E}_j}{\delta \rho} \Big)^{\!2} V_{2,p}(\rho) \Big]\, dx.
\]
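A standard sanity check (ours, not from the slides): for a single species with no reactions, energy \(\mathcal{E}(\rho) = \int_\Omega \rho \log \rho\, dx\), and mobility \(V_1(\rho) = \rho\), we have \(\frac{\delta \mathcal{E}}{\delta \rho} = \log \rho + 1\), so the evolution becomes the heat equation
\[
\partial_t \rho = \nabla \cdot \big( \rho\, \nabla (\log \rho + 1) \big) = \Delta \rho,
\]
and \(\mathcal{I}(\rho) = \int_\Omega \rho\, \| \nabla \log \rho \|^2\, dx\) is the classical Fisher information.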

Slide 55

Generalized optimal transport distances

\[
\mathrm{Dist}_{V_1, V_2}(\rho^0, \rho^1)^2 = \inf_{\rho, m, s} \Big\{
\int_0^T\!\!\int_\Omega \Big( \sum_{i=1}^M \frac{|m_i|^2}{V_{1,i}(\rho_i)} + \sum_{p=1}^R \frac{|s_p|^2}{V_{2,p}(\rho)} \Big)\, dx\, dt :
\]
\[
\partial_t \rho_i + \nabla \cdot m_i = \sum_{p=1}^R \gamma_{i,p}\, s_p, \ \forall\, 1 \le i \le M,
\quad m_i \cdot \nu|_{\partial\Omega} = 0,
\quad \rho(0, \cdot) = \rho^0,\ \rho(T, \cdot) = \rho^1 \Big\},
\]
where \(\rho = (\rho_1, \ldots, \rho_M)\) is the collection of densities, \(m = (m_1, \ldots, m_M)\) is the collection of fluxes, and \(s = (s_1, \ldots, s_R)\) is the collection of sources.

Slide 56

Mean field control problems and JKO-type schemes

• The generalized distance introduces a potential mean field control problem:
\[
\inf_{\rho, m, s} \Big\{
\int_0^T\!\!\int_\Omega \Big( \sum_{i=1}^M \frac{|m_i|^2}{V_{1,i}(\rho_i)} + \sum_{p=1}^R \frac{|s_p|^2}{V_{2,p}(\rho)} \Big)\, dx\, dt
- \int_0^T \mathcal{F}(\rho)\, dt + \mathcal{G}(\rho(T, \cdot)) :
\]
\[
\partial_t \rho_i + \nabla \cdot m_i = \sum_{p=1}^R \gamma_{i,p}\, s_p, \ \forall\, 1 \le i \le M,
\quad m_i \cdot \nu|_{\partial\Omega} = 0,
\quad \rho(0, \cdot) = \rho^0 \Big\},
\]
for mobility functions \(V_1, V_2\), initial density \(\rho^0\), potential functional \(\mathcal{F}\), and terminal energy \(\mathcal{G}\).
• Special choices of the mean field control problem form a variational time-implicit (JKO-type) scheme, e.g.,
\[
\mathcal{F}(\rho) = 0, \qquad \mathcal{G}(\rho) = \sum_{i=1}^M \mathcal{E}(\rho_i).
\]

Slide 57

Saddle problems for generalized mean field control

Saddle point reformulation of the variational MFC problem: find the critical point of
\[
\inf_{u, \rho_T} \; \sup_{\phi} \;
\int_0^T\!\!\int_\Omega \big[ H(u) - u \cdot D(\phi) \big]\, dx\, dt
+ \int_\Omega \big[ \mathcal{G}(\rho_T) + \rho_T \cdot \phi(T, \cdot) - \rho^0 \cdot \phi(0, \cdot) \big]\, dx.
\]
Here \(u := (\rho, m, s, n)\), \(\rho_T := (\rho_{1,T}, \ldots, \rho_{M,T})\), \(\phi := (\phi_1, \ldots, \phi_M)\) collects the dual variables, and the initial data is \(\rho^0 := (\rho_1^0, \ldots, \rho_M^0)\). The nonlinear function is
\[
H(u) := \sum_{i=1}^M \left( \frac{\| m_i \|^2}{2 V_{1,i}(\rho_i)} + \frac{| n_i |^2}{2 V_{3,i}(\rho_i)} \right)
+ \sum_{p=1}^R \frac{| s_p |^2}{2 V_{2,p}(\rho)} - \mathcal{F}(\rho),
\]
and the operator is
\[
D(\phi) := \big( \partial_t \phi,\ \nabla \phi,\ \gamma^\top \phi,\ \Delta \phi \big),
\]
whose four components pair with the four components of \(u\).

Slide 58

Methods

• Space-time high-order finite elements: find the critical point of the fully discrete system
\[
\inf_{u_h \in W_h^{k-1},\; \rho_{T,h} \in M_h^{k-1}} \; \sup_{\phi_h \in V_h^k} \;
\langle\!\langle H(u_h) - u_h \cdot D(\phi_h) \rangle\!\rangle_h
+ \langle \mathcal{G}(\rho_{T,h}) \rangle_h
+ \langle \rho_{T,h} \cdot \phi_h(T, \cdot) - \rho^0 \cdot \phi_h(0, \cdot) \rangle_h.
\]
• G-prox primal-dual hybrid gradient algorithm:
(1) \(\phi\)-update:
\[
\phi_h^{\ell+1} = \operatorname*{argmin}_{\phi_h} \;
\frac{1}{2\sigma_\phi} \langle\!\langle | D(\phi_h - \phi_h^\ell) |^2 \rangle\!\rangle_h
+ \langle\!\langle u_h^\ell \cdot D(\phi_h) \rangle\!\rangle_h
- \langle \rho_{T,h}^\ell \cdot \phi_h(T, \cdot) - \rho^0 \cdot \phi_h(0, \cdot) \rangle_h.
\]
(2) Extrapolation: \(\tilde\phi_h^{\ell+1} = 2 \phi_h^{\ell+1} - \phi_h^\ell\).
(3) \(u, \rho\)-update:
\[
u_h^{\ell+1} = \operatorname*{argmin}_{u_h} \;
\frac{1}{2\sigma_u} \langle\!\langle | u_h - u_h^\ell |^2 \rangle\!\rangle_h
+ \langle\!\langle H(u_h) - u_h \cdot D(\tilde\phi_h^{\ell+1}) \rangle\!\rangle_h,
\]
\[
\rho_{T,h}^{\ell+1} = \operatorname*{argmin}_{\rho_{T,h}} \;
\frac{1}{2\sigma_\rho} \langle | \rho_{T,h} - \rho_{T,h}^\ell |^2 \rangle_h
+ \langle \mathcal{G}(\rho_{T,h}) + \rho_{T,h} \cdot \tilde\phi_h^{\ell+1}(T, \cdot) \rangle_h.
\]

Slide 59

Properties

Advantages of high-order methods over low-order methods:
• Efficiency: more efficient per degree of freedom.
• Accuracy: more accurate per degree of freedom.

The G-prox PDHG algorithm is highly scalable (MPI + GPU):
• The \(\phi\)-update is a global constant-coefficient-type (space-time) high-order diffusion solve. High-performance, scalable solvers are available; e.g., multigrid with PCG achieves optimal computational complexity.
• The \(u, \rho\)-update is a pointwise nonlinear single-variable optimization problem: embarrassingly parallelizable, with optimal computational complexity.

Slide 60

Example I: initial value problem

The PDE:
\[
\partial_t \rho - \nu_1\, \partial_{x_0 x_0} \rho - \nu_2\, \partial_{x_1 x_1} \rho = \mu\, \rho (1 - \rho),
\]
with diffusion parameters \(\nu_1 = 0.1\), \(\nu_2 = 0.01\), and reaction coefficient \(\mu > 0\) to be specified. The initial condition is a flat-top Gaussian:
\[
\rho_0(x_0, x_1) =
\begin{cases}
1, & \text{if } x_0^2 + 4 x_1^2 \le 0.25,\\
\exp\!\big( -10\, (x_0^2 + 4 x_1^2 - 0.25) \big), & \text{otherwise}.
\end{cases}
\]
The computational domain is the rectangle \(\Omega = [-2, 2] \times [-1, 1]\), discretized with a \(32 \times 16\) square mesh. We use polynomial degree \(k = 4\), time step size \(\Delta t = 0.1\), and final time \(T = 4\).

Slide 61


Slide 62

Example II: mean field control

Slide 63

Example III: 3D mean field control

Slide 64

Exact score-based generative models (SGM) via regularized Wasserstein proximal (ongoing work)

• Joint with B. Zhang, S. Liu, M. Katsoulakis, S. Osher, W. Li.
• Formulate SGM models via regularized Wasserstein proximal operators with an explicit kernel formula.
• Exact score functions are calculated via kernel formulas.

References:
• W. Li, S. Liu, S. Osher. A kernel formula for regularized Wasserstein proximal operators. 2023.
• B. Zhang and M. A. Katsoulakis. A mean-field games laboratory for generative modeling. 2023.
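As a simple illustration of computing scores exactly via a kernel formula (a sketch under our own assumptions: an empirical data distribution and a variance-preserving OU forward process; this is the standard Gaussian-mixture score, not the regularized Wasserstein proximal kernel of the cited references):

```python
import numpy as np

# When the data distribution is the empirical measure of n points and the
# forward process is a variance-preserving Ornstein-Uhlenbeck diffusion,
# p_t is an explicit Gaussian mixture, so its score has a closed softmax
# form over the data. Illustrative sketch only.

def exact_score(x, data, a_t, s_t):
    """Score grad log p_t(x) for p_t = (1/n) sum_i N(a_t*y_i, s_t^2 I)."""
    d2 = np.sum((x[None, :] - a_t * data) ** 2, axis=1)   # squared distances
    w = np.exp(-(d2 - d2.min()) / (2 * s_t**2))           # stable softmax weights
    w /= w.sum()
    mean = w @ (a_t * data)                               # posterior mean of a_t*y
    return (mean - x) / s_t**2

# Toy usage: 200 2D data points on a circle, score evaluated at one query point.
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 200)
data = np.stack([np.cos(theta), np.sin(theta)], axis=1)
t = 0.5
a_t, s_t = np.exp(-t), np.sqrt(1 - np.exp(-2 * t))        # VP (OU) schedule
print(exact_score(np.array([0.3, -0.2]), data, a_t, s_t))
```

Such a closed-form score can drive the reverse diffusive process directly, with no score-matching network to train, which is the spirit of the toy experiments on the next slide.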

Slide 65

Exact score-based generative models (SGM) via regularized Wasserstein proximal

Figure: Experiments conducted on 2D toy-model datasets. We began with 200 initial data points and applied a forward diffusive process. Subsequently, a reverse diffusive process was performed, using the score function computed directly via the kernel formula, to generate new samples.