
Mean field games with applications year 5

Wuchen Li
August 20, 2023

Transcript

  2. Team strength
     • UCLA
       - Optimal transport (analysis and computation)
       - Control + differential games + Hamilton-Jacobi equations
       - Inverse problems + optimization
       - AI + scientific computing
       - Fast algorithms with applications, e.g., in COVID-19 pandemic and social-network modeling

  3. Team strength
     • Princeton and UH
       - 5G communication
       - Communication networking
       - UAV networks
       - Machine learning / social networks
     • Maryland
       - Evolutionary game theory
       - Cultural modelling
       - Reinforcement learning

  4. Mean field system

     A typical time-dependent MFG system (derived from (9)) has the form
     $$
     \begin{cases}
     -\phi_t - \nu\Delta\phi + H(x, D_x\phi(x,t), \rho(x,t), t) = 0,\\
     \rho_t - \nu\Delta\rho - \mathrm{div}_x\!\left(\rho\, D_p H(x, D_x\phi(x,t), \rho(x,t), t)\right) = 0,\\
     \rho \ge 0,\quad \int_{\mathbb{T}^d} \rho(x,t)\,dx = 1,\quad t \in [0,T],\\
     \rho(x,T) = \rho_T(x),\quad \phi(x,0) = \phi_0(x, \rho(x,0)),\quad x \in \mathbb{T}^d.
     \end{cases} \tag{10}
     $$
     Given: $H : \mathbb{T}^d \times \mathbb{R}^d \times X \times \mathbb{R} \to \mathbb{R}$ is a Hamiltonian, periodic in space and convex in momentum, where $X = \mathbb{R}_+$ or $X = L^1(\mathbb{T}^d;\mathbb{R}_+)$ or $X = \mathbb{R}_+ \times L^1(\mathbb{T}^d;\mathbb{R}_+)$; $H = L^*$; $\nu \ge 0$ is a diffusion parameter; $\phi_0 : \mathbb{T}^d \times X \to \mathbb{R}$ and $\rho_T : \mathbb{T}^d \to \mathbb{R}$ are given initial-terminal conditions.

     Unknowns: $\phi, \rho : \mathbb{T}^d \times [0,T] \to \mathbb{R}$.

  5. Challenges addressed
     • Restrictive assumptions (violated in the real world) in existing MFGs
     • Discrete strategy sets (needed for social-network applications)
     • Lack of fast numerical solvers for high-dimensional MFGs
     • Lack of an MFG inverse problem (observations → model)
     • How to make policy to achieve a goal for a large population

  6. Proposed work: foundation and models
     • Player strategies on a discrete/graph/manifold set
     • Non-physical interactions, multiple scales, and multiple populations
     • Social networks, social and cultural norm dynamics
     • Socio-economic dynamics in energy consumption
     • Election modeling, and psycho-/socio-mathematics of rational and irrational agents
     • Long-time population behaviors, non-equilibrium behavior, uncontrollable disasters (COVID-19)
     • Engineering applications such as communications, power grids, networking, and UAVs

  7. Proposed work: fast, efficient, and scalable numerics
     • Address the curse of dimensionality
     • Address nonconvexity and nonsmoothness
     • Parallel and distributed algorithms
     • Real-time solution of the forward problem
     • Nearly real-time policy making and MFG inverse problems
     • Time-implicit schemes for mean field PDEs and elsewhere

  8. Selected products in 2022-2023
     • PDHG + time-implicit schemes + nonlinear PDEs: generalized primal-dual hybrid gradient methods for time-implicit schemes of reaction-diffusion equations, conservation laws, Hamilton-Jacobi equations, etc.
     • PDHG + optimization: primal-dual damping algorithms for optimization.
     • PDHG + HPC + OT + MFC: high-performance computation for generalized optimal transport, mean field control problems, and time-implicit updates of initial value mean field PDEs.
     • Mean field control problems with AI computations.

  9. A primal-dual approach for solving conservation laws with implicit in time approximations

     Siting Liu¹, Wuchen Li², Chi-Wang Shu³, Stanley Osher¹
     ¹University of California, Los Angeles; ²University of South Carolina; ³Brown University

     Published in Journal of Computational Physics 472 (2023): 111654.

  10. Introduction

      We consider the following initial value problem (IVP) for a scalar conservation law on the domain $\Omega \times [0,T]$:
      $$
      \begin{cases}
      \partial_t u(x,t) + \partial_x f(u(x,t)) - \partial_x\!\left(\sigma(x)\,\partial_x u(x,t)\right) = 0,\\
      u(x,0) = u_0(x).
      \end{cases}
      $$
      Computational methods:
      • High-order accurate schemes in the spatial domain: discontinuous Galerkin (DG) methods, ENO, WENO, etc.
      • Explicit high-order schemes in the time domain: the forward Euler method, Runge-Kutta methods, etc. They restrict the length of the time step through Courant-Friedrichs-Lewy (CFL) conditions.
      • Implicit methods are suitable for stiff problems and can be used with larger time steps. The solvers typically require extra computation and storage of the Jacobian matrix, a good initial guess, etc.

  11. A new framework for solving conservation laws

      We connect these computational methods with a primal-dual optimization approach.
      • Cast the IVP as a saddle point problem (a min-max problem).
        - Treat the continuity equation as a constraint.
        - Obtain a system of dual equations.
      • Apply an implicit-in-time scheme to the min-max problem.
        - Accommodates many high-order schemes in time and space.
        - Achieves implicit schemes for the system of dual equations.
      • Use a primal-dual method to find the saddle point.
        - First-order optimization approach: no computation or storage of the Jacobian matrix, no requirement on the initial guess.
        - Optimization iterations independent of the mesh size.
        - Geometric convergence for linear equations. No theoretical guarantee for other cases, but convergence in practice.
        - Highly parallelizable and easy to implement. No nonlinear inversions are required!

  12. A saddle point formulation of the IVP

      We recast the IVP as the optimization problem
      $$
      \min_u \; \mathbb{1}_{u \in \mathcal{U}}, \qquad
      \mathcal{U} = \Big\{ u : u(x,0) = u_0(x),\; \partial_t u + \partial_x f(u) - \partial_x(\sigma(x)\partial_x u) = 0 \text{ for all } (x,t) \in \Omega \times [0,T] \Big\},
      $$
      where the indicator function is defined as
      $$
      \mathbb{1}_{a \in A} = \begin{cases} 0 & \text{if } a \in A,\\ +\infty & \text{if } a \notin A. \end{cases}
      $$
      By introducing a Lagrange multiplier $\phi : \Omega \times [0,T] \to \mathbb{R}$, we can remove the constraint, which leads to a min-max problem:
      $$
      \min_{u,\,u(\cdot,0)=u_0}\; \max_{\phi}\; L(u,\phi), \tag{1}
      $$
      $$
      L(u,\phi) = \int_0^T\!\!\int_\Omega \phi(x,t)\,\big(\partial_t u(x,t) + \partial_x f(u(x,t)) - \partial_x(\sigma(x)\partial_x u(x,t))\big)\, dx\, dt.
      $$

  13. The system of dual equations at the saddle point

      Consider a variable-coefficient heat equation; then
      $$
      L(u,\phi) = \int_0^T\!\!\int_\Omega \phi\,\big(u_t - \partial_x(\sigma(x)\partial_x u)\big)\,dx\,dt
      = -\int_0^T\!\!\int_\Omega u(x,t)\,\big(\partial_t\phi(x,t) + \partial_x(\sigma(x)\partial_x\phi)\big)\,dx\,dt
      + \int_\Omega u(x,T)\phi(x,T) - u(x,0)\phi(x,0)\,dx.
      $$
      Taking the first-order optimality conditions of $L(u,\phi)$, we obtain the following system of equations, in which $u$ (resp. $\phi$) runs forward (resp. backward) in time:
      $$
      \begin{cases}
      u_t - \partial_x(\sigma(x)\partial_x u(x,t)) = 0, & u(x,0) = u_0(x),\\
      \phi_t + \partial_x(\sigma(x)\partial_x\phi(x,t)) = 0, & \phi(x,T) = 0.
      \end{cases}
      $$

  14. Finite difference scheme for the variable-coefficient heat equation

      The discretized min-max problem (1) takes the form
      $$
      \min_{u = \{u_j^l\}}\; \max_{\phi = \{\phi_j^l\}}\; L(u,\phi), \qquad
      L(u,\phi) = h_t h_x \sum_{0 \le l \le N_t - 1,\; 1 \le j \le N_x} \phi_j^l \left( \frac{u_j^{l+1} - u_j^l}{h_t} - \mathrm{Lap}(u^{l+1})_j \right).
      $$
      The discretized saddle point of the min-max problem satisfies
      $$
      \begin{cases}
      \dfrac{u_j^{l+1} - u_j^l}{h_t} - \mathrm{Lap}(u^{l+1})_j = 0, & 0 \le l \le N_t - 1,\; 1 \le j \le N_x,\\[2pt]
      u_j^0 = u_0(x_j), & 1 \le j \le N_x,\\[2pt]
      \dfrac{\phi_j^l - \phi_j^{l-1}}{h_t} + \mathrm{Lap}(\phi^{l-1})_j = 0, & 1 \le l \le N_t,\; 1 \le j \le N_x,\\[2pt]
      \phi_j^{N_t} = 0, & 1 \le j \le N_x.
      \end{cases}
      $$
      Both equations are discretized with the implicit backward Euler method.

  15. A primal-dual algorithm

      We adapt the primal-dual hybrid gradient (PDHG) algorithm for convex optimization problems, integrating G-prox PDHG and nonlinear PDHG. The $n$-th iteration reads
      $$
      \begin{cases}
      u^n = \operatorname*{argmin}_u\; \langle u, A^T\tilde\phi^{n-1}\rangle + \frac{1}{2\tau_u}\|u - u^{n-1}\|_{L^2}^2,\\[2pt]
      \phi^n = \operatorname*{argmax}_\phi\; \langle A u^n, \phi\rangle - \frac{1}{2\tau_\phi}\|\phi - \phi^{n-1}\|_H^2,\\[2pt]
      \tilde\phi^n = 2\phi^n - \phi^{n-1}.
      \end{cases}
      $$
      Here $A$ denotes the discrete approximation of the differential operator:
      $$
      A u \approx \partial_t u - \partial_x(\sigma(x)\partial_x u), \qquad
      A^T\phi \approx -\big(\phi_t + \partial_x(\sigma(x)\partial_x\phi)\big).
      $$
      For a linear flux $\alpha u$, each update can be written explicitly:
      $$
      u^n = u^{n-1} + \tau_u \Big( (\partial_t + \alpha\partial_x)\tilde\phi^{n-1} + \partial_x\big(\sigma\,\partial_x\tilde\phi^{n-1}\big) \Big),
      $$
      $$
      \phi^n = \phi^{n-1} - \tau_\phi \left( \partial_{tt} + \alpha^2\partial_{xx} - \hat\sigma^2\partial_{xxxx} \right)^{-1} \Big( (\partial_t + \alpha\partial_x)u^n - \partial_x(\sigma\,\partial_x u^n) \Big).
      $$
      The computation of $\phi^n$ can be done using the Fast Fourier Transform (FFT); a minimal sketch of the whole iteration follows below.

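      To make the iteration concrete, here is a minimal, self-contained NumPy/SciPy sketch for the backward-Euler heat equation with a constant coefficient $\sigma$ and periodic boundary conditions. All sizes and step sizes are illustrative, and for simplicity the G-prox preconditioner $(AA^T)^{-1}$ is applied with a prefactored sparse solve rather than the FFT mentioned in the slides.

      ```python
      import numpy as np
      import scipy.sparse as sp
      import scipy.sparse.linalg as spla

      Nx, Nt, sigma = 64, 32, 0.5
      hx, ht = 1.0 / Nx, 0.1 / Nt
      x = hx * np.arange(Nx)
      u0 = np.exp(-64.0 * (x - 0.5) ** 2)

      # Periodic second-difference Laplacian in space.
      Lap = sp.diags([np.ones(Nx - 1), -2.0 * np.ones(Nx), np.ones(Nx - 1)],
                     [-1, 0, 1]).tolil()
      Lap[0, -1] = Lap[-1, 0] = 1.0
      Lap = Lap.tocsr() / hx**2

      # Space-time operator A: block row l encodes (u^{l+1}-u^l)/ht - sigma*Lap u^{l+1} = 0,
      # with the known initial value u^0 = u0 moved to the right-hand side b.
      Bt = sp.diags([np.ones(Nt), -np.ones(Nt - 1)], [0, -1])
      A = (sp.kron(Bt, sp.identity(Nx)) / ht
           - sp.kron(sp.identity(Nt), sigma * Lap)).tocsr()
      b = np.zeros(Nt * Nx)
      b[:Nx] = u0 / ht

      # G-prox PDHG on min_u max_phi <A u - b, phi>.
      solve_G = spla.factorized((A @ A.T).tocsc())    # preconditioner G = A A^T
      tau_u = tau_p = 0.9
      u, phi = np.tile(u0, Nt), np.zeros(Nt * Nx)
      phi_bar = phi.copy()
      for n in range(200):
          u = u - tau_u * (A.T @ phi_bar)             # primal (u) descent step
          phi_new = phi + tau_p * solve_G(A @ u - b)  # preconditioned dual ascent
          phi_bar = 2.0 * phi_new - phi               # extrapolation phi_tilde
          phi = phi_new
      print("PDE residual:", np.linalg.norm(A @ u - b))
      ```

      No nonlinear solve or Jacobian storage appears anywhere: each sweep is one operator application per variable plus one preconditioner solve.
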
  16. Example 1: variable-coefficient heat equation

      $$
      \begin{cases}
      \partial_t u = \partial_x(\sigma(x)\,\partial_x u), & (x,t) \in [0,1] \times [0,0.1],\\
      u(x,0) = \exp(-64(x-0.5)^2), & x \in [0,1].
      \end{cases}
      $$
      We record the number of iterations needed to achieve a given error $\epsilon$ with the primal-dual approach on different meshes:

      | $N_x \times N_t$      | 64 × 16 | 128 × 32 | 256 × 64 | 512 × 128 | 1024 × 256 |
      |-----------------------|---------|----------|----------|-----------|------------|
      | $\epsilon = 10^{-6}$  | 90      | 91       | 92       | 94        | 102        |
      | $\epsilon = 10^{-10}$ | 144     | 146      | 147      | 149       | 163        |

      • An explicit scheme must satisfy the restrictive CFL condition $\frac{h_t}{(h_x)^2} < \frac{1}{2\max_x \sigma(x)}$; in the examples above we have $\frac{h_t}{(h_x)^2} = \frac{128}{5}, \frac{256}{5}, \dots, \frac{2048}{5}$.
      • The convergence of the algorithm for this linear equation is independent of the grid size.

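      The quoted ratios can be sanity-checked in a couple of lines (an illustrative quick check, taking $T = 0.1$ on $[0,1]$ as above):

      ```python
      # Ratios h_t / h_x^2 for the meshes in the table (T = 0.1 on [0, 1]).
      for Nx, Nt in [(64, 16), (128, 32), (256, 64), (512, 128), (1024, 256)]:
          hx, ht = 1.0 / Nx, 0.1 / Nt
          print(Nx, Nt, ht / hx**2)   # -> 25.6, 51.2, ..., 409.6 (= 128/5, ..., 2048/5)
      ```
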
  17. Example 2: quadratic conservation law

      Consider a quadratic conservation law of the form
      $$
      \partial_t u + \partial_x(\alpha u^2 + \beta u) = 0, \quad (x,t) \in [0,2] \times [0,1], \qquad
      u(x,0) = \begin{cases} 0.1 & \text{if } 1 \le x \le 2,\\ 0.25 & \text{else.} \end{cases}
      $$
      For the numerical scheme, we use backward Euler and the discontinuous Galerkin method.

      Figure: with $\alpha = -1$, $\beta = 1$ we obtain the traffic equation. Left: numerical solution of the traffic equation with discontinuous initial data, $N \times N_t = 256 \times 32$. Right: numerical solution at times $t = 0, 0.5, 1$, showing shock propagation (left) and development of a rarefaction wave (right).

  18. Example 3: barotropic compressible Navier-Stokes equations (BNS)

      Consider a BNS system of the form
      $$
      \begin{cases}
      \partial_t \rho + \partial_x m = 0,\\[2pt]
      \partial_t m + \partial_x\!\left(\dfrac{m^2}{\rho}\right) + \partial_x P(\rho) = \partial_x\!\left(\mu(\rho)\,\partial_x \dfrac{m}{\rho}\right).
      \end{cases}
      $$
      Here $\rho$ is the density, $m := \rho v$ the flux, and $v$ the velocity. We set the pressure $P(\rho) = c_1\rho^2$ and $\mu(\rho) = c_2$. For the numerical scheme, we use backward Euler and a Lax-Friedrichs type scheme.

      Figure: from left to right: $\rho$, $m$, $v$. With a positive uniform initial velocity at $t = 0$, the density moves toward the right.

  19. Work in progress: min-max with entropy regularizer

      Inspired by Brenier's work¹ ², we incorporate the entropy as a convex regularizer in our min-max formulation of the IVP, and apply PDHG to solve the new saddle point problem:
      $$
      \min_{u,\,u(\cdot,0)=u_0}\; \max_\phi\; \{ L(u,\phi) + E(u) \}.
      $$
      • Burgers' equation: $E(u) = \iint u^2\,dx\,dt$.
      • Euler equations of isothermal fluids: $E(\rho,m) = \iint \left( \frac{m^2}{2\rho} + \rho\log\rho \right) dx\,dt$.

      Figure: the residuals converge significantly faster with entropy regularizers.

      ¹ Variational Methods for Evolution, Workshop Report 2017, 54.
      ² Y. Brenier. The initial value problem for the Euler equations of incompressible fluids viewed as a concave maximization problem. Communications in Mathematical Physics 364 (2018): 579-605.

  20. A first-order computational algorithm for reaction-diffusion type equations via the primal-dual hybrid gradient method

      Shu Liu, Siting Liu, Stanley Osher, Wuchen Li
      May 2023

  21. Introduction

      • Goal: we propose an easy-to-implement iterative method for resolving the implicit (or semi-implicit) schemes arising in solving reaction-diffusion (RD) type equations (or systems).
      • Our treatment:
        - Formulate the scheme as a min-max saddle point problem;
        - Apply the primal-dual hybrid gradient (PDHG) method;
        - Apply suitable preconditioning matrices to accelerate the convergence of the algorithm.
      • The method is applicable to various discrete numerical schemes with high flexibility.
      • The method can efficiently produce numerical solutions with sufficient accuracy.

  22. Formulations

      • Reaction-diffusion (RD) type equation:
        $$
        \frac{\partial u(x,t)}{\partial t} = \mathcal{L}u(x,t) + \mathcal{R}u(x,t) \quad \text{on } \Omega \subset \mathbb{R}^d, \tag{1}
        $$
        with initial condition $u(\cdot,0) = u_0$. Here $\mathcal{L}$ is a certain elliptic operator, such as the Laplace or biharmonic operator, depicting the diffusion process; $\mathcal{R}$ is a certain nonlinear operator depicting the reaction process.
      • The semi-discrete implicit scheme at the $t$-th time step is
        $$
        \frac{u^{t+1} - u^t}{h_t} = \mathcal{L}u^{t+1} + \mathcal{R}u^{t+1}. \tag{2}
        $$
        Given $u^t$, we want to solve for $u^{t+1}$.

  23. Methods

      Suppose $L, R$ are the space discretizations of $\mathcal{L}, \mathcal{R}$. The previous scheme reduces to the nonlinear equation $F(U) = 0$, where
      $$
      F(U) = U - U^t - h_t\big(LU + R(U)\big), \qquad U \in \mathbb{R}^N,\; F : \mathbb{R}^N \to \mathbb{R}^N,
      $$
      and $N$ is the number of spatial discretization points.

      Idea:
      • Solving $F(U) = 0$ is equivalent to minimizing $\iota(F(U))$, where $\iota(x) = 0$ if $x = 0$ and $\iota(x) = +\infty$ if $x \neq 0$.
      • Notice that $\iota(x) = \sup_{p \in \mathbb{R}^N} \{p^\top x\}$.
      • Thus $\min_U \{\iota(F(U))\}$ is equivalent to solving the saddle point problem
        $$
        \min_{U \in \mathbb{R}^N}\; \max_{P \in \mathbb{R}^N}\; \{P^\top F(U)\}.
        $$

  24. Methods

      We apply the following dynamic, inspired by the G-prox primal-dual hybrid gradient (PDHG) method, to solve the previous saddle point problem and update the numerical solution $U^t$ at time $t$ to $U^{t+1}$:
      $$
      \begin{cases}
      U_0 = U^t, \quad P_0 = 0;\\
      P_{n+1} = P_n + \tau_p\, G^{-1} F(U_n);\\
      \widetilde{P}_{n+1} = 2P_{n+1} - P_n;\\
      U_{n+1} = U_n - \tau_u \left(\nabla_U F(U_n)\right)^\top \widetilde{P}_{n+1}.
      \end{cases}
      $$
      • $\tau_p, \tau_u$ are the step sizes of our method; they usually take values in $[0.5, 1]$.
      • The $N \times N$ matrix $G$ plays an important role here. It can be treated as a preconditioner that may dramatically improve the convergence speed of the method. In our experience, $G$ is usually chosen as $(I - h_t L)(I - h_t L)^\top$. The inversion of $G$ can be handled efficiently with the FFT or DCT. The computational complexity of evolving this dynamic at each iteration is $O(N\log N)$.

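      For illustration, here is a minimal, self-contained sketch of this dynamic for one implicit step of an Allen-Cahn type reaction-diffusion equation (a simpler instance than the Cahn-Hilliard examples that follow); the assumed reaction term, grid, and parameter values are all illustrative choices.

      ```python
      import numpy as np

      # One implicit (backward Euler) step of an Allen-Cahn type equation
      #   u_t = a*Lap(u) + u - u^3   on the periodic square [0, 2*pi)^2,
      # solved with the preconditioned PDHG dynamic above.
      N, a, ht = 128, 0.01, 0.1
      k = 2.0 * np.pi * np.fft.fftfreq(N, d=2.0 * np.pi / N)   # integer wave numbers
      KX, KY = np.meshgrid(k, k, indexing="ij")
      lap_sym = -(KX**2 + KY**2)                  # Fourier symbol of the Laplacian
      G_sym = (1.0 - ht * a * lap_sym) ** 2       # symbol of G = (I - ht L)(I - ht L)^T

      def Lap(u):                                 # spectral Laplacian
          return np.fft.ifft2(lap_sym * np.fft.fft2(u)).real

      def G_inv(p):                               # preconditioner solve via FFT
          return np.fft.ifft2(np.fft.fft2(p) / G_sym).real

      xs = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
      X, Y = np.meshgrid(xs, xs, indexing="ij")
      u_prev = np.cos(X) * np.cos(Y)              # solution at the previous time step

      def F(u):                                   # implicit residual F(U) = 0
          return u - u_prev - ht * (a * Lap(u) + u - u**3)

      tau_p = tau_u = 0.5
      U, P = u_prev.copy(), np.zeros_like(u_prev)
      for n in range(100):
          P_new = P + tau_p * G_inv(F(U))
          P_tilde = 2.0 * P_new - P
          # (grad_U F)^T P_tilde = P_tilde - ht*(a*Lap(P_tilde) + (1 - 3 U^2) * P_tilde)
          U = U - tau_u * (P_tilde - ht * (a * Lap(P_tilde)
                                           + (1.0 - 3.0 * U**2) * P_tilde))
          P = P_new
      print("implicit-step residual:", np.linalg.norm(F(U)) / N)
      ```

      Both $G^{-1}$ and the Laplacian are diagonal in Fourier space here, so each PDHG sweep costs $O(N^2 \log N)$ on the $N \times N$ grid, matching the complexity claim above.
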
  25. Example I (Cahn-Hilliard equation with seven circles)

      • We consider the Cahn-Hilliard equation on the periodic region $\Omega = [0,2\pi)^2$:
        $$
        \frac{\partial u(x,t)}{\partial t} = -a\,\Delta^2 u(x,t) + b\,\Delta W'(u(x,t)), \qquad u(\cdot,0) = u_0.
        $$
        Here we assume $a = 0.1^2$, $b = 1$, and the double-well potential $W(u) = \frac{1}{2}(u^2-1)^2$. The initial value $u_0$ is the seven-circle function shown in the figures on the next page.
      • We solve the equation on the time interval $[0, 30]$. In our numerical implementation, we set the space discretization to $128 \times 128$ and the number of time steps to 6000 with $h_t = 1/200$. For the PDHG iteration, we set $\tau_u = \tau_p = 0.5$.

  26. Example I (Cahn-Hilliard equation with seven circles)

      Figure: numerical solution and $\log_{10}\mathrm{Res}(U_n)$ plots at $t = 0.0, 5.0, 15.0, 25.0, 30.0$ for the seven-circle example. The residual plots indicate the linear convergence of our method for the nonlinear objective functions used in this example. Here $\mathrm{Res}(U_n) = \|U_n - U^t - h_t(LU_n + R(U_n))\|_2$.

  27. Example II (a higher-order equation)

      • We test the method on the following 6th-order Cahn-Hilliard type equation, which originates from research on functionalized polymers:
        $$
        \frac{\partial u(x,t)}{\partial t} = \Delta\left(\epsilon^2\Delta - W''(u) + \epsilon^2\right)\left(\epsilon^2\Delta u - W'(u)\right) \quad \text{on } \Omega, \qquad u(\cdot,0) = u_0.
        $$
      • We set $\Omega = [0,2\pi)^2$ and $\epsilon = 0.18$; $W(u)$ is the same as before.
      • We choose the initial condition $u_0(x,y) = 2e^{\sin x + \sin y - 2} + 2.2e^{-\sin x - \sin y - 2} - 1$.
      • In the numerical implementation, we solve the equation on the time interval $[0, 20]$, with space discretization $128 \times 128$ and 20000 time steps with $h_t = 1/1000$.
      • For the PDHG iteration, we set the step sizes $\tau_u = \tau_p = 0.58$.

  28. Example II (a higher-order equation)

      Figure: numerical solution at $t = 0.0, 0.1, 2.0, 20.0$.
      Figure: decay of the $\log_{10}$ residual and plots of the residual functional $\mathrm{Res}(U_n)$ at $t = 2.0$ and $t = 20.0$. Here $\mathrm{Res}(U_n) = \|U_n - U^t - h_t(LU_n + R(U_n))\|_2$.

  29. Primal-dual hybrid gradient algorithms for computing time implicit Hamilton-Jacobi equations

      Tingwei Meng†, Wenbo Hao†, Siting Liu†, Liu Yang†, Wuchen Li‡, Stanley Osher†
      ‡Department of Mathematics, University of South Carolina; †Department of Mathematics, UCLA
      Aug 10, 2023

  30. Motivation

      We solve the following Hamilton-Jacobi (HJ) PDE:
      $$
      \begin{cases}
      \dfrac{\partial\phi(x,t)}{\partial t} + H(x, t, \nabla_x\phi(x,t)) = \epsilon\,\Delta_x\phi(x,t), & x \in \Omega,\; t > 0,\\[2pt]
      \phi(x,0) = g(x), & x \in \Omega.
      \end{cases} \tag{1}
      $$
      • Grid-based methods (e.g., the Lax-Friedrichs or Engquist-Osher scheme) need to satisfy the CFL condition in the discretization.
      • In this project, we provide an optimization method for solving HJ PDEs (with a certain form of $H$) with implicit time discretization. This allows us to choose a larger time step size and avoid the CFL condition.
      • In the literature, there are some optimization algorithms for solving HJ PDEs [Darbon, Dower, Osher, Yegorov, ...]. Compared to these methods, the optimization algorithm proposed in this project can handle a more general Hamiltonian that depends on $(x,t)$. A partial theoretical guarantee is provided.

  31. Continuous formulation

      Solving the original HJ PDE (1) is equivalent to the optimization problem
      $$
      \min_{\phi \text{ satisfying } (1)}\; -c\int \phi(x,T)\,dx, \tag{2}
      $$
      where $c \in \mathbb{R}$ is a constant used to stabilize the method. Introducing the Lagrange multiplier $\rho$, we get
      $$
      \min_{\phi:\,\phi(\cdot,0)=g}\; \max_\rho\; \int_0^T\!\!\int \rho(x,t)\left( \frac{\partial\phi(x,t)}{\partial t} + H(x,t,\nabla_x\phi(x,t)) - \epsilon\Delta_x\phi(x,t) \right) dx\,dt - c\int\phi(x,T)\,dx. \tag{3}
      $$
      Under certain conditions (e.g., $H(x,t,\cdot)$ is convex for any $x,t$), this is equivalent to the saddle point problem
      $$
      \min_{\phi:\,\phi(\cdot,0)=g}\; \max_{\rho,v:\,\rho\ge 0}\; \int_0^T\!\!\int \rho(x,t)\Big( \frac{\partial\phi(x,t)}{\partial t} + \langle v(x,t), \nabla_x\phi(x,t)\rangle - H^*(x,t,v(x,t)) - \epsilon\Delta_x\phi(x,t) \Big)\, dx\,dt - c\int\phi(x,T)\,dx, \tag{4}
      $$
      where $H^*(x,t,\cdot)$ is the Fenchel-Legendre transform of $H(x,t,\cdot)$ for any $x,t$. We discretize and then employ the PDHG method to solve this saddle point problem, using implicit time discretization to avoid the CFL condition.

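      As a concrete instance (matching example 1 below), take the Burgers-type Hamiltonian $H(x,t,p) = \frac{1}{2}\|p\|^2$. Its Fenchel-Legendre transform is
      $$
      H^*(x,t,v) = \sup_p \left\{ \langle v, p\rangle - \tfrac{1}{2}\|p\|^2 \right\} = \tfrac{1}{2}\|v\|^2,
      $$
      and maximizing the integrand of (4) over $v$ pointwise gives $v = \nabla_x\phi$, which recovers the term $\frac{1}{2}\|\nabla_x\phi\|^2$ in (3).
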
  32. Continuous formulation

      The first-order optimality conditions read
      $$
      \begin{cases}
      \dfrac{\partial\phi(x,t)}{\partial t} + \langle v(x,t), \nabla_x\phi(x,t)\rangle - H^*(x,t,v(x,t)) = \epsilon\,\Delta_x\phi(x,t), & x \in \Omega,\; t > 0,\\[2pt]
      \dfrac{\partial\rho(x,t)}{\partial t} + \nabla_x\cdot(\rho(x,t)v(x,t)) + \epsilon\,\Delta_x\rho(x,t) = 0, & x \in \Omega,\; t > 0,\\[2pt]
      v(x,t) = \nabla_p H(x,t,\nabla_x\phi(x,t)), & x \in \Omega,\; t > 0,\\[2pt]
      \phi(x,0) = g(x), \quad \rho(x,T) = c, & x \in \Omega.
      \end{cases} \tag{5}
      $$
      • In the PDHG method, the updates for $\phi$ and $\rho$ are related to the first line (linear w.r.t. $\phi$) and the second line (linear w.r.t. $\rho$), respectively.
      • We use implicit time discretization.
      • We can have a larger CFL number $\left(\frac{\Delta t}{\Delta x}\max_p \|\nabla_p H(x,t,p)\|\right)$: e.g., in example 2 the CFL number is 0.036 for the explicit scheme, but it can be about 10.8 in our method.

  33. Numerical results: example 1

      We solve the Burgers equation on $\Omega = [0,2]^n$ ($n = 1$ or $2$) with periodic spatial conditions and initial condition $\phi(x,0) = \frac{1}{2}\|x - 1\|^2$. We observe first-order error decay in this example.

      | $n_x \times n_t$                     | 20 × 11  | 40 × 21  | 80 × 41  | 160 × 81 |
      |--------------------------------------|----------|----------|----------|----------|
      | Averaged absolute residual of HJ PDE | 9.36e-07 | 9.75e-07 | 9.86e-07 | 9.95e-07 |
      | $\ell^1$ relative error              | 5.28e-02 | 2.69e-02 | 1.13e-02 | 2.71e-03 |

      | $n_x \times n_y \times n_t$          | 20 × 20 × 11 | 40 × 40 × 21 | 80 × 80 × 41 | 160 × 160 × 81 |
      |--------------------------------------|--------------|--------------|--------------|----------------|
      | Averaged absolute residual of HJ PDE | 9.65e-07 | 9.84e-07 | 9.99e-07 | 1.00e-06 |
      | $\ell^1$ relative error              | 5.21e-02 | 2.68e-02 | 1.12e-02 | 2.71e-03 |

      Tables: averaged absolute residual of the discretized equation and $\ell^1$ relative error.

  34. Numerical results: example 2

      We solve the following viscous HJ PDE, whose Hamiltonian depends on both the spatial variable $x$ and the time variable $t$ (for $n = 1, 2$):
      $$
      \begin{cases}
      \dfrac{\partial\phi(x,t)}{\partial t} + \dfrac{1}{2}\|\nabla_x\phi(x,t)\|^2 - f(x,t) = 0.1\,\Delta_x\phi(x,t), & x \in [0,2]^n,\; t > 0,\\[2pt]
      \phi(x,0) = -\dfrac{1}{10}\|x - 1\|^2, & x \in [0,2]^n,
      \end{cases} \tag{6}
      $$
      where $f(x,t) = \frac{1}{2}\min\{(x_1 - t - 0.5)^2,\, (x_1 - t + 1.5)^2,\, (x_1 - t - 2.5)^2\} + \sum_{j>1}\frac{1}{4}(x_j - 1)^2$.

      One-dimensional result ($n_t = 5$, $n_x = 80$; we show several level sets of the solution).

  35. Numerical results: example 2

      Two-dimensional result ($n_t = 5$, $n_x = n_y = 80$; we show several level sets of the solution $\phi(\cdot,t)$ at $t = 0.25, 0.5, 0.75, 1.0$).

  36. Conclusion

      • In this project, we propose an optimization algorithm for solving HJ PDEs whose Hamiltonian $H$ may depend on $(x,t)$; we require $H(x,t,\cdot)$ to be convex for any $(x,t)$.
      • We apply implicit time discretization. As a result, we can choose a large time step and avoid the CFL condition.
      • Partial theoretical guarantees are provided.
      • The method may be combined with higher-order schemes, e.g., used as an initialization for a more accurate scheme.
      • The idea may also be generalized to solving stationary HJ PDEs.

  37. Introduction

      In convex optimization, we seek to find $\operatorname*{argmin}_{x\in\mathbb{R}^d} f(x)$. If the function is convex and differentiable, this is equivalent to finding $x^* \in \mathbb{R}^d$ such that $\nabla f(x^*) = 0$. This can be rewritten in a variational formulation:
      $$
      \inf_{x\in\mathbb{R}^d}\; \sup_{p\in\mathbb{R}^d}\; \langle p, \nabla f(x)\rangle, \tag{0.1}
      $$
      which becomes a saddle point problem.

  38. Formulations

      We add a regularization term to (0.1):
      $$
      \inf_{x\in\mathbb{R}^d}\; \sup_{p\in\mathbb{R}^d}\; \langle \nabla f(x), p\rangle - \frac{\varepsilon}{2}\|p\|^2, \tag{0.2}
      $$
      where $\varepsilon > 0$ is a constant. This regularization term further drives $p$ to $0$.

      Proposition. Let $f : \mathbb{R}^d \to \mathbb{R}$ be a $C^2$ strongly convex function. Then the saddle point of (0.2) is the unique global minimum of $f$, and $p^* = 0$.

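      To spell out the reasoning behind the proposition: the inner maximization in (0.2) is solved in closed form by $p^*(x) = \nabla f(x)/\varepsilon$, so that
      $$
      \sup_{p} \left\{ \langle \nabla f(x), p\rangle - \frac{\varepsilon}{2}\|p\|^2 \right\} = \frac{1}{2\varepsilon}\|\nabla f(x)\|^2,
      $$
      and (0.2) reduces to minimizing $\|\nabla f(x)\|^2$ over $x$. For strongly convex $f$ the unique minimizer is $x^*$ with $\nabla f(x^*) = 0$, and hence $p^* = \nabla f(x^*)/\varepsilon = 0$.
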
  39. Algorithm

      Algorithm: Linearized Primal-Dual Damping (PDD)
      Initial guesses $x_0 \in \mathbb{R}^d$, $p_0 \in \mathbb{R}^d$; step sizes $\tau > 0$, $\sigma > 0$; parameters $A > 0$, $\varepsilon > 0$, $\omega > 0$, $C \succeq 0$.
      While $n = 1, 2, \dots$, not converged, do:
      $$
      \begin{cases}
      p_{n+1} = \dfrac{1}{1 + \sigma\varepsilon A}\, p_n + \dfrac{\sigma A}{1 + \sigma\varepsilon A}\, \nabla f(x_n);\\[4pt]
      \tilde p_{n+1} = p_{n+1} + \omega(p_{n+1} - p_n);\\[2pt]
      x_{n+1} = x_n - \tau\, C(x_n)\,\tilde p_{n+1}.
      \end{cases}
      $$
      Here $C(x)$ can usually be chosen to have a simple structure: for example, $C(x)$ can be the identity matrix or a diagonal matrix. This makes our algorithm a first-order algorithm; a minimal sketch follows below.

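      The following NumPy sketch implements this iteration with $C(x) = I$; the quadratic test problem and every parameter value are illustrative choices, not the paper's settings.

      ```python
      import numpy as np

      def pdd(grad_f, x0, tau=0.01, sigma=0.5, A=1.0, eps=0.1, omega=1.0, iters=500):
          """Linearized primal-dual damping with C(x) = I (illustrative sketch)."""
          x, p = x0.copy(), np.zeros_like(x0)
          for _ in range(iters):
              p_new = (p + sigma * A * grad_f(x)) / (1.0 + sigma * eps * A)
              p_tilde = p_new + omega * (p_new - p)  # damping / extrapolation step
              x = x - tau * p_tilde                  # primal update with C(x) = I
              p = p_new
          return x

      # Usage: minimize the strongly convex quadratic f(x) = 0.5 * x^T Q x.
      rng = np.random.default_rng(0)
      M = rng.standard_normal((20, 20))
      Q = M @ M.T + 20.0 * np.eye(20)                # random SPD matrix
      x = pdd(lambda z: Q @ z, rng.standard_normal(20))
      print("||grad f(x)|| =", np.linalg.norm(Q @ x))
      ```

      Note that only $\nabla f$ is ever evaluated, consistent with the remark below that the algorithm is affine in $\nabla f(x)$.
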
  40. Convergence

      Theorem (informal). Let $f : \mathbb{R}^d \to \mathbb{R}$ be a $C^4$ strongly convex function. Let $x^*$ be the global minimum of $f$ and $p^* = 0$. Then the iterates $(x_n, p_n)$ produced by Algorithm 1 converge to the saddle point $(x^*, p^*)$ if $\tau, \sigma$ are small enough. Moreover,
      $$
      \|p_n\|^2 + \|\nabla f(x_n)\|^2 \le \left(\|p_0\|^2 + \|\nabla f(x_0)\|^2\right)\left(1 - \frac{\mu^2}{M + \delta}\right)^n,
      $$
      where $\mu = \min_x \lambda_{\min}(\nabla^2 f(x)\,C(x))$ with $C(x) = B(x)\nabla^2 f(x)$, $\delta > 0$ depends on the initial condition, and $M > 0$ depends on $C(x)^\top \nabla^3 f(x)\nabla f(x) + (\nabla^2 f(x))^2 C(x)$, $\tau$, $\sigma$, $A$, $\varepsilon$, and $\omega$.

      Remark. Our algorithm is affine in $\nabla f(x)$, yet our theorem indicates that $\|\nabla f(x)\|^2$ converges to $0$.

  41. Strongly convex functions

      • Regularized log-sum-exp function:
        $$
        f(x) = \log\left( \sum_{i=1}^n \exp(q_i^T x) \right) + \frac{1}{2}x^T Q x,
        $$
        where $n = 100$, $Q = Q^T \succ 0$, and $q_i^T$ is the $i$-th row of $Q$. $Q$ is chosen to be diagonally dominant, i.e., $Q_{i,i} > \sum_{j\neq i}|Q_{i,j}|$. In our algorithm, $C(x)$ can be chosen as either the identity matrix or the diagonal matrix with $Q_{i,i}^{-1}$ on the diagonal.
      • Quadratic minus cosine:
        $$
        f(x) = \|x\|^2 - \cos(c^T x),
        $$
        where $c \in \mathbb{R}^{100}$ with $\|c\|_2 = 1.9$.

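      For reference, here are illustrative NumPy definitions of these two objectives and their gradients (the diagonally dominant $Q$ is a made-up instance); they can be fed, for example, to the `pdd` sketch above.

      ```python
      import numpy as np

      rng = np.random.default_rng(1)
      n = 100
      Q = rng.uniform(0.0, 1.0, (n, n))
      Q = 0.5 * (Q + Q.T)
      Q += np.diag(Q.sum(axis=1))        # symmetric and diagonally dominant => SPD

      def f_lse(x):                      # log-sum-exp(Qx) + 0.5 x^T Q x
          z = Q @ x
          m = z.max()                    # shift for numerical stability
          return m + np.log(np.exp(z - m).sum()) + 0.5 * x @ (Q @ x)

      def grad_lse(x):
          z = Q @ x
          w = np.exp(z - z.max())
          w /= w.sum()                   # softmax weights
          return Q.T @ w + Q @ x

      c = rng.standard_normal(n)
      c *= 1.9 / np.linalg.norm(c)       # enforce ||c||_2 = 1.9

      def f_qmc(x):                      # ||x||^2 - cos(c^T x)
          return x @ x - np.cos(c @ x)

      def grad_qmc(x):
          return 2.0 * x + np.sin(c @ x) * c
      ```
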
  42. Strongly convex functions

      Figure: comparison of gradient descent, NAG, PDD, and IGAHD-SC (we use 'att' as shorthand for this method) on minimizing (a) the regularized log-sum-exp function and (b) the quadratic minus cosine function. The y-axis shows the 2-norm of the gradient of the objective function on a logarithmic scale; the x-axis shows the number of iterations on a logarithmic scale.

  43. Non-convex functions

      • Rosenbrock function:
        $$
        f(x,y) = (a - x)^2 + b(y - x^2)^2,
        $$
        where a common choice of parameters is $a = 1$, $b = 100$.
      • Ackley function:
        $$
        f(x,y) = -20\exp\!\left(-0.2\sqrt{0.5(x^2+y^2)}\right) - \exp\!\big(0.5\,[\cos(2\pi x) + \cos(2\pi y)]\big) + e + 20.
        $$

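      As a quick companion to the `pdd` sketch above, here are illustrative NumPy definitions of these two non-convex test functions (with the Rosenbrock gradient, since the optimizers compared here only need gradients); the defaults follow the common parameter choices quoted above.

      ```python
      import numpy as np

      def rosenbrock(v, a=1.0, b=100.0):
          x, y = v
          return (a - x) ** 2 + b * (y - x**2) ** 2

      def rosenbrock_grad(v, a=1.0, b=100.0):
          x, y = v
          return np.array([-2.0 * (a - x) - 4.0 * b * x * (y - x**2),
                           2.0 * b * (y - x**2)])

      def ackley(v):
          x, y = v
          return (-20.0 * np.exp(-0.2 * np.sqrt(0.5 * (x**2 + y**2)))
                  - np.exp(0.5 * (np.cos(2 * np.pi * x) + np.cos(2 * np.pi * y)))
                  + np.e + 20.0)
      ```
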
  44. Rosenbrock

      Figure: minimizing the Rosenbrock function with gradient descent, NAG, PDD with $C(x) = I$, and IGAHD ('att'). The left panel shows the convergence speed of each method; the right panel shows the optimization trajectories of each method.

  45. Ackley

      Figure: minimizing the Ackley function with gradient descent, NAG, PDD, and IGAHD ('att'). The left panel shows the convergence speed of each method; the right panel shows the optimization trajectories of each method.

  46. MNIST with a two-layer NN

      Figure: training a two-layer neural network on the MNIST dataset using gradient descent, NAG, PDD, Adam, and IGAHD ('att'). The left panel shows the convergence of the training loss; the right panel shows the test accuracy of each method. The x-axis counts iterations in mini-batches.

  47. CIFAR-10 with a CNN

      Figure: training loss (left panel) and test accuracy (right panel) of a convolutional neural network on the CIFAR-10 dataset. The x-axis counts iterations in mini-batches.

  48. High-order computation of generalized optimal transport, mean field control problems, and variational time implicit schemes

      Guosheng Fu, Will Pazner, Siting Liu, Stanley Osher, Wuchen Li
      arXiv:2302.02308, arXiv:2303.08950, arXiv:2306.06287
      February/March/June 2023

  49. Introduction: entropy dissipation

      • Reaction-diffusion system with $M$ species and $R$ reactions:
        $$
        \partial_t \rho_i = \nabla\cdot\left( V_{1,i}(\rho_i)\,\nabla \frac{\delta}{\delta\rho} E_i(\rho_i) \right) - \sum_{p=1}^R V_{2,p}(\rho)\,\gamma_{i,p} \sum_{j=1}^M \gamma_{j,p}\, \frac{\delta}{\delta\rho} E_j(\rho_j),
        $$
        where $E_i$ is a given energy functional.
      • Energy dissipation law:
        $$
        \frac{d}{dt}\sum_{i=1}^M E_i(\rho_i) = -\mathcal{I}(\rho) \le 0,
        $$
        with the generalized Fisher information functional
        $$
        \mathcal{I}(\rho) := \int_\Omega \left[ \sum_{i=1}^M \left\| \nabla \frac{\delta E_i}{\delta\rho} \right\|^2 V_{1,i}(\rho_i) + \sum_{p=1}^R \left( \sum_{j=1}^M \gamma_{j,p}\, \frac{\delta E_j}{\delta\rho} \right)^2 V_{2,p}(\rho) \right] dx.
        $$

  50. Generalized optimal transport distances

      $$
      \mathrm{Dist}_{V_1,V_2}(\rho^0, \rho^1)^2 = \inf_{\rho,m,s} \Bigg\{ \int_0^T\!\!\int_\Omega \left( \sum_{i=1}^M \frac{|m_i|^2}{V_{1,i}(\rho_i)} + \sum_{p=1}^R \frac{|s_p|^2}{V_{2,p}(\rho)} \right) dx\,dt :
      $$
      $$
      \partial_t\rho_i + \nabla\cdot m_i = \sum_{p=1}^R \gamma_{i,p}\, s_p,\; 1 \le i \le M,\quad m_i\cdot\nu|_{\partial\Omega} = 0,\quad \rho(0,\cdot) = \rho^0,\; \rho(T,\cdot) = \rho^1 \Bigg\},
      $$
      where $\rho = (\rho_1,\dots,\rho_M)$ is the collection of densities, $m = (m_1,\dots,m_M)$ the collection of fluxes, and $s = (s_1,\dots,s_R)$ the collection of sources.

  51. Mean field control problems and JKO-type schemes

      • The generalized distance introduces a potential mean field control problem:
        $$
        \inf_{\rho,m,s} \Bigg\{ \int_0^T\!\!\int_\Omega \left( \sum_{i=1}^M \frac{|m_i|^2}{V_{1,i}(\rho_i)} + \sum_{p=1}^R \frac{|s_p|^2}{V_{2,p}(\rho)} \right) dx\,dt - \int_0^T \mathcal{F}(\rho)\,dt + \mathcal{G}(\rho(T,\cdot)) :
        $$
        $$
        \partial_t\rho_i + \nabla\cdot m_i = \sum_{p=1}^R \gamma_{i,p}\, s_p,\; 1 \le i \le M,\quad m_i\cdot\nu|_{\partial\Omega} = 0,\quad \rho(0,\cdot) = \rho^0 \Bigg\},
        $$
        for mobility functions $V_1, V_2$, initial density $\rho^0$, potential functional $\mathcal{F}$, and terminal energy $\mathcal{G}$.
      • Special choices of mean field control problems form a variational time implicit (JKO-type) scheme, e.g., $\mathcal{F}(\rho) = 0$, $\mathcal{G}(\rho) = \sum_{i=1}^M E(\rho_i)$; a worked step is spelled out below.

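      As a sketch of how these special choices recover a JKO-type update, written in the classical minimizing-movement form: with $\mathcal{F} \equiv 0$ and $\mathcal{G}(\rho) = \sum_i E(\rho_i)$, one implicit step of size $h$ from $\rho^k$ reads
      $$
      \rho^{k+1} = \operatorname*{argmin}_{\rho}\; \frac{1}{2h}\, \mathrm{Dist}_{V_1,V_2}(\rho^k, \rho)^2 + \sum_{i=1}^M E(\rho_i),
      $$
      the analogue of the classical JKO scheme $\rho^{k+1} = \operatorname{argmin}_\rho \frac{1}{2h} W_2(\rho^k,\rho)^2 + E(\rho)$, with the Wasserstein metric replaced by the generalized transport distance.
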
  52. Saddle point problems for generalized mean field control

      Saddle point reformulation of the variational MFC problem: find the critical point of
      $$
      \inf_{u,\,\rho_T}\; \sup_{\phi}\; \int_0^T\!\!\int_\Omega \Big[ H(u) - u\cdot D(\phi) \Big]\, dx\,dt + \int_\Omega \Big[ \mathcal{G}(\rho_T) + \rho_T\cdot\phi(T,\cdot) - \rho^0\cdot\phi(0,\cdot) \Big]\, dx.
      $$
      Here $u := (\rho, m, s, n)$, $\rho_T := (\rho_{1,T},\dots,\rho_{M,T})$, $\phi := (\phi_1,\dots,\phi_M)$ is the multiplier, $\rho^0 := (\rho_1^0,\dots,\rho_M^0)$ is the initial data, the nonlinear function is
      $$
      H(u) := \sum_{i=1}^M \left( \frac{\|m_i\|^2}{2V_{1,i}(\rho_i)} + \frac{|n_i|^2}{2V_{3,i}(\rho_i)} \right) + \sum_{p=1}^R \frac{|s_p|^2}{2V_{2,p}(\rho)} - \mathcal{F}(\rho),
      $$
      and the operator is $D(\phi) := \left( \partial_t\phi,\; \nabla\phi,\; \gamma^T\phi,\; \Delta\phi \right)$.

  53. Methods

      • Space-time high-order finite elements: find the critical point of the fully discrete system
        $$
        \inf_{u_h \in W_h^{k-1},\, \rho_{T,h} \in M_h^{k-1}}\; \sup_{\phi_h \in V_h^k}\; \langle\langle H(u_h) - u_h\cdot D(\phi_h) \rangle\rangle_h + \langle \mathcal{G}(\rho_{T,h}) \rangle_h + \langle \rho_{T,h}\cdot\phi_h(T,\cdot) - \rho^0\cdot\phi_h(0,\cdot) \rangle_h.
        $$
      • G-prox primal-dual hybrid gradient algorithm:
        (1) $\phi$-update:
        $$
        \phi_h^{\ell+1} = \operatorname*{argmin}_{\phi_h}\; \frac{1}{2\sigma_\phi} \langle\langle |D(\phi_h - \phi_h^\ell)|^2 \rangle\rangle_h + \langle\langle u_h^\ell\cdot D(\phi_h) \rangle\rangle_h - \langle \rho_{T,h}^\ell\cdot\phi_h(T,\cdot) - \rho^0\cdot\phi_h(0,\cdot) \rangle_h.
        $$
        (2) Extrapolation: $\tilde\phi_h^{\ell+1} = 2\phi_h^{\ell+1} - \phi_h^\ell$.
        (3) $(u,\rho)$-update:
        $$
        u_h^{\ell+1} = \operatorname*{argmin}_{u_h}\; \frac{1}{2\sigma_u} \langle\langle |u_h - u_h^\ell|^2 \rangle\rangle_h + \langle\langle H(u_h) - u_h\cdot D(\tilde\phi_h^{\ell+1}) \rangle\rangle_h,
        $$
        $$
        \rho_{T,h}^{\ell+1} = \operatorname*{argmin}_{\rho_{T,h}}\; \frac{1}{2\sigma_\rho} \langle |\rho_{T,h} - \rho_{T,h}^\ell|^2 \rangle_h + \langle \mathcal{G}(\rho_{T,h}) + \rho_{T,h}\cdot\tilde\phi_h^{\ell+1}(T,\cdot) \rangle_h.
        $$

  54. Properties

      Advantages of high-order methods over low-order methods:
      • Efficiency: more efficient per degree of freedom.
      • Accuracy: more accurate per degree of freedom.

      The G-prox PDHG algorithm is highly scalable (MPI + GPU):
      • The $\phi$-update is a global constant-coefficient (space-time) high-order diffusion solve. High-performance, scalable solvers are available; e.g., multigrid with PCG achieves optimal computational complexity.
      • The $(u,\rho)$-update is a pointwise nonlinear single-variable optimization problem: embarrassingly parallelizable, with optimal computational complexity.

  55. Example I: initial value problem

      The PDE:
      $$
      \partial_t\rho - \beta_1\,\partial_{x_0 x_0}\rho - \beta_2\,\partial_{x_1 x_1}\rho = \mu\,\rho(1-\rho),
      $$
      with diffusion parameters $\beta_1 = 0.1$, $\beta_2 = 0.01$, and reaction coefficient $\mu > 0$ to be specified. The initial condition is a flat-top Gaussian:
      $$
      \rho^0(x_0, x_1) = \begin{cases} 1 & \text{if } x_0^2 + 4x_1^2 \le 0.25,\\ \exp\!\big(-10(x_0^2 + 4x_1^2 - 0.25)\big) & \text{otherwise.} \end{cases}
      $$
      The computational domain is the rectangle $\Omega = [-2,2]\times[-1,1]$, discretized with a $32\times 16$ square mesh. We use polynomial degree $k = 4$, time step size $\Delta t = 0.1$, and final time $T = 4$.

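      A small NumPy sketch of this flat-top Gaussian initial density, sampled on a uniform grid for illustration (the grid resolution here is an arbitrary choice, not the finite element mesh):

      ```python
      import numpy as np

      # Sample the flat-top Gaussian rho0 on a uniform grid over [-2, 2] x [-1, 1].
      x0, x1 = np.meshgrid(np.linspace(-2.0, 2.0, 129),
                           np.linspace(-1.0, 1.0, 65), indexing="ij")
      r2 = x0**2 + 4.0 * x1**2
      rho0 = np.where(r2 <= 0.25, 1.0, np.exp(-10.0 * (r2 - 0.25)))
      ```
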
  57. Exact score-based generative models (SGM) via regularized Wasserstein proximal (ongoing work)

      • Joint with B. Zhang, S. Liu, M. Katsoulakis, S. Osher, W. Li.
      • Formulate SGM models via regularized Wasserstein proximal operators with an explicit kernel formula.
      • Exact score functions are calculated via kernel formulas.

      References:
      • W. Li, S. Liu, S. Osher. A kernel formula for regularized Wasserstein proximal operators. 2023.
      • B. Zhang and M. A. Katsoulakis. A mean-field games laboratory for generative modeling. 2023.

  58. Exact score-based generative models (SGM) via regularized Wasserstein proximal

      Figure: experiments conducted on 2D toy datasets. We began with 200 initial data points and applied a forward diffusive process; subsequently, a reverse diffusive process was performed using the score function computed directly via the kernel formula to generate new samples.