Jia-Jie Zhu
April 12, 2024

Oliver Tse (Eindhoven University of Technology, Netherlands) Variational Acceleration Methods in the Space of Probability Measures

WORKSHOP ON OPTIMAL TRANSPORT
FROM THEORY TO APPLICATIONS
INTERFACING DYNAMICAL SYSTEMS, OPTIMIZATION, AND MACHINE LEARNING
Venue: Humboldt University of Berlin, Dorotheenstraße 24

Berlin, Germany. March 11th - 15th, 2024

Transcript

  1. Accelerating Optimization over Probability Measures

    Oliver Tse
    Optimal Transport from Theory to Applications, Humboldt Universität, Berlin
    Joint work with Shi Chen, Qin Li, Stephen J. Wright
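  2. The Context

    Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$.
    Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $x : [0,1] \to \mathcal{X}$,
    $\mathcal{E}(x_\vartheta) \le (1-\vartheta)\,\mathcal{E}(x_0) + \vartheta\,\mathcal{E}(x_1) - \frac{\lambda}{2}\,\vartheta(1-\vartheta)\,\mathsf{d}^2(x_0, x_1)$.
    Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean): gradient descent $\dot{x}_t = -\nabla\mathcal{E}(x_t)$.
    Rates: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(t^{-1})$ if $\lambda = 0$; $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-2\lambda t})$ if $\lambda > 0$. Regime of interest: $\lambda \ll 1$.
    E.g. Ambrosio-Gigli-Savaré 2005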
  3. The Context

    Task and $\lambda$-convexity as on slide 2.
    Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean): heavy ball method $\dot{x}_t = e^{-\gamma t} v_t$, $\dot{v}_t = -e^{\gamma t}\,\nabla\mathcal{E}(x_t)$.
    Rates: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le o(t^{-1})$ if $\lambda = 0$, $\gamma > 0$; $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\sqrt{\lambda}\,t})$ if $\lambda > 0$, $\gamma = 2\sqrt{\lambda}$.
    E.g. Polyak 1964, Att…
  4. The Context

    Task and $\lambda$-convexity as on slide 2.
    Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean): heavy ball method $\dot{x}_t = v_t$, $\dot{v}_t = -\gamma v_t - \nabla\mathcal{E}(x_t)$.
    Rates: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le o(t^{-1})$ if $\lambda = 0$, $\gamma > 0$; $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\sqrt{\lambda}\,t})$ if $\lambda > 0$, $\gamma = 2\sqrt{\lambda}$.
    E.g. Att…
  5. The Context

    Task and $\lambda$-convexity as on slide 2.
    Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean): Nesterov method $\dot{x}_t = \frac{2}{t} v_t$, $\dot{v}_t = -\frac{2}{t} v_t - \frac{t}{2}\,\nabla\mathcal{E}(x_t)$.
    Rate: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(t^{-2})$ if $\lambda = 0$.
    E.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021
  6. The Context

    Task and $\lambda$-convexity as on slide 2.
    Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean): variational acceleration $\dot{x}_t = e^{\alpha_t} v_t$, $\dot{v}_t = -e^{\alpha_t}\bigl(v_t + e^{\beta_t}\,\nabla\mathcal{E}(x_t)\bigr)$.
    Rate: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le e^{\alpha_t}$.
    Nesterov method: $\alpha_t = \log(2/t)$, $\beta_t = 2\log(t/2)$.
    E.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021
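As a short worked check (a sketch that uses only the formulas on slides 5 and 6), substituting the Nesterov schedule into the variational-acceleration system recovers the Nesterov ODE and its $O(t^{-2})$ rate:

```latex
\begin{align*}
\alpha_t = \log(2/t),\quad \beta_t = 2\log(t/2)
 \;&\Longrightarrow\; e^{\alpha_t} = \frac{2}{t},\qquad e^{\beta_t} = \frac{t^2}{4},\qquad e^{\alpha_t} e^{\beta_t} = \frac{t}{2},\\
\dot{x}_t = e^{\alpha_t} v_t \;&\Longrightarrow\; \dot{x}_t = \frac{2}{t}\, v_t,\\
\dot{v}_t = -e^{\alpha_t}\bigl(v_t + e^{\beta_t}\nabla\mathcal{E}(x_t)\bigr)
 \;&\Longrightarrow\; \dot{v}_t = -\frac{2}{t}\, v_t - \frac{t}{2}\,\nabla\mathcal{E}(x_t),\\
\dot{\beta}_t = \frac{2}{t} = e^{\alpha_t}
 \;&\Longrightarrow\; \text{the condition } \dot{\beta}_t \le e^{\alpha_t} \text{ holds with equality},\\
e^{-\beta_t} = \frac{4}{t^2}
 \;&\Longrightarrow\; \mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(t^{-2}).
\end{align*}
```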
  7. The Context

    Task and $\lambda$-convexity as on slide 2.
    Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean): variational acceleration $\dot{x}_t = e^{\alpha_t} v_t$, $\dot{v}_t = -e^{\alpha_t}\bigl(v_t + e^{\beta_t}\,\nabla\mathcal{E}(x_t)\bigr)$.
    Rate: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le e^{\alpha_t}$.
    Exponential method: $\alpha_t = 0$, $\beta_t = t$.
    E.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021
  8. The Context

    Task and $\lambda$-convexity as on slide 2.
    Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean): variational acceleration with the exponential method ($\alpha_t = 0$, $\beta_t = t$): $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\,\nabla\mathcal{E}(x_t)$.
    Rate: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.
    E.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021
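The exponential-method flow above can be tried numerically. The following is a minimal sketch, not the authors' implementation: the quadratic objective, the initial data, the RK4 integrator and the step size are all assumptions chosen for illustration. It integrates $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{t}\nabla\mathcal{E}(x_t)$ and compares the final energy gap with $e^{-T}$ times the initial value of the Lyapunov function $\frac12|x + v - x^*|^2 + e^{\beta_t}(\mathcal{E}(x) - \mathcal{E}(x^*))$.

```python
import numpy as np

def rhs(t, x, v, grad, beta=lambda t: t):
    """Right-hand side of x' = v, v' = -v - exp(beta_t) * grad E(x)."""
    return v, -v - np.exp(beta(t)) * grad(x)

def rk4(x, v, grad, t_end, dt):
    """Integrate the flow from t = 0 to t_end with classical RK4."""
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        k1x, k1v = rhs(t, x, v, grad)
        k2x, k2v = rhs(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v, grad)
        k3x, k3v = rhs(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v, grad)
        k4x, k4v = rhs(t + dt, x + dt * k3x, v + dt * k3v, grad)
        x = x + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v = v + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
    return x, v

# Toy convex objective (assumption): E(x) = 0.5 <x, A x>, minimiser x* = 0, E(x*) = 0.
A = np.diag([0.01, 0.1, 1.0])
energy = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x0, v0 = np.ones(3), np.zeros(3)
T = 5.0
xT, vT = rk4(x0, v0, grad, T, dt=1e-3)

# Lyapunov function at time 0 (beta_0 = 0): 0.5 |x + v - x*|^2 + E(x) - E(x*).
lyap0 = 0.5 * np.sum((x0 + v0) ** 2) + energy(x0)
gap_T = energy(xT)
print(f"E(x_T) - E(x*) = {gap_T:.3e},  exp(-T) * L_0 = {np.exp(-T) * lyap0:.3e}")
```

Because $e^{\beta_t}$ grows exponentially, the system becomes stiff for large $t$; the short horizon and small step here are chosen to keep an explicit integrator stable.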
  9. The Context

    Task and $\lambda$-convexity as on slide 2.
    Possible approaches ($\mathcal{X} = \mathcal{P}_2(\mathbb{R}^d)$, $\mathsf{d} = \mathbb{W}_2$): Otto gradient flow $\partial_t \rho_t = \operatorname{div}\bigl(\rho_t\,\nabla\mathcal{E}'[\rho_t]\bigr)$.
    Rates: $\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(t^{-1})$ if $\lambda = 0$; $\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-2\lambda t})$ if $\lambda > 0$.
    E.g. Ambrosio-Gigli-Savaré 2005
  10. The Context

    Task and $\lambda$-convexity as on slide 2.
    Possible approaches ($\mathcal{X} = \mathcal{P}_2(\mathbb{R}^d)$, $\mathsf{d} = \mathbb{W}_2$): Heavy ball? Nesterov? Variational acceleration?
  11. The Agenda

    I. Hamiltonian flows
    II. Convergence results: convergence for $\mathbb{R}^d$; some $\mathbb{W}_2$-calculus; convergence for $\mathcal{P}_2(\mathbb{R}^d)$
    III. Numerical experiments
    IV. Summary + Outlook
  12. The Agenda

    I. Hamiltonian flows
    II. Convergence results: convergence for $\mathbb{R}^d$; some $\mathbb{W}_2$-calculus; convergence for $\mathcal{P}_2(\mathbb{R}^d)$
    III. Numerical experiments
    IV. Summary + Outlook
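  13. Towards Hamiltonian flows

    Setting: $\mathcal{X} = \mathcal{P}_2(\mathbb{R}^d)$, $\mathsf{d} = \mathbb{W}_2$.
    Otto gradient flow: $\partial_t \rho_t = \operatorname{div}\bigl(\rho_t\,\nabla\mathcal{E}'[\rho_t]\bigr)$.
    Lagrangian dynamics: $dX_t = -\nabla\mathcal{E}'[\rho_t](X_t)\,dt$, $\rho_t = \operatorname{Law} X_t$ (Euclidean analogue: $\dot{x}_t = -\nabla\mathcal{E}(x_t)$).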
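  14. Towards Hamiltonian flows

    Setting: $\mathcal{X} = \mathcal{P}_2(\mathbb{R}^d)$, $\mathsf{d} = \mathbb{W}_2$.
    Otto gradient flow: $\partial_t \rho_t = \operatorname{div}\bigl(\rho_t\,\nabla\mathcal{E}'[\rho_t]\bigr)$.
    Lagrangian dynamics: $dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\,\nabla\mathcal{E}'[\rho_t](X_t)\,dt$, $\rho_t = \operatorname{Law} X_t$ (Euclidean analogue: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\,\nabla\mathcal{E}(x_t)$).
    Hamiltonian flow (kinetic Vlasov): $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$,
    $\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\bigl(\mu_t\,(v + G_t[\rho_t])\bigr)$, where $G_t[\rho](x) = e^{\beta_t}\,\nabla\mathcal{E}'[\rho](x)$.
    E.g. Ambrosio-Gangbo 2007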
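  15. Hamiltonian flows

    Hamiltonian flow: $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$,
    $\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\bigl(\mu_t\,(v + G_t[\rho_t])\bigr)$, where $G_t[\rho](x) = e^{\beta_t}\,\nabla\mathcal{E}'[\rho](x)$.
    Remarks:
    1) Restricted class of $\mathcal{E}$.
    2) $\rho_t = \operatorname{Law} X_t$ solves $\partial_t \rho_t + \operatorname{div}_x j_t = 0$, with $j_t(dx) = \int v\,\mu_t(dx\,dv) =: m_t$.
    3) $m_t$ solves $\partial_t m_t + \operatorname{div}_x \mathbb{T}_t = -m_t - \rho_t\,G_t[\rho_t]$, with $\mathbb{T}_t(dx) = \int v \otimes v\,\mu_t(dx\,dv) \approx \frac{1}{\rho_t}\,m_t \otimes m_t$.
    Pressureless Euler: $\partial_t m_t + \operatorname{div}_x\bigl(\frac{1}{\rho_t}\,m_t \otimes m_t\bigr) = -m_t - \rho_t\,G_t[\rho_t]$.
    E.g. Carrillo-Choi-Zatorska 2016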
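  16. Hamiltonian flows

    Hamiltonian flow: $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$,
    $\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\bigl(\mu_t\,(v + G_t[\rho_t])\bigr)$, where $G_t[\rho](x) = e^{\beta_t}\,\nabla\mathcal{E}'[\rho](x)$.
    Remarks:
    1) Restricted class of $\mathcal{E}$.
    2) $\rho_t = \operatorname{Law} X_t$ solves $\partial_t \rho_t + \operatorname{div}_x j_t = 0$, with $j_t(dx) = \int v\,\mu_t(dx\,dv) =: m_t$.
    3) $m_t$ solves $\partial_t m_t + \operatorname{div}_x \mathbb{T}_t = -m_t - \rho_t\,G_t[\rho_t]$, with $\mathbb{T}_t(dx) = \int v \otimes v\,\mu_t(dx\,dv) \approx \frac{1}{\rho_t}\,m_t \otimes m_t$.
    Pressureless Euler: $\partial_t u_t + u_t \cdot \operatorname{div}_x u_t = -u_t - G_t[\rho_t]$.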
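  17. Hamiltonian flows

    Hamiltonian flow: $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$,
    $\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\bigl(\mu_t\,(v + G_t[\rho_t])\bigr)$, where $G_t[\rho](x) = e^{\beta_t}\,\nabla\mathcal{E}'[\rho](x)$.
    Remarks:
    1) Restricted class of $\mathcal{E}$.
    2) $\rho_t = \operatorname{Law} X_t$ solves $\partial_t \rho_t + \operatorname{div}_x j_t = 0$, with $j_t(dx) = \int v\,\mu_t(dx\,dv) =: m_t$.
    3) $m_t$ solves $\partial_t m_t + \operatorname{div}_x \mathbb{T}_t = -m_t - \rho_t\,G_t[\rho_t]$, with $\mathbb{T}_t(dx) = \int v \otimes v\,\mu_t(dx\,dv) \approx \frac{1}{\rho_t}\,m_t \otimes m_t$.
    Hamilton-Jacobi: $\partial_t \psi_t + |\nabla_x \psi_t|^2 = -\psi_t - e^{\beta_t}\,\mathcal{E}'[\rho_t]$.
    E.g. Chow-Li-Zhou 2020, Wang-Li 2022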
  18. The Agenda

    I. Hamiltonian flows
    II. Convergence results: convergence for $\mathbb{R}^d$; some $\mathbb{W}_2$-calculus; convergence for $\mathcal{P}_2(\mathbb{R}^d)$
    III. Numerical experiments
    IV. Summary + Outlook
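  19. Convergence on $\mathbb{R}^d$

    Dynamics: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\,\nabla\mathcal{E}(x_t)$.
    Lyapunov function: $\mathcal{L}_t(x, v) := \frac{1}{2}\,|x + v - x^*|^2 + e^{\beta_t}\bigl(\mathcal{E}(x) - \mathcal{E}(x^*)\bigr)$.
    Idea: If $\mathcal{L}_t(x_t, v_t) \le \mathcal{L}_\tau(x_\tau, v_\tau) < +\infty$ for all $t \ge \tau$, then
    $e^{\beta_t}\bigl(\mathcal{E}(x_t) - \mathcal{E}(x^*)\bigr) \le \mathcal{L}_t(x_t, v_t) \le \mathcal{L}_\tau(x_\tau, v_\tau)$ for all $t \ge \tau$.
    Result: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.
    E.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021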
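The monotonicity assumed in the idea above can be verified directly. The following is a formal computation, a sketch assuming $\mathcal{E}$ is differentiable and convex ($\lambda = 0$) and $\dot{\beta}_t \le 1$; it uses $\dot{x}_t + \dot{v}_t = -e^{\beta_t}\nabla\mathcal{E}(x_t)$, then $\mathcal{E}(x_t) \ge \mathcal{E}(x^*)$, and finally the convexity inequality $\mathcal{E}(x^*) \ge \mathcal{E}(x_t) + \langle \nabla\mathcal{E}(x_t), x^* - x_t\rangle$:

```latex
\begin{align*}
\frac{d}{dt}\mathcal{L}_t(x_t, v_t)
 &= \langle x_t + v_t - x^*,\ \dot{x}_t + \dot{v}_t \rangle
    + \dot{\beta}_t e^{\beta_t}\bigl(\mathcal{E}(x_t) - \mathcal{E}(x^*)\bigr)
    + e^{\beta_t}\langle \nabla\mathcal{E}(x_t), \dot{x}_t \rangle \\
 &= -e^{\beta_t}\langle x_t + v_t - x^*,\ \nabla\mathcal{E}(x_t) \rangle
    + \dot{\beta}_t e^{\beta_t}\bigl(\mathcal{E}(x_t) - \mathcal{E}(x^*)\bigr)
    + e^{\beta_t}\langle \nabla\mathcal{E}(x_t), v_t \rangle \\
 &= \dot{\beta}_t e^{\beta_t}\bigl(\mathcal{E}(x_t) - \mathcal{E}(x^*)\bigr)
    - e^{\beta_t}\langle x_t - x^*,\ \nabla\mathcal{E}(x_t) \rangle \\
 &\le e^{\beta_t}\Bigl[\mathcal{E}(x_t) - \mathcal{E}(x^*)
    - \langle x_t - x^*,\ \nabla\mathcal{E}(x_t) \rangle\Bigr] \;\le\; 0 .
\end{align*}
```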
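  20. Towards $\mathcal{P}_2(\mathbb{R}^d)$

    Dynamics: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\,\nabla\mathcal{E}(x_t)$.
    Lyapunov function: $\mathcal{L}_t(x, v) := \frac{1}{2}\,|x + v - x^*|^2 + e^{\beta_t}\bigl(\mathcal{E}(x) - \mathcal{E}(x^*)\bigr)$.
    Result: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.
    E.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021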
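  21. Towards $\mathcal{P}_2(\mathbb{R}^d)$

    Dynamics: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\,\nabla\mathcal{E}(x_t)$.
    Lyapunov function: $\mathcal{L}_t(x, v) := \frac{1}{2}\,|x + v - x^*|^2 + e^{\beta_t}\bigl(\mathcal{E}(x) - \mathcal{E}(x^*)\bigr)$
    $= \frac{1}{2}\,|x - x^*|^2 + \langle x - x^*, v \rangle + \frac{1}{2}\,|v|^2 + e^{\beta_t}\bigl(\mathcal{E}(x) - \mathcal{E}(x^*)\bigr)$.
    Result: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.
    E.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021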
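  22. Towards $\mathcal{P}_2(\mathbb{R}^d)$

    Dynamics: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\,\nabla\mathcal{E}(x_t)$.
    Lyapunov function: $\mathcal{L}_t(x, v) := \frac{1}{2}\,|x + v - x^*|^2 + e^{\beta_t}\bigl(\mathcal{E}(x) - \mathcal{E}(x^*)\bigr)$
    $= \frac{1}{2}\,|x - x^*|^2 + \frac{1}{2}\frac{d}{dt}|x - x^*|^2 + \frac{1}{2}\,|v|^2 + e^{\beta_t}\bigl(\mathcal{E}(x) - \mathcal{E}(x^*)\bigr)$.
    Correspondence: $|x - x^*|^2 \leftrightarrow \mathbb{W}_2^2(\rho, \rho^*)$.
    Result: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.
    E.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021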
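  23. Towards $\mathcal{P}_2(\mathbb{R}^d)$

    Dynamics: $dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\,\nabla\mathcal{E}'[\rho_t](X_t)\,dt$, with $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$.
    Lyapunov function: $\mathcal{L}_t(\mu) := \frac{1}{2}\,\mathbb{W}_2^2(\rho, \rho^*) + \frac{1}{2}\frac{d}{dt}\mathbb{W}_2^2(\rho, \rho^*) + \frac{1}{2}\iint |v|^2\,\mu(dx\,dv) + e^{\beta_t}\bigl(\mathcal{E}[\rho] - \mathcal{E}[\rho^*]\bigr)$
    $= \frac{1}{2}\iint |x + v - \mathsf{T}(x)|^2\,\mu(dx\,dv) + e^{\beta_t}\bigl(\mathcal{E}[\rho] - \mathcal{E}[\rho^*]\bigr) \ge 0$.
    Claim: $\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.
    Question: $\frac{d}{dt}\mathcal{L}_t(\mu_t) \le 0$ ??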
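  24. Some $\mathbb{W}_2$-calculus

    Hamiltonian flow: $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$,
    $\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\bigl(\mu_t\,(v + G_t[\rho_t])\bigr)$, where $G_t[\rho](x) = e^{\beta_t}\,\nabla\mathcal{E}'[\rho](x)$.
    1st order calculus: with $\partial_t \rho_t + \operatorname{div}_x j_t = 0$, $j_t(dx) = \int v\,\mu_t(dx\,dv)$, for any $\sigma \in \mathcal{P}(\mathbb{R}^d)$, $\pi_t \in \Pi(\rho_t, \sigma)$:
    $\frac{1}{2}\frac{d}{dt}\mathbb{W}_2^2(\rho_t, \sigma) = \iint \bigl\langle x - y, \frac{dj_t}{d\rho_t}(x) \bigr\rangle\,\pi_t(dx\,dy) = \int \langle x - \mathsf{T}_t(x), j_t(dx) \rangle = \iint \langle x - \mathsf{T}_t(x), v \rangle\,\mu_t(dx\,dv)$.
    E.g. Ambrosio-Gigli-Savaré 2005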
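The first-order formula can be sanity-checked in one dimension, where the optimal matching between two empirical measures with equally many atoms is obtained by sorting. The sketch below is illustrative only: the sample sizes, the distributions and the finite-difference step are assumptions. It compares $\iint \langle x - \mathsf{T}(x), v\rangle\,\mu(dx\,dv)$ with a centred finite difference of $\frac12 \mathbb{W}_2^2$ along $x \mapsto x + h v$.

```python
import numpy as np

def w2_squared_1d(x, y):
    """Squared 2-Wasserstein distance between two empirical measures on the
    real line with the same number of atoms (monotone matching via sorting)."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

def transport_map_1d(x, y):
    """T(x_i): image of each atom x_i under the monotone (optimal) matching."""
    ranks = np.argsort(np.argsort(x))  # rank of each x_i among the atoms
    return np.sort(y)[ranks]

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)            # atoms of rho_t
v = rng.normal(size=n)            # velocities attached to the atoms
y = rng.normal(loc=2.0, size=n)   # atoms of the reference measure sigma

# Right-hand side of the first-order formula: integral of <x - T(x), v> d mu.
formula = np.mean((x - transport_map_1d(x, y)) * v)

# Left-hand side: centred finite difference of (1/2) W_2^2 along x + h v.
h = 1e-7
finite_diff = 0.5 * (w2_squared_1d(x + h * v, y) - w2_squared_1d(x - h * v, y)) / (2 * h)

print(formula, finite_diff)
```

The two numbers agree as long as the perturbation does not change the ordering of the atoms, which is why the step `h` is kept small.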
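  25. Some $\mathbb{W}_2$-calculus

    Hamiltonian flow: $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$,
    $\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\bigl(\mu_t\,(v + G_t[\rho_t])\bigr)$, where $G_t[\rho](x) = e^{\beta_t}\,\nabla\mathcal{E}'[\rho](x)$.
    2nd order calculus: with $\partial_t \rho_t + \operatorname{div}_x j_t = 0$, for any $\sigma \in \mathcal{P}(\mathbb{R}^d)$, $\pi_t \in \Pi(\rho_t, \sigma)$:
    $\frac{1}{2}\frac{d^+}{dt}\frac{d}{dt}\mathbb{W}_2^2(\rho_t, \sigma) \le \iint |v|^2\,\mu_t(dx\,dv) - \iint \langle x - \mathsf{T}_t(x), v + G_t[\rho_t](x) \rangle\,\mu_t(dx\,dv)$.
    E.g. Carrillo-Choi-Tse 2018, Chen-Li-Tse-Wright (arXiv)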
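  26. Some $\mathbb{W}_2$-calculus

    Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $\sigma : [0,1] \to \mathcal{P}_2(\mathbb{R}^d)$,
    $\mathcal{E}[\sigma_\vartheta] \le (1-\vartheta)\,\mathcal{E}[\sigma_0] + \vartheta\,\mathcal{E}[\sigma_1] - \frac{\lambda}{2}\,\vartheta(1-\vartheta)\,\mathbb{W}_2^2(\sigma_0, \sigma_1)$.
    In particular, for any $\sigma \in \mathcal{P}(\mathbb{R}^d)$:
    $\mathcal{E}[\rho_t] - \mathcal{E}[\sigma] - \int \langle \nabla\mathcal{E}'[\rho_t](x), x - \mathsf{T}_t(x) \rangle\,\rho_t(dx) + \frac{\lambda}{2}\,\mathbb{W}_2^2(\rho_t, \sigma) \le 0$.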
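  27. Convergence on $\mathcal{P}_2(\mathbb{R}^d)$

    Dynamics: $dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\,\nabla\mathcal{E}'[\rho_t](X_t)\,dt$.
    Claim: $\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.
    Proof: The temporal derivative of $\mathcal{L}_t(\mu_t)$ yields
    $\frac{d^+}{dt}\mathcal{L}_t(\mu_t) = \dot{\beta}_t e^{\beta_t}\bigl(\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*]\bigr) - e^{\beta_t}\int \langle x - \mathsf{T}_t(x), \nabla\mathcal{E}'[\rho_t](x) \rangle\,\rho_t(dx)$
    $\le e^{\beta_t}\Bigl[\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] - \int \langle x - \mathsf{T}_t(x), \nabla\mathcal{E}'[\rho_t](x) \rangle\,\rho_t(dx)\Bigr] \le 0$.
    Conclude as in the $\mathbb{R}^d$ case.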
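For readers who want the intermediate step, here is a formal sketch of how the four terms of $\mathcal{L}_t(\mu_t)$ combine, with $\sigma = \rho^*$ and $G_t[\rho] = e^{\beta_t}\nabla\mathcal{E}'[\rho]$. It uses the first- and second-order $\mathbb{W}_2$-calculus from slides 24 and 25 and assumes, without justification here, that the chain rule for $\mathcal{E}[\rho_t]$ and differentiation under the integral are valid:

```latex
\begin{align*}
\tfrac12 \tfrac{d}{dt}\mathbb{W}_2^2(\rho_t,\rho^*)
  &= \iint \langle x - \mathsf{T}_t(x), v\rangle\,\mu_t(dx\,dv), \\
\tfrac12 \tfrac{d^+}{dt}\tfrac{d}{dt}\mathbb{W}_2^2(\rho_t,\rho^*)
  &\le \iint |v|^2\,\mu_t(dx\,dv)
     - \iint \langle x - \mathsf{T}_t(x), v + G_t[\rho_t](x)\rangle\,\mu_t(dx\,dv), \\
\tfrac12 \tfrac{d}{dt}\iint |v|^2\,\mu_t(dx\,dv)
  &= -\iint |v|^2\,\mu_t(dx\,dv) - \iint \langle v, G_t[\rho_t](x)\rangle\,\mu_t(dx\,dv), \\
\tfrac{d}{dt}\Bigl[e^{\beta_t}\bigl(\mathcal{E}[\rho_t]-\mathcal{E}[\rho^*]\bigr)\Bigr]
  &= \dot\beta_t e^{\beta_t}\bigl(\mathcal{E}[\rho_t]-\mathcal{E}[\rho^*]\bigr)
     + \iint \langle G_t[\rho_t](x), v\rangle\,\mu_t(dx\,dv).
\end{align*}
% Adding the four lines, the terms in <x - T_t(x), v>, |v|^2 and <v, G_t> cancel, leaving
\begin{equation*}
\tfrac{d^+}{dt}\mathcal{L}_t(\mu_t)
  \le \dot\beta_t e^{\beta_t}\bigl(\mathcal{E}[\rho_t]-\mathcal{E}[\rho^*]\bigr)
     - e^{\beta_t}\int \langle x - \mathsf{T}_t(x), \nabla\mathcal{E}'[\rho_t](x)\rangle\,\rho_t(dx).
\end{equation*}
```

The right-hand side is non-positive by $\dot{\beta}_t \le 1$, $\mathcal{E}[\rho_t] \ge \mathcal{E}[\rho^*]$ and the convexity inequality of slide 26 with $\lambda = 0$ and $\sigma = \rho^*$.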
  28. The Agenda

    I. Hamiltonian flows
    II. Convergence results: convergence for $\mathbb{R}^d$; some $\mathbb{W}_2$-calculus; convergence for $\mathcal{P}_2(\mathbb{R}^d)$
    III. Numerical experiments
    IV. Summary + Outlook
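  29. Experimental setup

    Gradient flow (GF): $dX_t = -\nabla\mathcal{E}'[\rho_t](X_t)\,dt$.
    Heavy-ball flow (HBF), $\gamma = 1/2$: $dX_t = e^{-\gamma t} V_t\,dt$, $dV_t = -e^{\gamma t}\,\nabla\mathcal{E}'[\rho_t](X_t)\,dt$.
    Variational acceleration: $dX_t = e^{\alpha_t} V_t\,dt$, $dV_t = -e^{\alpha_t}\bigl(V_t + e^{\beta_t}\,\nabla\mathcal{E}'[\rho_t](X_t)\bigr)\,dt$, with
    Nesterov (Nes): $\alpha_t = \log(2/t)$, $\beta_t = 2\log(t/2)$;
    Exponential (Exp): $\alpha_t = 0$, $\beta_t = t$.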
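The four flows in the setup differ only in how the velocity and the gradient are weighted. The sketch below collects their right-hand sides for a particle approximation in which $\rho_t$ is the empirical measure of the rows of `X`. The function name `flow_rhs`, its signature and the callback `grad_first_var` are hypothetical; the slides do not say how the experiments were discretised.

```python
import numpy as np

def flow_rhs(method, t, X, V, grad_first_var, gamma=0.5):
    """Return (dX/dt, dV/dt) for particle arrays X, V of shape (N, d).

    grad_first_var(X) must return the (N, d) array of grad E'[rho](X_i),
    where rho is the empirical measure of the rows of X.
    """
    G = grad_first_var(X)
    if method == "GF":        # gradient flow; the velocity variable is unused
        return -G, np.zeros_like(V)
    if method == "HBF":       # heavy-ball flow with parameter gamma
        return np.exp(-gamma * t) * V, -np.exp(gamma * t) * G
    if method == "Nes":       # alpha_t = log(2/t), beta_t = 2 log(t/2); needs t > 0
        alpha, beta = np.log(2.0 / t), 2.0 * np.log(t / 2.0)
    elif method == "Exp":     # alpha_t = 0, beta_t = t
        alpha, beta = 0.0, t
    else:
        raise ValueError(f"unknown method: {method}")
    # Variational acceleration: dX = e^alpha V, dV = -e^alpha (V + e^beta G).
    return np.exp(alpha) * V, -np.exp(alpha) * (V + np.exp(beta) * G)

# Example with a potential energy E[rho] = int U d rho, U(x) = |x|^2 (an
# assumption for illustration), so that grad E'[rho](x) = 2 x.
X = np.array([[1.0, 2.0]])
V = np.array([[3.0, -1.0]])
dX, dV = flow_rhs("Nes", 4.0, X, V, lambda X: 2.0 * X)
```

Any time-stepping scheme can be layered on top of `flow_rhs`; note that the Nesterov schedule is singular at $t = 0$, so an integration would have to start from some $t_0 > 0$.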
  30. Experiment A1

    $\mathcal{E}[\rho] = \int V_A(x)\,\rho(dx)$, $V_A(x) = \frac{1}{2}\langle x - b, A(x - b) \rangle$.
    Dimension $d = 500$; $b \sim \mathrm{Normal}(0, \mathsf{I}_d)$; $A$ positive definite with $\mathrm{Eig}(A) \sim \mathrm{Unif}([0.001, 1])$.
    [Plots, log scale: $\mathcal{E}(\rho_t)$ versus $t$ for Exp; $\mathcal{E}(\rho_t) - \mathcal{E}^*$ versus $t$ for GF, HB, Nes, Exp.]
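  31. Experiment A2

    $\mathcal{E}_\varepsilon[\rho] = \int \log(K_\varepsilon \star \rho)\,\rho(dx) + \int V_A(x)\,\rho(dx)$, $V_A(x) = \frac{1}{2}\langle x - b, A(x - b) \rangle$.
    Dimension $d = 20$; $N = 1600$; $b \sim \mathrm{Normal}(0, 10 \cdot \mathsf{I}_d)$; $A$ positive definite with $\mathrm{Eig}(A) \sim \mathrm{Unif}([0.001, 1])$.
    [Plots, log scale: $\mathcal{E}(\rho_t)$ versus $t$ for Exp; $\mathcal{E}(\rho_t) - \mathcal{E}^*$ versus $t$ for GF, HB, Nes, Exp.]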
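For a particle approximation $\rho \approx \frac1N\sum_i \delta_{X_i}$, the regularised entropy term becomes a double sum over particles. The sketch below evaluates it with a Gaussian mollifier $K_\varepsilon(x) = (2\pi\varepsilon)^{-d/2} e^{-|x|^2/(2\varepsilon)}$. The choice of kernel is an assumption, since the slides do not specify $K_\varepsilon$, and the function name is hypothetical.

```python
import numpy as np

def regularised_entropy(X, eps):
    """(1/N) sum_i log( (1/N) sum_j K_eps(X_i - X_j) ) for a Gaussian K_eps.

    X has shape (N, d). The pairwise difference array below uses O(N^2 d)
    memory, so this form is meant for small illustrative particle counts.
    """
    n, d = X.shape
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # (N, N) squared distances
    log_k = -sq / (2.0 * eps) - 0.5 * d * np.log(2.0 * np.pi * eps)
    m = log_k.max(axis=1, keepdims=True)                        # stabilise the log-sum-exp
    log_conv = m[:, 0] + np.log(np.mean(np.exp(log_k - m), axis=1))
    return float(np.mean(log_conv))

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
value = regularised_entropy(X, eps=0.1)
```

When all particles coincide, the expression reduces to $\log K_\varepsilon(0) = -\frac{d}{2}\log(2\pi\varepsilon)$, which gives a quick consistency check.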
  32. Experiment B1

    $\mathcal{E}[\rho] = \int V_B(x)\,\rho(dx)$, $V_B(x) = 20\log\Bigl(\sum_{i=1}^{200}\exp\Bigl(\frac{\langle w_i, x \rangle - q_i}{20}\Bigr)\Bigr)$.
    Dimension $d = 50$; $q_i \sim \mathrm{Normal}(0, 1)$; $w_i \sim \mathrm{Normal}(0, \mathsf{I}_d)$.
    [Plots: $\mathcal{E}(\rho_t)$ versus $t$ for Nes, Exp; $\mathcal{E}(\rho_t) - \mathcal{E}^*$ versus $t$ (log scale) for GF, HB, Nes, Exp.]
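The potential $V_B$ is a scaled log-sum-exp, so its gradient is a softmax-weighted average of the vectors $w_i$. A minimal sketch of both follows, with a finite-difference check of the gradient. The helper name `make_vb`, the random seed and the test point are assumptions for illustration.

```python
import numpy as np

def make_vb(W, q, temp=20.0):
    """Return V_B and its gradient for V_B(x) = temp * log sum_i exp((<w_i, x> - q_i) / temp)."""
    def value(x):
        s = (W @ x - q) / temp
        m = s.max()                          # shift for numerical stability
        return temp * (m + np.log(np.sum(np.exp(s - m))))

    def grad(x):
        s = (W @ x - q) / temp
        p = np.exp(s - s.max())
        p /= p.sum()                         # softmax weights
        return W.T @ p                       # sum_i p_i w_i

    return value, grad

rng = np.random.default_rng(2)
d, n_terms = 50, 200
W = rng.normal(size=(n_terms, d))            # rows w_i ~ Normal(0, I_d)
q = rng.normal(size=n_terms)                 # q_i ~ Normal(0, 1)
vb, grad_vb = make_vb(W, q)

# Directional finite-difference check of the gradient at a random point.
x = rng.normal(size=d)
u = rng.normal(size=d)
h = 1e-5
fd = (vb(x + h * u) - vb(x - h * u)) / (2 * h)
exact = grad_vb(x) @ u
```

For the potential energy $\mathcal{E}[\rho] = \int V_B\,d\rho$ the first variation is $V_B$ itself, so `grad_vb` applied to each particle gives the drift $\nabla\mathcal{E}'[\rho](X_i)$.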
  33. Experiment B2

    $\mathcal{E}_\varepsilon[\rho] = \int \log(K_\varepsilon \star \rho)\,\rho(dx) + \int V_B(x)\,\rho(dx)$, $V_B(x) = 20\log\Bigl(\sum_{i=1}^{200}\exp\Bigl(\frac{\langle w_i, x \rangle - q_i}{20}\Bigr)\Bigr)$.
    Dimension $d = 10$; $N = 1600$; $q_i \sim \mathrm{Normal}(0, 1)$; $w_i \sim \mathrm{Normal}(0, \mathsf{I}_d)$.
    [Plots: $\mathcal{E}(\rho_t)$ versus $t$ for Exp; $\mathcal{E}(\rho_t) - \mathcal{E}^*$ versus $t$ (log scale) for GF, HB, Nes, Exp.]
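  34. Experiment C1

    $\mathcal{E}[\rho] = \frac{1}{2}\sum_{j=1}^{500} |f(x_j) - g[\rho](x_j)|^2$, $g[\rho](x) = \int \bigl[\alpha\,\mathrm{ReLU}(w \cdot x + b) + \beta\bigr]\,\rho(dz)$, $z = (\alpha, \beta, w, b) \in \mathbb{R}^4$.
    Dimension $d = 4$; $N = 100$; $f(x) = \sin(\pi x)$; $x_j \sim \mathrm{Unif}([-1, 1])$.
    [Plot, log scale: $\mathcal{E}(\rho_t)$ versus $t$ for GF, HB, Nes, Exp.]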
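In the particle picture, $g[\rho]$ is a two-layer ReLU network whose hidden units are the particles $z_i = (\alpha_i, \beta_i, w_i, b_i)$, averaged with weight $1/N$. The sketch below evaluates the objective of this experiment for a given particle array. The function names and the Gaussian initialisation are assumptions; only the formulas for $g$, $f$ and the data distribution come from the slide.

```python
import numpy as np

def network_output(Z, x):
    """g[rho](x_j) for the empirical measure of the rows z_i = (alpha, beta, w, b) of Z."""
    alpha, beta, w, b = Z[:, 0], Z[:, 1], Z[:, 2], Z[:, 3]
    pre = np.outer(x, w) + b                       # (J, N) pre-activations w_i * x_j + b_i
    return np.mean(alpha * np.maximum(pre, 0.0) + beta, axis=1)

def objective(Z, x, f=lambda x: np.sin(np.pi * x)):
    """E[rho] = 0.5 * sum_j |f(x_j) - g[rho](x_j)|^2."""
    return 0.5 * np.sum((f(x) - network_output(Z, x)) ** 2)

rng = np.random.default_rng(3)
x_data = rng.uniform(-1.0, 1.0, size=500)          # x_j ~ Unif([-1, 1])
Z = rng.normal(size=(100, 4))                      # N = 100 particles (initialisation assumed)
loss = objective(Z, x_data)
```

With all particles at the origin the network output vanishes, so the objective reduces to $\frac12\sum_j \sin^2(\pi x_j)$, which is a convenient baseline value.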
  35. The Agenda

    I. Hamiltonian flows
    II. Convergence results: convergence for $\mathbb{R}^d$; some $\mathbb{W}_2$-calculus; convergence for $\mathcal{P}_2(\mathbb{R}^d)$
    III. Numerical experiments
    IV. Summary + Outlook
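  36. Summary

    Variational acceleration: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\,\nabla\mathcal{E}(x_t)$ on $\mathbb{R}^d$; $dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\,\nabla\mathcal{E}'[\rho_t](X_t)\,dt$ on $\mathcal{P}_2(\mathbb{R}^d)$.
    Convergence results ($\lambda = 0$, $\dot{\beta}_t \le 1$): $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ and $\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-\beta_t})$.
    Method: Lyapunov + 2nd order $\mathbb{W}_2$-calculus.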
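  37. Outlook

    Variational acceleration: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\,\nabla\mathcal{E}(x_t)$ on $\mathbb{R}^d$; $dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\,\nabla\mathcal{E}'[\rho_t](X_t)\,dt$ on $\mathcal{P}_2(\mathbb{R}^d)$.
    Convergence results ($\lambda = 0$, $\dot{\beta}_t \le 1$): $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ and $\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-\beta_t})$.
    Method: Lyapunov + 2nd order $\mathbb{W}_2$-calculus.
    Open directions: discrete-time convergence; sampling with $\mathcal{E}[\rho] = \mathrm{KL}(\rho\,|\,\pi)$ (no well-posedness!).
    Heavy-ball flow: $dX_t = V_t\,dt$, $dV_t = -\gamma V_t\,dt - \nabla\log\pi(X_t)\,dt + 2\sigma\,dB_t$.
    E.g. Bolley-Guillin-Malrieu 2010, Klar-Kreusser-Tse (2017)
    Thank you!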