Jia-Jie Zhu
April 12, 2024

# Oliver Tse (Eindhoven University of Technology, Netherlands) Variational Acceleration Methods in the Space of Probability Measures

WORKSHOP ON OPTIMAL TRANSPORT
FROM THEORY TO APPLICATIONS
INTERFACING DYNAMICAL SYSTEMS, OPTIMIZATION, AND MACHINE LEARNING
Venue: Humboldt University of Berlin, Dorotheenstraße 24

Berlin, Germany. March 11th - 15th, 2024

## Transcript

1. ### Accelerating Optimization over Probability Measures

Oliver Tse. Optimal Transport from Theory to Applications, Humboldt Universität, Berlin. Joint work with Shi Chen, Qin Li, Stephen J. Wright.
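2. ### The Context

Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$.

Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $x : [0,1] \to \mathcal{X}$,
$$\mathcal{E}(x_\vartheta) \le (1-\vartheta)\,\mathcal{E}(x_0) + \vartheta\,\mathcal{E}(x_1) - \frac{\lambda}{2}\,\vartheta(1-\vartheta)\,\mathsf{d}^2(x_0, x_1).$$

Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean). Gradient descent: $\dot{x}_t = -\nabla\mathcal{E}(x_t)$, with
$$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le \begin{cases} O(t^{-1}) & \text{if } \lambda = 0, \\ O(e^{-2\lambda t}) & \text{if } \lambda > 0 \end{cases} \qquad (\lambda \ll 1).$$
E.g. Ambrosio-Gigli-Savaré 2005.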
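3. ### The Context

Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$ ($\lambda$-convexity along geodesics as recalled on slide 2).

Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean). Heavy ball method: $\dot{x}_t = e^{-\gamma t} v_t$, $\dot{v}_t = -e^{\gamma t}\nabla\mathcal{E}(x_t)$, with
$$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le \begin{cases} o(t^{-1}) & \text{if } \lambda = 0,\ \gamma > 0, \\ O(e^{-\sqrt{\lambda}\, t}) & \text{if } \lambda > 0,\ \gamma = 2\sqrt{\lambda}. \end{cases}$$
E.g. Polyak 1964, Att…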
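4. ### The Context

Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$ ($\lambda$-convexity along geodesics as recalled on slide 2).

Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean). Heavy ball method: $\dot{x}_t = v_t$, $\dot{v}_t = -\gamma v_t - \nabla\mathcal{E}(x_t)$, with
$$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le \begin{cases} o(t^{-1}) & \text{if } \lambda = 0,\ \gamma > 0, \\ O(e^{-\sqrt{\lambda}\, t}) & \text{if } \lambda > 0,\ \gamma = 2\sqrt{\lambda}. \end{cases}$$
E.g. Att…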
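The two ways of writing the heavy ball method on these slides describe the same second-order dynamics. A short check, where $\tilde{v}_t := e^{-\gamma t} v_t$ is a symbol introduced here only for the comparison:

$$\begin{aligned}
\dot{x}_t &= e^{-\gamma t} v_t = \tilde{v}_t, \\
\dot{\tilde{v}}_t &= -\gamma e^{-\gamma t} v_t + e^{-\gamma t}\,\dot{v}_t = -\gamma\,\tilde{v}_t - \nabla\mathcal{E}(x_t),
\end{aligned}$$

so both forms reduce to $\ddot{x}_t + \gamma\,\dot{x}_t + \nabla\mathcal{E}(x_t) = 0$.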
5. ### The Context

Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$ ($\lambda$-convexity along geodesics as recalled on slide 2).

Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean). Nesterov method:
$$\dot{x}_t = \frac{2}{t}\,v_t, \qquad \dot{v}_t = -\frac{2}{t}\,v_t - \frac{t}{2}\,\nabla\mathcal{E}(x_t),$$
with $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(t^{-2})$ if $\lambda = 0$. E.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021.
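As a consistency check, reading the Nesterov system on this slide as $\dot{x}_t = \tfrac{2}{t} v_t$, $\dot{v}_t = -\tfrac{2}{t} v_t - \tfrac{t}{2}\nabla\mathcal{E}(x_t)$, one can eliminate $v_t$ and recover the familiar second-order form with $3/t$ damping:

$$\begin{aligned}
\ddot{x}_t &= -\frac{2}{t^2}\,v_t + \frac{2}{t}\,\dot{v}_t
= -\frac{1}{t}\,\dot{x}_t + \frac{2}{t}\Big(-\frac{2}{t}\,v_t - \frac{t}{2}\,\nabla\mathcal{E}(x_t)\Big) \\
&= -\frac{1}{t}\,\dot{x}_t - \frac{2}{t}\,\dot{x}_t - \nabla\mathcal{E}(x_t)
= -\frac{3}{t}\,\dot{x}_t - \nabla\mathcal{E}(x_t).
\end{aligned}$$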
6. ### The Context

Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$ ($\lambda$-convexity along geodesics as recalled on slide 2).

Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean). Variational acceleration:
$$\dot{x}_t = e^{\alpha_t} v_t, \qquad \dot{v}_t = -e^{\alpha_t}\big(v_t + e^{\beta_t}\nabla\mathcal{E}(x_t)\big),$$
with $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le e^{\alpha_t}$.

Nesterov method: $\alpha_t = \log(2/t)$, $\beta_t = 2\log(t/2)$. E.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021.
7. ### The Context

Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$ ($\lambda$-convexity along geodesics as recalled on slide 2).

Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean). Variational acceleration:
$$\dot{x}_t = e^{\alpha_t} v_t, \qquad \dot{v}_t = -e^{\alpha_t}\big(v_t + e^{\beta_t}\nabla\mathcal{E}(x_t)\big),$$
with $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le e^{\alpha_t}$.

Exponential method: $\alpha_t = 0$, $\beta_t = t$. E.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021.
8. ### The Context

Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$ ($\lambda$-convexity along geodesics as recalled on slide 2).

Possible approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean). Variational acceleration with $\alpha_t = 0$:
$$\dot{x}_t = v_t, \qquad \dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t),$$
with $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.

Exponential method: $\alpha_t = 0$, $\beta_t = t$. E.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021.
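To make the exponential method concrete, here is a minimal numerical sketch in Python that compares it with plain gradient flow. Everything specific in it is an assumption made for illustration and not taken from the talk: the ill-conditioned quadratic test objective, the step size, and the semi-implicit Euler discretisation (the slides only state the continuous-time dynamics).

```python
import numpy as np

# Hypothetical test problem (an assumption): E(x) = 0.5 * <x, A x>,
# minimised at x* = 0 with E(x*) = 0; A is deliberately ill-conditioned.
A = np.diag([0.01, 1.0])
x0 = np.array([1.0, 1.0])


def energy(x):
    return 0.5 * x @ A @ x


def grad_energy(x):
    return A @ x


def gradient_flow(x0, T, dt):
    """Explicit Euler for x' = -grad E(x)."""
    x = x0.copy()
    for _ in range(int(round(T / dt))):
        x = x - dt * grad_energy(x)
    return x


def exponential_flow(x0, T, dt):
    """Semi-implicit Euler for x' = v, v' = -v - exp(beta_t) grad E(x), beta_t = t."""
    x, v, t = x0.copy(), np.zeros_like(x0), 0.0
    for _ in range(int(round(T / dt))):
        v = v + dt * (-v - np.exp(t) * grad_energy(x))  # velocity step uses the old position
        x = x + dt * v                                   # position step uses the new velocity
        t += dt
    return x


T, dt = 10.0, 1e-3
gap_gf = energy(gradient_flow(x0, T, dt))
gap_exp = energy(exponential_flow(x0, T, dt))
# Continuous-time Lyapunov bound e^{-beta_T} * L_0 with v_0 = 0 and beta_0 = 0.
bound = np.exp(-T) * (0.5 * x0 @ x0 + energy(x0))

print(f"gradient flow gap:    {gap_gf:.3e}")
print(f"exponential flow gap: {gap_exp:.3e}")
print(f"Lyapunov bound:       {bound:.3e}")
```

Because the forcing term grows like $e^{\beta_t}$, the system becomes stiff for large $t$, so the step size has to shrink as the horizon grows; this sketch does not address the discrete-time convergence question.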
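9. ### The Context

Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$ ($\lambda$-convexity along geodesics as recalled on slide 2).

Possible approaches ($\mathcal{X} = \mathcal{P}_2(\mathbb{R}^d)$, $\mathsf{d} = \mathbb{W}_2$). Otto gradient flow: $\partial_t \rho_t = \operatorname{div}\big(\rho_t \nabla\mathcal{E}'[\rho_t]\big)$, with
$$\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le \begin{cases} O(t^{-1}) & \text{if } \lambda = 0, \\ O(e^{-2\lambda t}) & \text{if } \lambda > 0. \end{cases}$$
E.g. Ambrosio-Gigli-Savaré 2005.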
10. ### The Context

Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$ ($\lambda$-convexity along geodesics as recalled on slide 2).

Possible approaches ($\mathcal{X} = \mathcal{P}_2(\mathbb{R}^d)$, $\mathsf{d} = \mathbb{W}_2$): Heavy ball? Nesterov? Variational acceleration?
11. ### The Agenda

I. Hamiltonian flows. II. Convergence results: convergence for $\mathbb{R}^d$, some $\mathbb{W}_2$-calculus, convergence for $\mathcal{P}_2(\mathbb{R}^d)$. III. Numerical experiments. IV. Summary + Outlook.
12. ### The Agenda

I. Hamiltonian flows. II. Convergence results: convergence for $\mathbb{R}^d$, some $\mathbb{W}_2$-calculus, convergence for $\mathcal{P}_2(\mathbb{R}^d)$. III. Numerical experiments. IV. Summary + Outlook.
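13. ### Towards Hamiltonian flows

Setting: $\mathcal{X} = \mathcal{P}_2(\mathbb{R}^d)$, $\mathsf{d} = \mathbb{W}_2$.

Otto gradient flow: $\partial_t \rho_t = \operatorname{div}\big(\rho_t \nabla\mathcal{E}'[\rho_t]\big)$.

Lagrangian dynamics: $dX_t = -\nabla\mathcal{E}'[\rho_t](X_t)\,dt$ with $\rho_t = \mathsf{Law}\,X_t$ (Euclidean analogue: $\dot{x}_t = -\nabla\mathcal{E}(x_t)$).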
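14. ### Towards Hamiltonian flows

Setting: $\mathcal{X} = \mathcal{P}_2(\mathbb{R}^d)$, $\mathsf{d} = \mathbb{W}_2$.

Otto gradient flow: $\partial_t \rho_t = \operatorname{div}\big(\rho_t \nabla\mathcal{E}'[\rho_t]\big)$, with Lagrangian dynamics $\rho_t = \mathsf{Law}\,X_t$.

Hamiltonian flow (Euclidean analogue: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$):
$$dX_t = V_t\,dt, \qquad dV_t = -V_t\,dt - e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t)\,dt, \qquad \mu_t = \mathsf{Law}(X_t, V_t),\ \rho_t = \mathsf{Law}\,X_t.$$

Kinetic Vlasov equation, with $G_t[\rho](x) = e^{\beta_t}\nabla\mathcal{E}'[\rho](x)$:
$$\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\big(\mu_t (v + G_t[\rho_t])\big).$$
E.g. Ambrosio-Gangbo 2007.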
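15. ### Hamiltonian flows

Hamiltonian flow: $\mu_t = \mathsf{Law}(X_t, V_t)$, $\rho_t = \mathsf{Law}\,X_t$, with
$$\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\big(\mu_t (v + G_t[\rho_t])\big), \qquad G_t[\rho](x) = e^{\beta_t}\nabla\mathcal{E}'[\rho](x).$$

Remarks:

1) Restricted class of $\mathcal{E}$.

2) $\rho_t = \mathsf{Law}\,X_t$ solves $\partial_t \rho_t + \operatorname{div}_x j_t = 0$, where $j_t(dx) = \int v\,\mu_t(dx\,dv) =: m_t$.

3) $m_t$ solves $\partial_t m_t + \operatorname{div}_x \mathbb{T}_t = -m_t - \rho_t G_t[\rho_t]$, where $\mathbb{T}_t(dx) = \int v \otimes v\,\mu_t(dx\,dv) \approx \frac{1}{\rho_t}\, m_t \otimes m_t$.

Pressureless Euler: $\partial_t m_t + \operatorname{div}_x\big(\frac{1}{\rho_t}\, m_t \otimes m_t\big) = -m_t - \rho_t G_t[\rho_t]$. E.g. Carrillo-Choi-Zatorska 2016.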
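Remarks 2) and 3) follow by taking velocity moments of the kinetic equation. A formal sketch, assuming enough decay of $\mu_t$ in $v$ to integrate by parts and writing $\mu_t(x, v)$ as if it had a density:

$$\begin{aligned}
\int \big[\partial_t \mu_t + \operatorname{div}_x(\mu_t v)\big]\,dv &= \int \operatorname{div}_v\big(\mu_t (v + G_t[\rho_t])\big)\,dv = 0
&&\Longrightarrow\quad \partial_t \rho_t + \operatorname{div}_x j_t = 0, \\
\int v\,\big[\partial_t \mu_t + \operatorname{div}_x(\mu_t v)\big]\,dv &= \int v\,\operatorname{div}_v\big(\mu_t (v + G_t[\rho_t])\big)\,dv = -\int (v + G_t[\rho_t])\,\mu_t\,dv
&&\Longrightarrow\quad \partial_t m_t + \operatorname{div}_x \mathbb{T}_t = -m_t - \rho_t\,G_t[\rho_t].
\end{aligned}$$

The first line uses that the $v$-divergence integrates to zero; the second uses one integration by parts in $v$ and the fact that $G_t[\rho_t]$ does not depend on $v$.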
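16. ### Hamiltonian flows

Hamiltonian flow: $\mu_t = \mathsf{Law}(X_t, V_t)$, $\rho_t = \mathsf{Law}\,X_t$, with
$$\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\big(\mu_t (v + G_t[\rho_t])\big), \qquad G_t[\rho](x) = e^{\beta_t}\nabla\mathcal{E}'[\rho](x).$$

Remarks:

1) Restricted class of $\mathcal{E}$.

2) $\rho_t = \mathsf{Law}\,X_t$ solves $\partial_t \rho_t + \operatorname{div}_x j_t = 0$, where $j_t(dx) = \int v\,\mu_t(dx\,dv) =: m_t$.

3) $m_t$ solves $\partial_t m_t + \operatorname{div}_x \mathbb{T}_t = -m_t - \rho_t G_t[\rho_t]$, where $\mathbb{T}_t(dx) = \int v \otimes v\,\mu_t(dx\,dv) \approx \frac{1}{\rho_t}\, m_t \otimes m_t$.

Pressureless Euler: $\partial_t u_t + u_t \cdot \operatorname{div}_x u_t = -u_t - G_t[\rho_t]$.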
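17. ### Hamiltonian flows

Hamiltonian flow: $\mu_t = \mathsf{Law}(X_t, V_t)$, $\rho_t = \mathsf{Law}\,X_t$, with
$$\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\big(\mu_t (v + G_t[\rho_t])\big), \qquad G_t[\rho](x) = e^{\beta_t}\nabla\mathcal{E}'[\rho](x).$$

Remarks:

1) Restricted class of $\mathcal{E}$.

2) $\rho_t = \mathsf{Law}\,X_t$ solves $\partial_t \rho_t + \operatorname{div}_x j_t = 0$, where $j_t(dx) = \int v\,\mu_t(dx\,dv) =: m_t$.

3) $m_t$ solves $\partial_t m_t + \operatorname{div}_x \mathbb{T}_t = -m_t - \rho_t G_t[\rho_t]$, where $\mathbb{T}_t(dx) = \int v \otimes v\,\mu_t(dx\,dv) \approx \frac{1}{\rho_t}\, m_t \otimes m_t$.

Hamilton-Jacobi: $\partial_t \psi_t + |\nabla_x \psi_t|^2 = -\psi_t - e^{\beta_t}\mathcal{E}'[\rho_t]$. E.g. Chow-Li-Zhou 2020, Wang-Li 2022.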
18. ### The Agenda

I. Hamiltonian flows. II. Convergence results: convergence for $\mathbb{R}^d$, some $\mathbb{W}_2$-calculus, convergence for $\mathcal{P}_2(\mathbb{R}^d)$. III. Numerical experiments. IV. Summary + Outlook.
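19. ### Convergence on $\mathbb{R}^d$

Dynamics: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$.

Lyapunov function (e.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021):
$$\mathcal{L}_t(x, v) := \frac{1}{2}\,|x + v - x^*|^2 + e^{\beta_t}\big(\mathcal{E}(x) - \mathcal{E}(x^*)\big).$$

Idea: If $\mathcal{L}_t(x_t, v_t) \le \mathcal{L}_\tau(x_\tau, v_\tau) < +\infty$ for all $t \ge \tau$, then
$$e^{\beta_t}\big(\mathcal{E}(x_t) - \mathcal{E}(x^*)\big) \le \mathcal{L}_t(x_t, v_t) \le \mathcal{L}_\tau(x_\tau, v_\tau) \qquad \forall\, t \ge \tau.$$

Result: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.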
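The monotonicity of the Lyapunov function used in the idea above can be checked directly. A sketch of the computation along $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$, assuming $\mathcal{E}$ is differentiable and convex ($\lambda = 0$) and $\dot{\beta}_t \le 1$:

$$\begin{aligned}
\frac{d}{dt}\mathcal{L}_t(x_t, v_t)
&= \big\langle x_t + v_t - x^*,\ \dot{x}_t + \dot{v}_t \big\rangle + \dot{\beta}_t e^{\beta_t}\big(\mathcal{E}(x_t) - \mathcal{E}(x^*)\big) + e^{\beta_t}\big\langle \nabla\mathcal{E}(x_t), \dot{x}_t \big\rangle \\
&= -e^{\beta_t}\big\langle x_t + v_t - x^*,\ \nabla\mathcal{E}(x_t) \big\rangle + \dot{\beta}_t e^{\beta_t}\big(\mathcal{E}(x_t) - \mathcal{E}(x^*)\big) + e^{\beta_t}\big\langle \nabla\mathcal{E}(x_t), v_t \big\rangle \\
&= \dot{\beta}_t e^{\beta_t}\big(\mathcal{E}(x_t) - \mathcal{E}(x^*)\big) - e^{\beta_t}\big\langle x_t - x^*,\ \nabla\mathcal{E}(x_t) \big\rangle \\
&\le e^{\beta_t}\Big[\mathcal{E}(x_t) - \mathcal{E}(x^*) - \big\langle x_t - x^*,\ \nabla\mathcal{E}(x_t) \big\rangle\Big] \le 0.
\end{aligned}$$

The second line uses $\dot{x}_t + \dot{v}_t = -e^{\beta_t}\nabla\mathcal{E}(x_t)$, the first inequality uses $\dot{\beta}_t \le 1$ together with $\mathcal{E}(x_t) \ge \mathcal{E}(x^*)$, and the last is the convexity inequality $\mathcal{E}(x^*) \ge \mathcal{E}(x_t) + \langle \nabla\mathcal{E}(x_t), x^* - x_t \rangle$.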
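20. ### Towards $\mathcal{P}_2(\mathbb{R}^d)$

Dynamics: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$.

Lyapunov function (e.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021):
$$\mathcal{L}_t(x, v) := \frac{1}{2}\,|x + v - x^*|^2 + e^{\beta_t}\big(\mathcal{E}(x) - \mathcal{E}(x^*)\big).$$

Result: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.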
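21. ### Towards $\mathcal{P}_2(\mathbb{R}^d)$

Dynamics: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$.

Lyapunov function (e.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021):
$$\begin{aligned}
\mathcal{L}_t(x, v) &:= \frac{1}{2}\,|x + v - x^*|^2 + e^{\beta_t}\big(\mathcal{E}(x) - \mathcal{E}(x^*)\big) \\
&= \frac{1}{2}\,|x - x^*|^2 + \langle x - x^*, v \rangle + \frac{1}{2}\,|v|^2 + e^{\beta_t}\big(\mathcal{E}(x) - \mathcal{E}(x^*)\big).
\end{aligned}$$

Result: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.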
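22. ### Towards $\mathcal{P}_2(\mathbb{R}^d)$

Dynamics: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$.

Lyapunov function (e.g. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021):
$$\begin{aligned}
\mathcal{L}_t(x, v) &:= \frac{1}{2}\,|x + v - x^*|^2 + e^{\beta_t}\big(\mathcal{E}(x) - \mathcal{E}(x^*)\big) \\
&= \frac{1}{2}\,|x - x^*|^2 + \frac{1}{2}\frac{d}{dt}|x - x^*|^2 + \frac{1}{2}\,|v|^2 + e^{\beta_t}\big(\mathcal{E}(x) - \mathcal{E}(x^*)\big).
\end{aligned}$$

On $\mathcal{P}_2(\mathbb{R}^d)$, the role of $|x - x^*|^2$ is played by $\mathbb{W}_2^2(\rho, \rho^*)$.

Result: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.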
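23. ### Towards $\mathcal{P}_2(\mathbb{R}^d)$

Dynamics: $dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t)\,dt$, with $\mu_t = \mathsf{Law}(X_t, V_t)$, $\rho_t = \mathsf{Law}\,X_t$.

Lyapunov function:
$$\begin{aligned}
\mathcal{L}_t(\mu) &:= \frac{1}{2}\,\mathbb{W}_2^2(\rho, \rho^*) + \frac{1}{2}\frac{d}{dt}\mathbb{W}_2^2(\rho, \rho^*) + \frac{1}{2}\iint |v|^2\,\mu(dx\,dv) + e^{\beta_t}\big(\mathcal{E}[\rho] - \mathcal{E}[\rho^*]\big) \\
&= \frac{1}{2}\iint |x + v - \mathsf{T}(x)|^2\,\mu(dx\,dv) + e^{\beta_t}\big(\mathcal{E}[\rho] - \mathcal{E}[\rho^*]\big) \ \ge\ 0.
\end{aligned}$$

Claim: $\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.

Question: $\frac{d}{dt}\mathcal{L}_t(\mu_t) \le 0$?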
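24. ### Some $\mathbb{W}_2$-calculus

Hamiltonian flow: $\mu_t = \mathsf{Law}(X_t, V_t)$, $\rho_t = \mathsf{Law}\,X_t$, with
$$\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\big(\mu_t (v + G_t[\rho_t])\big), \qquad G_t[\rho](x) = e^{\beta_t}\nabla\mathcal{E}'[\rho](x).$$

1st order calculus: $\partial_t \rho_t + \operatorname{div}_x j_t = 0$ with $j_t(dx) = \int v\,\mu_t(dx\,dv)$. For any $\sigma \in \mathcal{P}(\mathbb{R}^d)$, $\pi_t \in \Pi(\rho_t, \sigma)$:
$$\begin{aligned}
\frac{1}{2}\frac{d}{dt}\mathbb{W}_2^2(\rho_t, \sigma)
&= \iint \Big\langle x - y,\ \frac{dj_t}{d\rho_t}(x) \Big\rangle\,\pi_t(dx\,dy) \\
&= \int \big\langle x - \mathsf{T}_t(x),\ j_t(dx) \big\rangle
= \iint \big\langle x - \mathsf{T}_t(x),\ v \big\rangle\,\mu_t(dx\,dv).
\end{aligned}$$
E.g. Ambrosio-Gigli-Savaré 2005.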
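25. ### Some $\mathbb{W}_2$-calculus

Hamiltonian flow: $\mu_t = \mathsf{Law}(X_t, V_t)$, $\rho_t = \mathsf{Law}\,X_t$, with
$$\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\big(\mu_t (v + G_t[\rho_t])\big), \qquad G_t[\rho](x) = e^{\beta_t}\nabla\mathcal{E}'[\rho](x).$$

2nd order calculus: $\partial_t \rho_t + \operatorname{div}_x j_t = 0$. For any $\sigma \in \mathcal{P}(\mathbb{R}^d)$, $\pi_t \in \Pi(\rho_t, \sigma)$:
$$\frac{1}{2}\frac{d^+}{dt}\frac{d}{dt}\mathbb{W}_2^2(\rho_t, \sigma) \le \iint |v|^2\,\mu_t(dx\,dv) - \iint \big\langle x - \mathsf{T}_t(x),\ v + G_t[\rho_t](x) \big\rangle\,\mu_t(dx\,dv).$$
E.g. Carrillo-Choi-Tse 2018, Chen-Li-Tse-Wright (arXiv).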
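26. ### Some $\mathbb{W}_2$-calculus

Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $\sigma : [0,1] \to \mathcal{P}_2(\mathbb{R}^d)$,
$$\mathcal{E}[\sigma_\vartheta] \le (1-\vartheta)\,\mathcal{E}[\sigma_0] + \vartheta\,\mathcal{E}[\sigma_1] - \frac{\lambda}{2}\,\vartheta(1-\vartheta)\,\mathbb{W}_2^2(\sigma_0, \sigma_1).$$

In particular, for any $\sigma \in \mathcal{P}(\mathbb{R}^d)$:
$$\mathcal{E}[\rho_t] - \mathcal{E}[\sigma] - \int \big\langle \nabla\mathcal{E}'[\rho_t](x),\ x - \mathsf{T}_t(x) \big\rangle\,\rho_t(dx) + \frac{\lambda}{2}\,\mathbb{W}_2^2(\rho_t, \sigma) \le 0.$$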
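27. ### Convergence on $\mathcal{P}_2(\mathbb{R}^d)$

Dynamics: $dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t)\,dt$.

Claim: $\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$.

Proof: The temporal derivative of $\mathcal{L}_t(\mu_t)$ yields
$$\begin{aligned}
\frac{d^+}{dt}\mathcal{L}_t(\mu_t)
&= \dot{\beta}_t e^{\beta_t}\big(\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*]\big) - e^{\beta_t}\int \big\langle x - \mathsf{T}_t(x),\ \nabla\mathcal{E}'[\rho_t](x) \big\rangle\,\rho_t(dx) \\
&\le e^{\beta_t}\Big[\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] - \int \big\langle x - \mathsf{T}_t(x),\ \nabla\mathcal{E}'[\rho_t](x) \big\rangle\,\rho_t(dx)\Big] \le 0.
\end{aligned}$$
Conclude as in the $\mathbb{R}^d$ case.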
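The cancellations behind the derivative in this proof can be spelled out term by term. The sketch below is formal and reads the Lyapunov function as $\mathcal{L}_t(\mu) = \frac{1}{2}\mathbb{W}_2^2(\rho, \rho^*) + \frac{1}{2}\frac{d}{dt}\mathbb{W}_2^2(\rho, \rho^*) + \frac{1}{2}\iint |v|^2\,d\mu + e^{\beta_t}(\mathcal{E}[\rho] - \mathcal{E}[\rho^*])$, with the factor $\frac{1}{2}$ on the kinetic term taken from the squared form $\frac{1}{2}\iint |x + v - \mathsf{T}(x)|^2\,d\mu$; it uses the 1st and 2nd order calculus with $\sigma = \rho^*$ and $G_t = e^{\beta_t}\nabla\mathcal{E}'[\rho_t]$:

$$\begin{aligned}
\frac{d}{dt}\,\frac{1}{2}\mathbb{W}_2^2(\rho_t, \rho^*) &= \iint \langle x - \mathsf{T}_t(x), v \rangle\,d\mu_t, \\
\frac{d^+}{dt}\,\frac{1}{2}\frac{d}{dt}\mathbb{W}_2^2(\rho_t, \rho^*) &\le \iint |v|^2\,d\mu_t - \iint \langle x - \mathsf{T}_t(x), v \rangle\,d\mu_t - \iint \langle x - \mathsf{T}_t(x), G_t \rangle\,d\mu_t, \\
\frac{d}{dt}\,\frac{1}{2}\iint |v|^2\,d\mu_t &= -\iint |v|^2\,d\mu_t - \iint \langle v, G_t \rangle\,d\mu_t, \\
\frac{d}{dt}\,e^{\beta_t}\big(\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*]\big) &= \dot{\beta}_t e^{\beta_t}\big(\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*]\big) + \iint \langle v, G_t \rangle\,d\mu_t.
\end{aligned}$$

Summing the four lines, the terms $\iint \langle x - \mathsf{T}_t, v \rangle$, $\iint |v|^2$ and $\iint \langle v, G_t \rangle$ cancel in pairs, leaving $\dot{\beta}_t e^{\beta_t}(\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*]) - e^{\beta_t}\int \langle x - \mathsf{T}_t(x), \nabla\mathcal{E}'[\rho_t](x) \rangle\,\rho_t(dx)$ as an upper bound, which is nonpositive by $\dot{\beta}_t \le 1$ and the convexity inequality with $\lambda = 0$.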
28. ### The Agenda

I. Hamiltonian flows. II. Convergence results: convergence for $\mathbb{R}^d$, some $\mathbb{W}_2$-calculus, convergence for $\mathcal{P}_2(\mathbb{R}^d)$. III. Numerical experiments. IV. Summary + Outlook.
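29. ### Experimental setup

Gradient flow (GF): $dX_t = -\nabla\mathcal{E}'[\rho_t](X_t)\,dt$.

Heavy-ball flow (HBF), $\gamma = 1/2$: $dX_t = e^{-\gamma t} V_t\,dt$, $dV_t = -e^{\gamma t}\nabla\mathcal{E}'[\rho_t](X_t)\,dt$.

Variational acceleration: $dX_t = e^{\alpha_t} V_t\,dt$, $dV_t = -e^{\alpha_t}\big(V_t + e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t)\big)\,dt$, with

- Nesterov (Nes): $\alpha_t = \log(2/t)$, $\beta_t = 2\log(t/2)$;
- Exponential (Exp): $\alpha_t = 0$, $\beta_t = t$.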
30. ### Experiment A1

$\mathcal{E}[\rho] = \int V_A(x)\,\rho(dx)$ with $V_A(x) = \frac{1}{2}\langle x - b, A(x - b) \rangle$. Dimension $d = 500$; $b \sim \mathrm{Normal}(0, \mathsf{I}_d)$; $A$ p.d. with $\mathrm{Eig}(A) \sim \mathrm{Unif}([0.001, 1])$.

[Plots: $\mathcal{E}(\rho_t)$ versus $t$ for Exp; $\mathcal{E}(\rho_t) - \mathcal{E}^*$ versus $t$ on a log scale for GF, HB, Nes, Exp.]
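A particle discretisation of the variational acceleration dynamics for a quadratic potential energy of this type could look as follows. This is only a sketch under stated assumptions: the function `simulate`, the semi-implicit Euler scheme, the small dimension, the fixed spectrum of `A`, the start time `t0` (needed because $\log(2/t)$ is singular at $t = 0$) and the initial particle law are all choices made here for illustration, not the setup used in the talk.

```python
import numpy as np


def simulate(X0, grad_fv, alpha, beta, t0, T, dt):
    """Semi-implicit Euler for
    dX = e^{alpha_t} V dt,  dV = -e^{alpha_t} (V + e^{beta_t} grad_fv(X)) dt,
    applied to N particles stored as the rows of X0, with V_0 = 0."""
    X, V, t = X0.copy(), np.zeros_like(X0), t0
    for _ in range(int(round((T - t0) / dt))):
        ea, eb = np.exp(alpha(t)), np.exp(beta(t))
        V = V - dt * ea * (V + eb * grad_fv(X))  # velocity step uses old positions
        X = X + dt * ea * V                      # position step uses new velocities
        t += dt
    return X


# Assumed toy instance: E[rho] = ∫ V_A d(rho), V_A(x) = 0.5 <x - b, A (x - b)>.
# Then grad E'[rho](x) = A (x - b), so the particles do not interact, and E* = 0.
rng = np.random.default_rng(0)
d, N = 3, 200
A = np.diag([0.5, 0.75, 1.0])
b = rng.normal(size=d)
X0 = rng.normal(size=(N, d))


def grad_fv(X):
    return (X - b) @ A  # A is symmetric, so row i equals A (x_i - b)


def energy(X):
    Y = X - b
    return 0.5 * np.mean(np.einsum("ni,ij,nj->n", Y, A, Y))


schedules = {
    "Nes": (lambda t: np.log(2.0 / t), lambda t: 2.0 * np.log(t / 2.0)),
    "Exp": (lambda t: 0.0, lambda t: t),
}

E0 = energy(X0)
E_final = {
    name: energy(simulate(X0, grad_fv, a, bt, t0=0.1, T=10.0, dt=1e-3))
    for name, (a, bt) in schedules.items()
}
print("initial energy:", E0)
for name, value in E_final.items():
    print(name, "final energy:", value)
```

For energies with interaction or entropy terms, `grad_fv` would have to depend on the whole particle ensemble, for instance through a kernel estimate of $\rho$.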
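31. ### Experiment A2

$\mathcal{E}_\varepsilon[\rho] = \int \log(K_\varepsilon \star \rho)\,\rho(dx) + \int V_A(x)\,\rho(dx)$ with $V_A(x) = \frac{1}{2}\langle x - b, A(x - b) \rangle$. Dimension $d = 20$; $N = 1600$; $b \sim \mathrm{Normal}(0, 10 \cdot \mathsf{I}_d)$; $A$ p.d. with $\mathrm{Eig}(A) \sim \mathrm{Unif}([0.001, 1])$.

[Plots: $\mathcal{E}(\rho_t)$ versus $t$ for Exp; $\mathcal{E}(\rho_t) - \mathcal{E}^*$ versus $t$ on a log scale for GF, HB, Nes, Exp.]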
32. ### Experiment B1

$\mathcal{E}[\rho] = \int V_B(x)\,\rho(dx)$ with
$$V_B(x) = 20 \log\Big(\sum_{i=1}^{200} \exp\Big(\frac{\langle w_i, x \rangle - q_i}{20}\Big)\Big).$$
Dimension $d = 50$; $q_i \sim \mathrm{Normal}(0, 1)$; $w_i \sim \mathrm{Normal}(0, \mathsf{I}_d)$.

[Plots: $\mathcal{E}(\rho_t)$ versus $t$ for Nes and Exp; $\mathcal{E}(\rho_t) - \mathcal{E}^*$ versus $t$ on a log scale for GF, HB, Nes, Exp.]
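Since $\mathcal{E}[\rho] = \int V_B\,d\rho$ is linear in $\rho$, the particle dynamics only need $\nabla V_B$, which is a softmax-weighted average of the $w_i$. A minimal sketch, assuming a max-shifted log-sum-exp for numerical stability; the variable names, the random seed and the finite-difference check are illustrative additions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, tau = 50, 200, 20.0
W = rng.normal(size=(m, d))  # rows w_i ~ Normal(0, I_d)
q = rng.normal(size=m)       # q_i ~ Normal(0, 1)


def V_B(x):
    """V_B(x) = tau * log(sum_i exp((<w_i, x> - q_i) / tau)), computed stably."""
    s = (W @ x - q) / tau
    s_max = s.max()
    return tau * (s_max + np.log(np.exp(s - s_max).sum()))


def grad_V_B(x):
    """Gradient of V_B: sum_i p_i w_i with p = softmax((W x - q) / tau)."""
    s = (W @ x - q) / tau
    p = np.exp(s - s.max())
    p /= p.sum()
    return W.T @ p


# Central finite differences as a sanity check of the analytic gradient.
x = rng.normal(size=d)
h = 1e-5
fd = np.array([(V_B(x + h * e) - V_B(x - h * e)) / (2 * h) for e in np.eye(d)])
max_err = np.max(np.abs(fd - grad_V_B(x)))
print("max finite-difference error:", max_err)
```

The factor 20 in front of the logarithm cancels against the $1/20$ inside the exponential, which is why no `tau` appears in the gradient.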
33. ### Experiment B2

$\mathcal{E}_\varepsilon[\rho] = \int \log(K_\varepsilon \star \rho)\,\rho(dx) + \int V_B(x)\,\rho(dx)$ with
$$V_B(x) = 20 \log\Big(\sum_{i=1}^{200} \exp\Big(\frac{\langle w_i, x \rangle - q_i}{20}\Big)\Big).$$
Dimension $d = 10$; $N = 1600$; $q_i \sim \mathrm{Normal}(0, 1)$; $w_i \sim \mathrm{Normal}(0, \mathsf{I}_d)$.

[Plots: $\mathcal{E}(\rho_t)$ versus $t$ for Exp; $\mathcal{E}(\rho_t) - \mathcal{E}^*$ versus $t$ on a log scale for GF, HB, Nes, Exp.]
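34. ### Experiment C1

$$\mathcal{E}[\rho] = \frac{1}{2}\sum_{j=1}^{500} \big| f(x_j) - g[\rho](x_j) \big|^2, \qquad g[\rho](x) = \int \big[\alpha\,\mathrm{ReLU}(w \cdot x + b) + \beta\big]\,\rho(dz).$$
Dimension $d = 4$, $z = (\alpha, \beta, w, b) \in \mathbb{R}^4$; $f(x) = \sin(\pi x)$; $x_j \sim \mathrm{Unif}([-1, 1])$; $N = 100$.

[Plot: $\mathcal{E}(\rho_t)$ versus $t$ on a log scale for GF, HB, Nes, Exp.]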
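For this regression objective, the empirical measure over $N$ particles $z_i = (\alpha_i, \beta_i, w_i, b_i)$ is a width-$N$ two-layer ReLU network in mean-field scaling. The sketch below evaluates the loss and the gradient of the first variation, $\nabla_z \mathcal{E}'[\rho](z) = -\sum_j r_j\,\nabla_z \varphi(z; x_j)$ with residuals $r_j = f(x_j) - g[\rho](x_j)$ and $\varphi(z; x) = \alpha\,\mathrm{ReLU}(w x + b) + \beta$, then takes a few explicit Euler steps of the gradient flow. The standard normal initial law, the step size and the number of steps are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, size=500)  # data points x_j ~ Unif([-1, 1])
f = np.sin(np.pi * xs)                 # targets f(x_j)
Z = rng.normal(size=(100, 4))          # particles z_i = (alpha, beta, w, b); initial law assumed


def g(Z):
    """g[rho_N](x_j): average of alpha * ReLU(w x_j + b) + beta over the particles."""
    a, be, w, b = Z.T
    pre = np.outer(w, xs) + b[:, None]  # shape (N, J)
    return np.mean(a[:, None] * np.maximum(pre, 0.0) + be[:, None], axis=0)


def loss(Z):
    return 0.5 * np.sum((f - g(Z)) ** 2)


def grad_first_variation(Z):
    """Row i holds grad_z E'[rho_N](z_i) for the empirical measure rho_N."""
    a, be, w, b = Z.T
    pre = np.outer(w, xs) + b[:, None]
    act = np.maximum(pre, 0.0)
    on = (pre > 0).astype(float)        # derivative of ReLU away from the kink
    r = f - g(Z)                        # residuals, shape (J,)
    d_alpha = -(act @ r)
    d_beta = np.full(len(Z), -r.sum())
    d_w = -a * ((on * xs) @ r)
    d_b = -a * (on @ r)
    return np.stack([d_alpha, d_beta, d_w, d_b], axis=1)


# A few explicit Euler steps of the particle gradient flow dZ = -grad E'[rho](Z) dt.
loss_before = loss(Z)
Z_gf = Z.copy()
for _ in range(200):
    Z_gf = Z_gf - 1e-5 * grad_first_variation(Z_gf)
loss_after = loss(Z_gf)
print("loss before:", loss_before, "loss after:", loss_after)
```

The same `grad_first_variation` could be plugged into an accelerated particle scheme in place of the gradient-flow update.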
35. ### The Agenda

I. Hamiltonian flows. II. Convergence results: convergence for $\mathbb{R}^d$, some $\mathbb{W}_2$-calculus, convergence for $\mathcal{P}_2(\mathbb{R}^d)$. III. Numerical experiments. IV. Summary + Outlook.
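36. ### Summary

Variational acceleration:

- on $\mathbb{R}^d$: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$;
- on $\mathcal{P}_2(\mathbb{R}^d)$: $dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t)\,dt$.

Convergence results ($\lambda = 0$, $\dot{\beta}_t \le 1$): $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ and $\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-\beta_t})$, via Lyapunov + 2nd order $\mathbb{W}_2$-calculus.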
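37. ### Outlook

Variational acceleration:

- on $\mathbb{R}^d$: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$;
- on $\mathcal{P}_2(\mathbb{R}^d)$: $dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t)\,dt$.

Convergence results ($\lambda = 0$, $\dot{\beta}_t \le 1$): $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ and $\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-\beta_t})$, via Lyapunov + 2nd order $\mathbb{W}_2$-calculus.

Open directions:

- Discrete-time convergence.
- Sampling: $\mathcal{E}[\rho] = \mathrm{KL}(\rho\,|\,\pi)$. No well-posedness!
- Heavy-ball flow: $dX_t = V_t\,dt$, $dV_t = -\gamma V_t\,dt - \nabla\log\pi(X_t)\,dt + 2\sigma\,dB_t$. E.g. Bolley-Guillin-Malrieu 2010, Klar-Kreusser-Tse (2017).

Thank you!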