Slide 1

Slide 1 text

Accelerating Optimization over Probability Measures
Oliver Tse
Optimal Transport from Theory to Applications, Humboldt Universität, Berlin
Joint work with Shi Chen, Qin Li, Stephen J. Wright

Slide 2

Slide 2 text

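The Context
Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$.
Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $x : [0,1] \to \mathcal{X}$, $\mathcal{E}(x_\vartheta) \le (1-\vartheta)\mathcal{E}(x_0) + \vartheta\mathcal{E}(x_1) - \frac{\lambda}{2}\vartheta(1-\vartheta)\,\mathsf{d}^2(x_0, x_1)$.
Possible Approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean)
Gradient descent: $\dot{x}_t = -\nabla\mathcal{E}(x_t)$
$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(t^{-1})$ if $\lambda = 0$
$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-2\lambda t})$ if $\lambda > 0$ (annotation on the slide: $\lambda \ll 1$)
Eg. Ambrosio-Gigli-Savaré 2005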

Slide 3

Slide 3 text

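The Context
Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$.
Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $x : [0,1] \to \mathcal{X}$, $\mathcal{E}(x_\vartheta) \le (1-\vartheta)\mathcal{E}(x_0) + \vartheta\mathcal{E}(x_1) - \frac{\lambda}{2}\vartheta(1-\vartheta)\,\mathsf{d}^2(x_0, x_1)$.
Possible Approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean)
Heavy ball method: $\dot{x}_t = e^{-\gamma t} v_t$, $\dot{v}_t = -e^{\gamma t}\nabla\mathcal{E}(x_t)$
$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le o(t^{-1})$ if $\lambda = 0$, $\gamma > 0$
$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\sqrt{\lambda}\,t})$ if $\lambda > 0$, $\gamma = 2\sqrt{\lambda}$
Eg. Polyak 1964, Att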

Slide 4

Slide 4 text

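The Context
Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$.
Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $x : [0,1] \to \mathcal{X}$, $\mathcal{E}(x_\vartheta) \le (1-\vartheta)\mathcal{E}(x_0) + \vartheta\mathcal{E}(x_1) - \frac{\lambda}{2}\vartheta(1-\vartheta)\,\mathsf{d}^2(x_0, x_1)$.
Possible Approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean)
Heavy ball method: $\dot{x}_t = v_t$, $\dot{v}_t = -\gamma v_t - \nabla\mathcal{E}(x_t)$
$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le o(t^{-1})$ if $\lambda = 0$, $\gamma > 0$
$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\sqrt{\lambda}\,t})$ if $\lambda > 0$, $\gamma = 2\sqrt{\lambda}$
Eg. Att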

Slide 5

Slide 5 text

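The Context
Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$.
Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $x : [0,1] \to \mathcal{X}$, $\mathcal{E}(x_\vartheta) \le (1-\vartheta)\mathcal{E}(x_0) + \vartheta\mathcal{E}(x_1) - \frac{\lambda}{2}\vartheta(1-\vartheta)\,\mathsf{d}^2(x_0, x_1)$.
Possible Approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean)
Nesterov method: $\dot{x}_t = \frac{2}{t} v_t$, $\dot{v}_t = -\frac{2}{t} v_t - \frac{t}{2}\nabla\mathcal{E}(x_t)$
$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(t^{-2})$ if $\lambda = 0$
Eg. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021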

Slide 6

Slide 6 text

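The Context
Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$.
Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $x : [0,1] \to \mathcal{X}$, $\mathcal{E}(x_\vartheta) \le (1-\vartheta)\mathcal{E}(x_0) + \vartheta\mathcal{E}(x_1) - \frac{\lambda}{2}\vartheta(1-\vartheta)\,\mathsf{d}^2(x_0, x_1)$.
Possible Approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean)
Variational acceleration: $\dot{x}_t = e^{\alpha_t} v_t$, $\dot{v}_t = -e^{\alpha_t}\bigl(v_t + e^{\beta_t}\nabla\mathcal{E}(x_t)\bigr)$
$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le e^{\alpha_t}$
Nesterov method: $\alpha_t = \log(2/t)$, $\beta_t = 2\log(t/2)$
Eg. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021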
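As a quick consistency check (a worked substitution, not part of the original slides), plugging the Nesterov choice $\alpha_t = \log(2/t)$, $\beta_t = 2\log(t/2)$ into the variational-acceleration system should recover the Nesterov dynamics and the $O(t^{-2})$ rate:

```latex
\begin{aligned}
e^{\alpha_t} &= \tfrac{2}{t}, \qquad
e^{\beta_t} = \tfrac{t^2}{4}, \qquad
e^{\alpha_t + \beta_t} = \tfrac{t}{2}, \qquad
\dot{\beta}_t = \tfrac{2}{t} = e^{\alpha_t}, \\
\dot{x}_t &= e^{\alpha_t} v_t = \tfrac{2}{t}\, v_t, \qquad
\dot{v}_t = -e^{\alpha_t} v_t - e^{\alpha_t + \beta_t}\nabla\mathcal{E}(x_t)
          = -\tfrac{2}{t}\, v_t - \tfrac{t}{2}\,\nabla\mathcal{E}(x_t), \\
\mathcal{E}(x_t) - \mathcal{E}(x^*) &\le O\bigl(e^{-\beta_t}\bigr) = O\bigl(4 t^{-2}\bigr) = O(t^{-2}).
\end{aligned}
```

The condition $\dot{\beta}_t \le e^{\alpha_t}$ holds with equality for this choice.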

Slide 7

Slide 7 text

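The Context
Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$.
Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $x : [0,1] \to \mathcal{X}$, $\mathcal{E}(x_\vartheta) \le (1-\vartheta)\mathcal{E}(x_0) + \vartheta\mathcal{E}(x_1) - \frac{\lambda}{2}\vartheta(1-\vartheta)\,\mathsf{d}^2(x_0, x_1)$.
Possible Approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean)
Variational acceleration: $\dot{x}_t = e^{\alpha_t} v_t$, $\dot{v}_t = -e^{\alpha_t}\bigl(v_t + e^{\beta_t}\nabla\mathcal{E}(x_t)\bigr)$
$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le e^{\alpha_t}$
Exponential method: $\alpha_t = 0$, $\beta_t = t$
Eg. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021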

Slide 8

Slide 8 text

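The Context
Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$.
Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $x : [0,1] \to \mathcal{X}$, $\mathcal{E}(x_\vartheta) \le (1-\vartheta)\mathcal{E}(x_0) + \vartheta\mathcal{E}(x_1) - \frac{\lambda}{2}\vartheta(1-\vartheta)\,\mathsf{d}^2(x_0, x_1)$.
Possible Approaches ($\mathcal{X} = \mathbb{R}^d$, $\mathsf{d}$ = Euclidean)
Variational acceleration, exponential method ($\alpha_t = 0$, $\beta_t = t$): $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$
$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$
Eg. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021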
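The exponential method can be probed numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' code: it takes the hypothetical objective $\mathcal{E}(x) = x^2/2$ in one dimension (so $x^* = 0$), integrates $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{t}\nabla\mathcal{E}(x_t)$ with a classical RK4 step, and compares the gap $\mathcal{E}(x_t) - \mathcal{E}(x^*)$ against $\mathcal{L}_0 e^{-t}$, where $\mathcal{L}_t(x, v) = \frac{1}{2}|x + v - x^*|^2 + e^{\beta_t}(\mathcal{E}(x) - \mathcal{E}(x^*))$ is the Lyapunov function used later in the talk. The step size, horizon, and initial data are arbitrary choices.

```python
import math


def energy(x):
    """Toy convex objective E(x) = x^2 / 2 with minimizer x* = 0 (an assumption)."""
    return 0.5 * x * x


def grad_energy(x):
    return x


def rhs(t, x, v):
    """Right-hand side of x' = v, v' = -v - e^{beta_t} grad E(x) with beta_t = t."""
    return v, -v - math.exp(t) * grad_energy(x)


def rk4_step(t, x, v, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1x, k1v = rhs(t, x, v)
    k2x, k2v = rhs(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
    k3x, k3v = rhs(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
    k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v)
    x_new = x + h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
    v_new = v + h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return x_new, v_new


def simulate(x0=1.0, v0=0.0, t_end=5.0, n_steps=5000):
    """Return the trajectory as a list of (t, x, v) tuples."""
    h = t_end / n_steps
    x, v = x0, v0
    traj = [(0.0, x, v)]
    for k in range(n_steps):
        x, v = rk4_step(k * h, x, v, h)
        traj.append(((k + 1) * h, x, v))
    return traj


def lyapunov(t, x, v):
    """L_t(x, v) = |x + v - x*|^2 / 2 + e^t (E(x) - E(x*)) with x* = 0."""
    return 0.5 * (x + v) ** 2 + math.exp(t) * energy(x)


if __name__ == "__main__":
    traj = simulate()
    bound = lyapunov(*traj[0])
    t, x, v = traj[-1]
    print(f"t={t:.1f}  gap={energy(x):.3e}  L_0*e^-t={bound * math.exp(-t):.3e}")
```

Because $e^{\beta_t}$ grows exponentially, the system becomes stiff for large $t$; an explicit integrator like this one is only reasonable on short horizons with small steps.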

Slide 9

Slide 9 text

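The Context
Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$.
Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $x : [0,1] \to \mathcal{X}$, $\mathcal{E}(x_\vartheta) \le (1-\vartheta)\mathcal{E}(x_0) + \vartheta\mathcal{E}(x_1) - \frac{\lambda}{2}\vartheta(1-\vartheta)\,\mathsf{d}^2(x_0, x_1)$.
Possible Approaches ($\mathcal{X} = \mathcal{P}_2(\mathbb{R}^d)$, $\mathsf{d} = \mathbb{W}_2$)
Otto gradient flow: $\partial_t \rho_t = \operatorname{div}\bigl(\rho_t \nabla\mathcal{E}'[\rho_t]\bigr)$
$\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(t^{-1})$ if $\lambda = 0$
$\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-2\lambda t})$ if $\lambda > 0$
Eg. Ambrosio-Gigli-Savaré 2005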

Slide 10

Slide 10 text

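The Context
Task: Given a $\lambda$-convex objective $\mathcal{E} : (\mathcal{X}, \mathsf{d}) \to \mathbb{R} \cup \{+\infty\}$, find $x^* \in \operatorname{argmin} \mathcal{E}(x)$.
Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $x : [0,1] \to \mathcal{X}$, $\mathcal{E}(x_\vartheta) \le (1-\vartheta)\mathcal{E}(x_0) + \vartheta\mathcal{E}(x_1) - \frac{\lambda}{2}\vartheta(1-\vartheta)\,\mathsf{d}^2(x_0, x_1)$.
Possible Approaches ($\mathcal{X} = \mathcal{P}_2(\mathbb{R}^d)$, $\mathsf{d} = \mathbb{W}_2$)
Heavy ball? Nesterov? Variational acceleration?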

Slide 11

Slide 11 text

The Agenda
I. Hamiltonian flows
II. Convergence results
    Convergence for $\mathbb{R}^d$
    Some $\mathbb{W}_2$-calculus
    Convergence for $\mathcal{P}_2(\mathbb{R}^d)$
III. Numerical experiments
IV. Summary + Outlook

Slide 12

Slide 12 text

The Agenda
I. Hamiltonian flows
II. Convergence results
    Convergence for $\mathbb{R}^d$
    Some $\mathbb{W}_2$-calculus
    Convergence for $\mathcal{P}_2(\mathbb{R}^d)$
III. Numerical experiments
IV. Summary + Outlook

Slide 13

Slide 13 text

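Towards Hamiltonian flows
Setting: $\mathcal{X} = \mathcal{P}_2(\mathbb{R}^d)$, $\mathsf{d} = \mathbb{W}_2$
Otto gradient flow: $\partial_t \rho_t = \operatorname{div}\bigl(\rho_t \nabla\mathcal{E}'[\rho_t]\bigr)$
Lagrangian dynamics: $dX_t = -\nabla\mathcal{E}'[\rho_t](X_t)\,dt$, $\rho_t = \operatorname{Law} X_t$
Euclidean counterpart: $\dot{x}_t = -\nabla\mathcal{E}(x_t)$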

Slide 14

Slide 14 text

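Towards Hamiltonian flows
Setting: $\mathcal{X} = \mathcal{P}_2(\mathbb{R}^d)$, $\mathsf{d} = \mathbb{W}_2$
Otto gradient flow: $\partial_t \rho_t = \operatorname{div}\bigl(\rho_t \nabla\mathcal{E}'[\rho_t]\bigr)$, with Lagrangian dynamics $\rho_t = \operatorname{Law} X_t$
Hamiltonian flow: $dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t)\,dt$, with $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$
Euclidean counterpart: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$
Kinetic Vlasov: $\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\bigl(\mu_t (v + G_t[\rho_t])\bigr)$, where $G_t[\rho](x) = e^{\beta_t}\nabla\mathcal{E}'[\rho](x)$
Eg. Ambrosio-Gangbo 2007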
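One way to make the Hamiltonian flow concrete is an interacting-particle discretization, where $\rho_t$ is replaced by the empirical measure of $N$ particles. The sketch below is an illustration only: the energy $\mathcal{E}[\rho] = \int \frac{x^2}{2}\,\rho(dx) + \frac{1}{2}\iint \frac{(x-y)^2}{2}\,\rho(dx)\rho(dy)$ in one dimension, the schedule $\beta_t = t$, the semi-implicit Euler step, and all function names are assumptions made for the example. For this energy, $\nabla\mathcal{E}'[\rho](x) = x + (x - \bar{x})$ with $\bar{x}$ the mean of $\rho$.

```python
import math


def grad_first_variation(xs):
    """grad E'[rho](x_i) = x_i + (x_i - mean) for the toy energy described above."""
    mean = sum(xs) / len(xs)
    return [x + (x - mean) for x in xs]


def energy(xs):
    """E[rho] for the empirical measure: mean of x^2/2 plus half the variance."""
    n = len(xs)
    mean = sum(xs) / n
    potential = sum(0.5 * x * x for x in xs) / n
    # (1/2) * average over pairs of (x_i - x_j)^2 / 2 equals variance / 2
    interaction = 0.5 * sum((x - mean) ** 2 for x in xs) / n
    return potential + interaction


def hamiltonian_flow(xs0, t_end=5.0, n_steps=10000):
    """Semi-implicit Euler for dX = V dt, dV = -V dt - e^{t} grad E'[rho](X) dt."""
    h = t_end / n_steps
    xs = list(xs0)
    vs = [0.0] * len(xs)  # particles start at rest (arbitrary choice)
    for k in range(n_steps):
        scale = math.exp(k * h)  # e^{beta_t} with beta_t = t
        grads = grad_first_variation(xs)
        vs = [v - h * (v + scale * g) for v, g in zip(vs, grads)]
        xs = [x + h * v for x, v in zip(xs, vs)]  # uses the updated velocities
    return xs, vs


if __name__ == "__main__":
    xs0 = [1.0 + 2.0 * i / 19 for i in range(20)]
    xs, vs = hamiltonian_flow(xs0)
    print(f"E[rho_0]={energy(xs0):.4f}  E[rho_T]={energy(xs):.3e}")
```

The velocity is updated before the position so that the oscillatory part of the dynamics is treated symplectically; a fully explicit Euler step would need a smaller step size as $e^{\beta_t}$ grows.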

Slide 15

Slide 15 text

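Hamiltonian flows
Hamiltonian flow: $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$, with $\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\bigl(\mu_t (v + G_t[\rho_t])\bigr)$ and $G_t[\rho](x) = e^{\beta_t}\nabla\mathcal{E}'[\rho](x)$
Remarks:
1) Restricted class of $\mathcal{E}$
2) $\rho_t = \operatorname{Law} X_t$ solves $\partial_t \rho_t + \operatorname{div}_x j_t = 0$, where $j_t(dx) = \int v\,\mu_t(dx\,dv) =: m_t$
3) $m_t$ solves $\partial_t m_t + \operatorname{div}_x \mathbb{T}_t = -m_t - \rho_t G_t[\rho_t]$, where $\mathbb{T}_t(dx) = \int v \otimes v\,\mu_t(dx\,dv) \approx \frac{1}{\rho_t} m_t \otimes m_t$
Pressureless Euler: $\partial_t m_t + \operatorname{div}_x\bigl(\frac{1}{\rho_t} m_t \otimes m_t\bigr) = -m_t - \rho_t G_t[\rho_t]$
Eg. Carrillo-Choi-Zatorska 2016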

Slide 16

Slide 16 text

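Hamiltonian flows
Hamiltonian flow: $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$, with $\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\bigl(\mu_t (v + G_t[\rho_t])\bigr)$ and $G_t[\rho](x) = e^{\beta_t}\nabla\mathcal{E}'[\rho](x)$
Remarks:
1) Restricted class of $\mathcal{E}$
2) $\rho_t = \operatorname{Law} X_t$ solves $\partial_t \rho_t + \operatorname{div}_x j_t = 0$, where $j_t(dx) = \int v\,\mu_t(dx\,dv) =: m_t$
3) $m_t$ solves $\partial_t m_t + \operatorname{div}_x \mathbb{T}_t = -m_t - \rho_t G_t[\rho_t]$, where $\mathbb{T}_t(dx) = \int v \otimes v\,\mu_t(dx\,dv) \approx \frac{1}{\rho_t} m_t \otimes m_t$
Pressureless Euler (writing $m_t = \rho_t u_t$): $\partial_t u_t + u_t \cdot \nabla_x u_t = -u_t - G_t[\rho_t]$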

Slide 17

Slide 17 text

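Hamiltonian flows
Hamiltonian flow: $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$, with $\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\bigl(\mu_t (v + G_t[\rho_t])\bigr)$ and $G_t[\rho](x) = e^{\beta_t}\nabla\mathcal{E}'[\rho](x)$
Remarks:
1) Restricted class of $\mathcal{E}$
2) $\rho_t = \operatorname{Law} X_t$ solves $\partial_t \rho_t + \operatorname{div}_x j_t = 0$, where $j_t(dx) = \int v\,\mu_t(dx\,dv) =: m_t$
3) $m_t$ solves $\partial_t m_t + \operatorname{div}_x \mathbb{T}_t = -m_t - \rho_t G_t[\rho_t]$, where $\mathbb{T}_t(dx) = \int v \otimes v\,\mu_t(dx\,dv) \approx \frac{1}{\rho_t} m_t \otimes m_t$
Hamilton-Jacobi: $\partial_t \psi_t + |\nabla_x \psi_t|^2 = -\psi_t - e^{\beta_t}\mathcal{E}'[\rho_t]$
Eg. Chow-Li-Zhou 2020, Wang-Li 2022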

Slide 18

Slide 18 text

The Agenda
I. Hamiltonian flows
II. Convergence results
    Convergence for $\mathbb{R}^d$
    Some $\mathbb{W}_2$-calculus
    Convergence for $\mathcal{P}_2(\mathbb{R}^d)$
III. Numerical experiments
IV. Summary + Outlook

Slide 19

Slide 19 text

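Convergence on $\mathbb{R}^d$
Dynamics: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$
Lyapunov function: $\mathcal{L}_t(x, v) := \frac{1}{2}|x + v - x^*|^2 + e^{\beta_t}\bigl(\mathcal{E}(x) - \mathcal{E}(x^*)\bigr)$
Idea: If $\mathcal{L}_t(x_t, v_t) \le \mathcal{L}_\tau(x_\tau, v_\tau) < +\infty$ for all $t \ge \tau$, then $e^{\beta_t}\bigl(\mathcal{E}(x_t) - \mathcal{E}(x^*)\bigr) \le \mathcal{L}_t(x_t, v_t) \le \mathcal{L}_\tau(x_\tau, v_\tau)$ for all $t \ge \tau$.
Result: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$
Eg. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021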
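The monotonicity of $\mathcal{L}_t$ along the flow can be verified directly. The computation below is a sketch that assumes $\mathcal{E}$ is differentiable and convex ($\lambda = 0$) and uses $\dot{x}_t + \dot{v}_t = -e^{\beta_t}\nabla\mathcal{E}(x_t)$:

```latex
\begin{aligned}
\frac{d}{dt}\mathcal{L}_t(x_t, v_t)
&= \langle x_t + v_t - x^*,\ \dot{x}_t + \dot{v}_t \rangle
   + \dot{\beta}_t e^{\beta_t}\bigl(\mathcal{E}(x_t) - \mathcal{E}(x^*)\bigr)
   + e^{\beta_t}\langle \nabla\mathcal{E}(x_t), \dot{x}_t \rangle \\
&= -e^{\beta_t}\langle x_t + v_t - x^*,\ \nabla\mathcal{E}(x_t) \rangle
   + \dot{\beta}_t e^{\beta_t}\bigl(\mathcal{E}(x_t) - \mathcal{E}(x^*)\bigr)
   + e^{\beta_t}\langle \nabla\mathcal{E}(x_t), v_t \rangle \\
&= e^{\beta_t}\Bigl[\dot{\beta}_t\bigl(\mathcal{E}(x_t) - \mathcal{E}(x^*)\bigr)
   - \langle x_t - x^*, \nabla\mathcal{E}(x_t) \rangle\Bigr] \\
&\le e^{\beta_t}\Bigl[\mathcal{E}(x_t) - \mathcal{E}(x^*)
   - \langle x_t - x^*, \nabla\mathcal{E}(x_t) \rangle\Bigr] \le 0 .
\end{aligned}
```

The first inequality uses $\dot{\beta}_t \le 1$ together with $\mathcal{E}(x_t) \ge \mathcal{E}(x^*)$; the second is the first-order characterization of convexity, $\mathcal{E}(x^*) \ge \mathcal{E}(x_t) + \langle \nabla\mathcal{E}(x_t), x^* - x_t \rangle$.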

Slide 20

Slide 20 text

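Towards $\mathcal{P}_2(\mathbb{R}^d)$
Dynamics: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$
Lyapunov function: $\mathcal{L}_t(x, v) := \frac{1}{2}|x + v - x^*|^2 + e^{\beta_t}\bigl(\mathcal{E}(x) - \mathcal{E}(x^*)\bigr)$
Result: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$
Eg. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021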

Slide 21

Slide 21 text

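Towards $\mathcal{P}_2(\mathbb{R}^d)$
Dynamics: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$
Lyapunov function: $\mathcal{L}_t(x, v) := \frac{1}{2}|x + v - x^*|^2 + e^{\beta_t}\bigl(\mathcal{E}(x) - \mathcal{E}(x^*)\bigr) = \frac{1}{2}|x - x^*|^2 + \langle x - x^*, v \rangle + \frac{1}{2}|v|^2 + e^{\beta_t}\bigl(\mathcal{E}(x) - \mathcal{E}(x^*)\bigr)$
Result: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$
Eg. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021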

Slide 22

Slide 22 text

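Towards $\mathcal{P}_2(\mathbb{R}^d)$
Dynamics: $\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$
Lyapunov function: $\mathcal{L}_t(x, v) := \frac{1}{2}|x + v - x^*|^2 + e^{\beta_t}\bigl(\mathcal{E}(x) - \mathcal{E}(x^*)\bigr) = \frac{1}{2}|x - x^*|^2 + \frac{1}{2}\frac{d}{dt}|x - x^*|^2 + \frac{1}{2}|v|^2 + e^{\beta_t}\bigl(\mathcal{E}(x) - \mathcal{E}(x^*)\bigr)$
Counterpart on $\mathcal{P}_2(\mathbb{R}^d)$: $|x - x^*|^2$ corresponds to $\mathbb{W}_2^2(\rho, \rho^*)$
Result: $\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$
Eg. Wibisono-Wilson-Jordan 2016, Wilson-Recht-Jordan 2021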

Slide 23

Slide 23 text

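Towards $\mathcal{P}_2(\mathbb{R}^d)$
Dynamics: $dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t)\,dt$, with $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$
Lyapunov function: $\mathcal{L}_t(\mu) := \frac{1}{2}\mathbb{W}_2^2(\rho, \rho^*) + \frac{1}{2}\frac{d}{dt}\mathbb{W}_2^2(\rho, \rho^*) + \frac{1}{2}\iint |v|^2\,\mu(dx\,dv) + e^{\beta_t}\bigl(\mathcal{E}[\rho] - \mathcal{E}[\rho^*]\bigr) = \frac{1}{2}\iint |x + v - \mathsf{T}(x)|^2\,\mu(dx\,dv) + e^{\beta_t}\bigl(\mathcal{E}[\rho] - \mathcal{E}[\rho^*]\bigr) \ge 0$
Question: $\frac{d}{dt}\mathcal{L}_t(\mu_t) \le 0$ ??
Claim: $\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$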

Slide 24

Slide 24 text

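Some $\mathbb{W}_2$-calculus
Hamiltonian flow: $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$, with $\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\bigl(\mu_t (v + G_t[\rho_t])\bigr)$ and $G_t[\rho](x) = e^{\beta_t}\nabla\mathcal{E}'[\rho](x)$
1st Order Calculus: $\partial_t \rho_t + \operatorname{div}_x j_t = 0$, where $j_t(dx) = \int v\,\mu_t(dx\,dv)$
For any $\sigma \in \mathcal{P}(\mathbb{R}^d)$, $\pi_t \in \Pi(\rho_t, \sigma)$:
$\frac{1}{2}\frac{d}{dt}\mathbb{W}_2^2(\rho_t, \sigma) = \iint \bigl\langle x - y, \frac{dj_t}{d\rho_t}(x) \bigr\rangle\,\pi_t(dx\,dy) = \int \langle x - \mathsf{T}_t(x), j_t(dx) \rangle = \iint \langle x - \mathsf{T}_t(x), v \rangle\,\mu_t(dx\,dv)$
Eg. Ambrosio-Gigli-Savaré 2005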

Slide 25

Slide 25 text

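Some $\mathbb{W}_2$-calculus
Hamiltonian flow: $\mu_t = \operatorname{Law}(X_t, V_t)$, $\rho_t = \operatorname{Law} X_t$, with $\partial_t \mu_t + \operatorname{div}_x(\mu_t v) = \operatorname{div}_v\bigl(\mu_t (v + G_t[\rho_t])\bigr)$ and $G_t[\rho](x) = e^{\beta_t}\nabla\mathcal{E}'[\rho](x)$
2nd Order Calculus: $\partial_t \rho_t + \operatorname{div}_x j_t = 0$
For any $\sigma \in \mathcal{P}(\mathbb{R}^d)$, $\pi_t \in \Pi(\rho_t, \sigma)$:
$\frac{1}{2}\frac{d^+}{dt}\frac{d}{dt}\mathbb{W}_2^2(\rho_t, \sigma) \le \iint |v|^2\,\mu_t(dx\,dv) - \iint \langle x - \mathsf{T}_t(x), v + G_t[\rho_t](x) \rangle\,\mu_t(dx\,dv)$
Eg. Carrillo-Choi-Tse 2018, Chen-Li-Tse-Wright (arXiv)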

Slide 26

Slide 26 text

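Some $\mathbb{W}_2$-calculus
Recall: $\mathcal{E}$ is $\lambda$-convex if for every geodesic $\sigma : [0,1] \to \mathcal{P}_2(\mathbb{R}^d)$, $\mathcal{E}[\sigma_\vartheta] \le (1-\vartheta)\mathcal{E}[\sigma_0] + \vartheta\mathcal{E}[\sigma_1] - \frac{\lambda}{2}\vartheta(1-\vartheta)\,\mathbb{W}_2^2(\sigma_0, \sigma_1)$.
In particular, for any $\sigma \in \mathcal{P}(\mathbb{R}^d)$:
$\mathcal{E}[\rho_t] - \mathcal{E}[\sigma] - \int \langle \nabla\mathcal{E}'[\rho_t](x), x - \mathsf{T}_t(x) \rangle\,\rho_t(dx) + \frac{\lambda}{2}\mathbb{W}_2^2(\rho_t, \sigma) \le 0$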

Slide 27

Slide 27 text

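Convergence on $\mathcal{P}_2(\mathbb{R}^d)$
Dynamics: $dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t)\,dt$
Claim: $\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-\beta_t})$ if $\lambda = 0$, $\dot{\beta}_t \le 1$
Proof: The temporal derivative of $\mathcal{L}_t(\mu_t)$ yields
$\frac{d^+}{dt}\mathcal{L}_t(\mu_t) = \dot{\beta}_t e^{\beta_t}\bigl(\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*]\bigr) - e^{\beta_t}\int \langle x - \mathsf{T}_t(x), \nabla\mathcal{E}'[\rho_t](x) \rangle\,\rho_t(dx) \le e^{\beta_t}\Bigl[\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] - \int \langle x - \mathsf{T}_t(x), \nabla\mathcal{E}'[\rho_t](x) \rangle\,\rho_t(dx)\Bigr] \le 0$
Conclude as in the $\mathbb{R}^d$ case.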
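The derivative in the proof can be assembled term by term. The bookkeeping below is an illustrative reconstruction rather than a transcription of the slides: it is formal, assumes enough regularity to differentiate each term, takes $\sigma = \rho^*$ in the first- and second-order calculus, and uses $\mathcal{L}_t(\mu) = \frac{1}{2}\mathbb{W}_2^2 + \frac{1}{2}\frac{d}{dt}\mathbb{W}_2^2 + \frac{1}{2}\iint |v|^2\,\mu + e^{\beta_t}(\mathcal{E}[\rho] - \mathcal{E}[\rho^*])$ with $G_t[\rho] = e^{\beta_t}\nabla\mathcal{E}'[\rho]$.

```latex
\begin{aligned}
\frac{d}{dt}\,\tfrac{1}{2}\mathbb{W}_2^2(\rho_t, \rho^*)
  &= \iint \langle x - \mathsf{T}_t(x), v \rangle\, \mu_t(dx\,dv), \\
\frac{d^+}{dt}\,\tfrac{1}{2}\frac{d}{dt}\mathbb{W}_2^2(\rho_t, \rho^*)
  &\le \iint |v|^2\, \mu_t(dx\,dv)
     - \iint \langle x - \mathsf{T}_t(x), v + G_t[\rho_t](x) \rangle\, \mu_t(dx\,dv), \\
\frac{d}{dt}\,\tfrac{1}{2}\iint |v|^2\, \mu_t(dx\,dv)
  &= -\iint |v|^2\, \mu_t(dx\,dv)
     - \iint \langle v, G_t[\rho_t](x) \rangle\, \mu_t(dx\,dv), \\
\frac{d}{dt}\, e^{\beta_t}\bigl(\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*]\bigr)
  &= \dot{\beta}_t e^{\beta_t}\bigl(\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*]\bigr)
     + \iint \langle v, G_t[\rho_t](x) \rangle\, \mu_t(dx\,dv).
\end{aligned}
```

Adding the four lines, the $\iint |v|^2$, $\iint \langle x - \mathsf{T}_t(x), v \rangle$, and $\iint \langle v, G_t \rangle$ terms cancel, leaving

```latex
\frac{d^+}{dt}\mathcal{L}_t(\mu_t)
\le \dot{\beta}_t e^{\beta_t}\bigl(\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*]\bigr)
  - e^{\beta_t}\int \langle x - \mathsf{T}_t(x), \nabla\mathcal{E}'[\rho_t](x) \rangle\, \rho_t(dx).
```

With $\dot{\beta}_t \le 1$ and the convexity inequality for $\lambda = 0$, $\sigma = \rho^*$, the right-hand side is nonpositive.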

Slide 28

Slide 28 text

The Agenda
I. Hamiltonian flows
II. Convergence results
    Convergence for $\mathbb{R}^d$
    Some $\mathbb{W}_2$-calculus
    Convergence for $\mathcal{P}_2(\mathbb{R}^d)$
III. Numerical experiments
IV. Summary + Outlook

Slide 29

Slide 29 text

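Experimental setup
Gradient flow (GF): $dX_t = -\nabla\mathcal{E}'[\rho_t](X_t)\,dt$
Heavy-ball flow (HBF): $dX_t = e^{-\gamma t} V_t\,dt$, $dV_t = -e^{\gamma t}\nabla\mathcal{E}'[\rho_t](X_t)\,dt$, with $\gamma = 1/2$
Variational acceleration: $dX_t = e^{\alpha_t} V_t\,dt$, $dV_t = -e^{\alpha_t}\bigl(V_t + e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t)\bigr)\,dt$
Nesterov (Nes): $\alpha_t = \log(2/t)$, $\beta_t = 2\log(t/2)$
Exponential (Exp): $\alpha_t = 0$, $\beta_t = t$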
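The two variational-acceleration schedules can be encoded as plain functions of time. The sketch below is an assumption-laden illustration (the names `SCHEDULES`, `beta_dot`, `satisfies_rate_condition`, and `drift` are hypothetical, and the talk does not say how the experiments were implemented). It checks the rate condition $\dot{\beta}_t \le e^{\alpha_t}$ by central differences and evaluates the drift of $dX_t = e^{\alpha_t}V_t\,dt$, $dV_t = -e^{\alpha_t}(V_t + e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t))\,dt$ for a single scalar particle given the gradient value.

```python
import math

# (alpha_t, beta_t) pairs for the two schedules named on the slide.
SCHEDULES = {
    "Nes": (lambda t: math.log(2.0 / t), lambda t: 2.0 * math.log(t / 2.0)),
    "Exp": (lambda t: 0.0, lambda t: t),
}


def beta_dot(beta, t, eps=1e-6):
    """Central-difference approximation of the time derivative of beta."""
    return (beta(t + eps) - beta(t - eps)) / (2.0 * eps)


def satisfies_rate_condition(name, times, tol=1e-6):
    """Check beta'_t <= e^{alpha_t} (up to tol) at the given times t > 0."""
    alpha, beta = SCHEDULES[name]
    return all(beta_dot(beta, t) <= math.exp(alpha(t)) + tol for t in times)


def drift(name, t, x, v, grad):
    """Drift (dx/dt, dv/dt) for one scalar particle; grad is grad E'[rho](x)."""
    alpha, beta = SCHEDULES[name]
    a = math.exp(alpha(t))
    return a * v, -a * (v + math.exp(beta(t)) * grad)


if __name__ == "__main__":
    for name in SCHEDULES:
        print(name, satisfies_rate_condition(name, [0.5, 1.0, 2.0, 5.0, 10.0]))
```

The Nesterov schedule is singular at $t = 0$, so a simulation with it has to start from some $t_0 > 0$; the choice of $t_0$ is not specified on the slide.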

Slide 30

Slide 30 text

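Experiment A1
$\mathcal{E}[\rho] = \int V_A(x)\,\rho(dx)$, $V_A(x) = \frac{1}{2}\langle x - b, A(x - b) \rangle$
Dimension: $d = 500$
$b \sim \mathrm{Normal}(0, \mathsf{I}_d)$
$A$ p.d. with $\mathrm{Eig}(A) \sim \mathrm{Unif}([0.001, 1])$
[Figure: left panel, $\mathcal{E}(\rho_t)$ vs. $t$ on a log scale for Exp; right panel, $\mathcal{E}(\rho_t) - \mathcal{E}^*$ vs. $t$ on a log scale for GF, HB, Nes, Exp]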

Slide 31

Slide 31 text

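Experiment A2
$\mathcal{E}_\varepsilon[\rho] = \int \log(K_\varepsilon \star \rho)\,\rho(dx) + \int V_A(x)\,\rho(dx)$, $V_A(x) = \frac{1}{2}\langle x - b, A(x - b) \rangle$
Dimension: $d = 20$, $N = 1600$
$b \sim \mathrm{Normal}(0, 10 \cdot \mathsf{I}_d)$
$A$ p.d. with $\mathrm{Eig}(A) \sim \mathrm{Unif}([0.001, 1])$
[Figure: left panel, $\mathcal{E}(\rho_t)$ vs. $t$ on a log scale for Exp; right panel, $\mathcal{E}(\rho_t) - \mathcal{E}^*$ vs. $t$ on a log scale for GF, HB, Nes, Exp]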

Slide 32

Slide 32 text

Experiment B1
$\mathcal{E}[\rho] = \int V_B(x)\,\rho(dx)$, $V_B(x) = 20 \log\Bigl(\sum_{i=1}^{200} \exp\Bigl(\frac{\langle w_i, x \rangle - q_i}{20}\Bigr)\Bigr)$
Dimension: $d = 50$
$q_i \sim \mathrm{Normal}(0, 1)$, $w_i \sim \mathrm{Normal}(0, \mathsf{I}_d)$
[Figure: left panel, $\mathcal{E}(\rho_t)$ vs. $t$ for Nes, Exp; right panel, $\mathcal{E}(\rho_t) - \mathcal{E}^*$ vs. $t$ on a log scale for GF, HB, Nes, Exp]
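For reference, the log-sum-exp potential $V_B$ and its gradient can be evaluated stably by subtracting the largest exponent first. The sketch below is not the authors' code: the function names, the seed, and the use of pure-Python lists are assumptions for illustration, while the defaults ($d = 50$, 200 terms, temperature 20, standard normal $w_i$ and $q_i$) follow the slide. The gradient is $\nabla V_B(x) = \sum_i s_i(x)\, w_i$ with softmax weights $s_i \propto \exp((\langle w_i, x\rangle - q_i)/20)$.

```python
import math
import random


def make_problem(d=50, n_terms=200, seed=0):
    """Draw w_i ~ Normal(0, I_d) and q_i ~ Normal(0, 1); the seed is arbitrary."""
    rng = random.Random(seed)
    ws = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_terms)]
    qs = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
    return ws, qs


def _exponents(x, ws, qs, temp):
    return [(sum(wk * xk for wk, xk in zip(w, x)) - q) / temp
            for w, q in zip(ws, qs)]


def potential(x, ws, qs, temp=20.0):
    """V_B(x) = temp * log(sum_i exp((<w_i, x> - q_i) / temp)), computed stably."""
    zs = _exponents(x, ws, qs, temp)
    z_max = max(zs)
    return temp * (z_max + math.log(sum(math.exp(z - z_max) for z in zs)))


def grad_potential(x, ws, qs, temp=20.0):
    """grad V_B(x) = sum_i softmax_i * w_i (the factors of temp cancel)."""
    zs = _exponents(x, ws, qs, temp)
    z_max = max(zs)
    es = [math.exp(z - z_max) for z in zs]
    total = sum(es)
    return [sum(e * w[k] for e, w in zip(es, ws)) / total for k in range(len(x))]


if __name__ == "__main__":
    ws, qs = make_problem()
    x = [0.0] * 50
    print(potential(x, ws, qs), grad_potential(x, ws, qs)[:3])
```

Since $\mathcal{E}[\rho] = \int V_B\,d\rho$ is a potential energy, $\nabla\mathcal{E}'[\rho](x) = \nabla V_B(x)$ and the particles in a discretized flow do not interact.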

Slide 33

Slide 33 text

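Experiment B2
$\mathcal{E}_\varepsilon[\rho] = \int \log(K_\varepsilon \star \rho)\,\rho(dx) + \int V_B(x)\,\rho(dx)$, $V_B(x) = 20 \log\Bigl(\sum_{i=1}^{200} \exp\Bigl(\frac{\langle w_i, x \rangle - q_i}{20}\Bigr)\Bigr)$
Dimension: $d = 10$, $N = 1600$
$q_i \sim \mathrm{Normal}(0, 1)$, $w_i \sim \mathrm{Normal}(0, \mathsf{I}_d)$
[Figure: left panel, $\mathcal{E}(\rho_t)$ vs. $t$ for Exp; right panel, $\mathcal{E}(\rho_t) - \mathcal{E}^*$ vs. $t$ on a log scale for GF, HB, Nes, Exp]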

Slide 34

Slide 34 text

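Experiment C1
$\mathcal{E}[\rho] = \frac{1}{2}\sum_{j=1}^{500} |f(x_j) - g[\rho](x_j)|^2$, $g[\rho](x) = \int [\alpha\,\mathrm{ReLU}(w \cdot x + b) + \beta]\,\rho(dz)$
Dimension: $d = 4$, $z = (\alpha, \beta, w, b) \in \mathbb{R}^4$, $N = 100$
$f(x) = \sin(\pi x)$, $x_j \sim \mathrm{Unif}([-1, 1])$
[Figure: $\mathcal{E}(\rho_t)$ vs. $t$ on a log scale for GF, HB, Nes, Exp]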
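The objective of Experiment C1 is a mean-field two-layer ReLU network fitted to $f(x) = \sin(\pi x)$. With $\rho$ replaced by the empirical measure of $N$ particles $z_i = (\alpha_i, \beta_i, w_i, b_i)$, the network and loss can be sketched as follows. This is an illustration only: the function names, the seed, and the standard normal particle initialization are assumptions (the slide does not state how particles are initialized).

```python
import math
import random


def relu(s):
    return max(s, 0.0)


def network(particles, x):
    """g[rho](x) for the empirical measure of particles z = (alpha, beta, w, b)."""
    total = sum(a * relu(w * x + b) + be for (a, be, w, b) in particles)
    return total / len(particles)


def target(x):
    return math.sin(math.pi * x)


def loss(particles, xs):
    """E[rho] = (1/2) * sum_j |f(x_j) - g[rho](x_j)|^2."""
    return 0.5 * sum((target(x) - network(particles, x)) ** 2 for x in xs)


def make_setup(n_particles=100, n_data=500, seed=0):
    """Data x_j ~ Unif([-1, 1]) as on the slide; normal particle init is assumed."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_data)]
    particles = [tuple(rng.gauss(0.0, 1.0) for _ in range(4))
                 for _ in range(n_particles)]
    return particles, xs


if __name__ == "__main__":
    particles, xs = make_setup()
    print(f"initial loss: {loss(particles, xs):.4f}")
```

Unlike the potential energies in Experiments A1 and B1, this objective couples the particles through $g[\rho]$, so the gradient of the first variation at one particle depends on all the others.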

Slide 35

Slide 35 text

The Agenda
I. Hamiltonian flows
II. Convergence results
    Convergence for $\mathbb{R}^d$
    Some $\mathbb{W}_2$-calculus
    Convergence for $\mathcal{P}_2(\mathbb{R}^d)$
III. Numerical experiments
IV. Summary + Outlook

Slide 36

Slide 36 text

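Summary
Variational acceleration:
$\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$
$dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t)\,dt$
Convergence results ($\lambda = 0$, $\dot{\beta}_t \le 1$):
$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$
$\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-\beta_t})$
Lyapunov + 2nd order $\mathbb{W}_2$-calculus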

Slide 37

Slide 37 text

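Outlook
Variational acceleration:
$\dot{x}_t = v_t$, $\dot{v}_t = -v_t - e^{\beta_t}\nabla\mathcal{E}(x_t)$
$dX_t = V_t\,dt$, $dV_t = -V_t\,dt - e^{\beta_t}\nabla\mathcal{E}'[\rho_t](X_t)\,dt$
Convergence results ($\lambda = 0$, $\dot{\beta}_t \le 1$):
$\mathcal{E}(x_t) - \mathcal{E}(x^*) \le O(e^{-\beta_t})$
$\mathcal{E}[\rho_t] - \mathcal{E}[\rho^*] \le O(e^{-\beta_t})$
Lyapunov + 2nd order $\mathbb{W}_2$-calculus
Discrete-time convergence
Sampling: $\mathcal{E}[\rho] = \mathrm{KL}(\rho\,|\,\pi)$. No well-posedness!
Heavy-ball flow: $dX_t = V_t\,dt$, $dV_t = -\gamma V_t\,dt - \nabla\log\pi(X_t)\,dt + 2\sigma\,dB_t$
Eg. Bolley-Guillin-Malrieu 2010, Klar-Kreusser-Tse (2017)
Thank you!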