Slide 36
min_{u_1, v_1, …} F(u_1, v_1, …)

Training: d(u_i, v_i)/dt = −∇F(u_1, v_1, …)

where F(u_1, v_1, …) := ∑_i ( ∑_{k=1}^n u_k σ(⟨z_i, v_k⟩) − y_i )²
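A minimal numerical sketch of this training dynamic: an explicit Euler discretization of the gradient flow d(u_k, v_k)/dt = −∇F. The sizes, seed, step size and the choice σ = tanh are illustrative assumptions, not part of the slide.

```python
import numpy as np

# Discretized gradient flow on the least-squares loss
# F(u, V) = Σ_i (Σ_k u_k σ(⟨z_i, v_k⟩) - y_i)² of a 2-layer MLP.
# Sizes, seed, σ = tanh and the step size are illustrative choices.

rng = np.random.default_rng(0)
d, n, N = 3, 40, 15                  # input dim, hidden neurons, data points
Z = rng.normal(size=(N, d))          # inputs z_i
y = np.sin(Z @ rng.normal(size=d))   # arbitrary smooth targets y_i
u = rng.normal(size=n) / n           # outer weights u_k (small init)
V = rng.normal(size=(n, d))          # inner weights v_k

def loss(u, V):
    return np.sum((np.tanh(Z @ V.T) @ u - y) ** 2)

F0 = loss(u, V)
lr = 5e-4                            # Euler step for the flow
for _ in range(20000):
    A = np.tanh(Z @ V.T)             # N x n matrix of σ(⟨z_i, v_k⟩)
    r = A @ u - y                    # residuals
    grad_u = 2 * A.T @ r
    grad_V = 2 * ((1 - A**2) * r[:, None] * u[None, :]).T @ Z  # σ' = 1 - tanh²
    u -= lr * grad_u
    V -= lr * grad_V

F1 = loss(u, V)
print(F0, "->", F1)                  # the loss decreases along the flow
```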
Training Dynamics of 2-Layer MLPs

2-layer perceptron: z ↦ ∑_{k=1}^n u_k σ(⟨z, v_k⟩)

Theorem: for 2-layer perceptrons with « enough neurons », the gradient flow can only converge to a global minimum.
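As an illustration of the theorem (not a proof, and ignoring its precise assumptions), a heavily overparameterized network trained by the same discretized gradient flow drives the non-convex loss close to its global minimum, which is 0 here since a wide net can interpolate the data. Width, data, step size and iteration count below are arbitrary choices.

```python
import numpy as np

# « Enough neurons »: n = 200 hidden units for only N = 8 data points.
# All sizes, the seed, σ = tanh and the step size are illustrative.

rng = np.random.default_rng(3)
d, n, N = 5, 200, 8
Z = rng.normal(size=(N, d))
y = np.cos(Z @ np.ones(d))           # arbitrary bounded targets
u = rng.normal(size=n) / n
V = rng.normal(size=(n, d))

lr = 5e-4
for _ in range(50000):
    A = np.tanh(Z @ V.T)
    r = A @ u - y
    u -= lr * 2 * A.T @ r
    V -= lr * 2 * ((1 - A**2) * r[:, None] * u[None, :]).T @ Z

final = np.sum((np.tanh(Z @ V.T) @ u - y) ** 2)
print(final)                         # close to the global minimum 0
```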
[Figure: evolution of the particle measure along training, α_{t=0} → α_t]

« Global » convergence, despite F not being convex.

(Lenaic Chizat, Francis Bach)
[Diagram: network with input z, inner weights (v_k)_k, activation σ, outer weights (u_k)_k]
f(α) = ∫ k dα⊗α + ∫ h dα := ∑_i ( ∫ u σ(⟨z_i, v⟩) dα(u, v) − y_i )²
α = ∑_k δ_{(u_k, v_k)}
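For such an empirical measure, integrating against α is just a sum over the particles (u_k, v_k), so the integral formulation recovers the finite network exactly. A quick check, with illustrative sizes and σ = tanh:

```python
import numpy as np

# The empirical measure α = Σ_k δ_(u_k, v_k) turns the integral
# ∫ u σ(⟨z, v⟩) dα(u, v) into the finite sum Σ_k u_k σ(⟨z, v_k⟩):
# the network IS that integral.  Sizes and σ = tanh are illustrative.

rng = np.random.default_rng(1)
d, n = 4, 10
z = rng.normal(size=d)
U = rng.normal(size=n)               # particle components u_k
V = rng.normal(size=(n, d))          # particle components v_k

def integrate(g):
    """∫ g(u, v) dα for α = Σ_k δ_(u_k, v_k): a sum over particles."""
    return sum(g(U[k], V[k]) for k in range(n))

net_as_integral = integrate(lambda u, v: u * np.tanh(z @ v))
net_direct = np.tanh(V @ z) @ U      # vectorized Σ_k u_k σ(⟨z, v_k⟩)
print(net_as_integral, net_direct)   # equal up to floating-point error
```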
∂α_t/∂t − div(∇_W f(α_t) α_t) = 0

min_α f(α)

[Diagram: particles (u_k, v_k) ↔ measure α]
k(u, v, u′, v′) := ∑_i u u′ σ(⟨z_i, v⟩) σ(⟨z_i, v′⟩)
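Expanding the square in f(α) yields exactly this quadratic-in-α form: the interaction kernel k above, a linear term h(u, v) = −2 ∑_i y_i u σ(⟨z_i, v⟩) (left implicit on the slide), and the constant ∑_i y_i². A numerical check of this identity, with illustrative sizes and σ = tanh:

```python
import numpy as np

# Check: Σ_i (Σ_k u_k σ(⟨z_i, v_k⟩) - y_i)²
#      = ΣΣ k(u_a, v_a, u_b, v_b) + Σ h(u_a, v_a) + Σ_i y_i²
# for the empirical measure α = Σ_k δ_(u_k, v_k), where
#   k(u, v, u', v') = Σ_i u u' σ(⟨z_i, v⟩) σ(⟨z_i, v'⟩)
#   h(u, v)         = -2 Σ_i y_i u σ(⟨z_i, v⟩)   # implicit on the slide
# Sizes, seed and σ = tanh are illustrative.

rng = np.random.default_rng(2)
d, n, N = 3, 6, 8
Z = rng.normal(size=(N, d))
y = rng.normal(size=N)
U = rng.normal(size=n)
V = rng.normal(size=(n, d))
S = np.tanh(Z @ V.T)                 # S[i, k] = σ(⟨z_i, v_k⟩)

f_direct = np.sum((S @ U - y) ** 2)  # f(α) from its definition

def k(ua, va, ub, vb):
    return np.sum(ua * ub * np.tanh(Z @ va) * np.tanh(Z @ vb))

def h(ua, va):
    return -2 * np.sum(y * ua * np.tanh(Z @ va))

f_kernel = (
    sum(k(U[a], V[a], U[b], V[b]) for a in range(n) for b in range(n))
    + sum(h(U[a], V[a]) for a in range(n))
    + np.sum(y ** 2)                 # constant dropped on the slide
)
print(f_direct, f_kernel)            # equal up to floating-point error
```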