
# Conservation Laws for Gradient Flows

Talk associated with the paper: https://arxiv.org/abs/2307.00144

## Gabriel Peyré

September 29, 2023

## Transcript

1. Conservation Laws

Gabriel Peyré (École Normale Supérieure), joint work with Sibylle Marcotte and Rémi Gribonval.

2. Overview

- Conservation laws
- Finding conservation laws
- Have we found them all?

(Portraits: Ferdinand Georg Frobenius and Sophus Lie.)

3-6. Conservation laws

Neural network (2 layers): θ = (U, V),

  g(θ, x) := Uσ(V⊤x) = ∑_k u_k σ(⟨x, v_k⟩)

Empirical risk minimization:

  ℰ^Y_X(θ) := (1/N) ∑_{i=1}^N ℓ(g(θ, x_i), y_i)

Gradient flow: θ̇(t) = −∇ℰ^Y_X(θ(t)), started from θ(0).

Conservation law: a function h(θ) such that for all X, Y, t, h(θ(t)) = h(θ(0)); the trajectory stays on the level set {θ : h(θ) = h(θ(0))}.

Applications: understanding the implicit bias of training; helping to prove convergence θ(0) → θ(+∞) ∈ argmin(ℰ^Y_X).
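As a concrete illustration of such a conserved quantity (a numerical sketch added here, not from the talk): for a single linear neuron g(θ, x) = u·v·x in 1-D, the quantity u² − v² is conserved along the gradient flow, and an explicit Euler discretization reproduces this up to discretization error.

```python
# Single linear neuron in 1-D: g(theta, x) = u*v*x, loss E(u, v) = (u*v*x - y)^2.
# Explicit Euler discretization of the gradient flow theta'(t) = -grad E(theta(t)).
x, y = 1.5, 2.0
u, v = 0.8, -0.3
h0 = u**2 - v**2          # candidate conserved quantity at initialization

dt = 1e-4
for _ in range(200_000):
    r = u * v * x - y     # residual
    gu = 2 * r * v * x    # dE/du
    gv = 2 * r * u * x    # dE/dv
    u, v = u - dt * gu, v - dt * gv

print(abs(u * v * x - y))      # ~0: the flow has reached the hyperbola u*v*x = y
print(abs(u**2 - v**2 - h0))   # ~0: h(u, v) = u^2 - v^2 is conserved along the way
```

The small residual drift in h comes only from the time discretization; the continuous-time flow conserves it exactly.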

7-9. Independent conservation laws

Example: θ = (U, V), g(θ, x) := Uσ(V⊤x) = ∑_k u_k σ(⟨x, v_k⟩).

- Linear networks, σ(s) = s: h_{k,k′}(U, V) = ⟨u_k, u_{k′}⟩ − ⟨v_k, v_{k′}⟩.
- ReLU networks, σ(s) = max(s, 0): h_k(U, V) = ∥u_k∥² − ∥v_k∥².

1 neuron in 1-D: ℰ^Y_X(u, v) = (uvx − y)². The flow converges to the hyperbola uvx = y while staying on the level set u² − v² = u_0² − v_0², which pins down θ(+∞) from θ(0).

How many conservation laws are there, and can we determine them all? Note that if (h_1, …, h_K) are conserved, then Φ(h_1, …, h_K) is conserved for any Φ, so the right notion is independence: for all θ, (∇h_1(θ), …, ∇h_K(θ)) are linearly independent.
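The linear-network laws can be checked numerically (my sketch with arbitrary random sizes and data, not code from the talk): the h_{k,k′} are exactly the entries of the matrix U⊤U − V⊤V, which an Euler discretization of the flow preserves up to discretization error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Linear network g(theta, x) = U V^T x with U in R^{n x r}, V in R^{m x r}.
# The conserved h_{k,k'} = <u_k, u_{k'}> - <v_k, v_{k'}> are exactly the
# entries of U^T U - V^T V.
n, m, r, N = 4, 3, 2, 20
U = 0.1 * rng.standard_normal((n, r))
V = 0.1 * rng.standard_normal((m, r))
X = rng.standard_normal((m, N))
Y = rng.standard_normal((n, N))

loss = lambda U, V: np.sum((U @ V.T @ X - Y) ** 2) / N
H0 = U.T @ U - V.T @ V        # conserved matrix at initialization
loss0 = loss(U, V)

dt = 2e-4
for _ in range(50_000):       # Euler steps on the empirical risk
    R = U @ V.T @ X - Y       # residuals g(theta, x_i) - y_i
    gU = (2 / N) * R @ X.T @ V
    gV = (2 / N) * X @ R.T @ U
    U, V = U - dt * gU, V - dt * gV

print(loss0, loss(U, V))                          # loss decreases
print(np.max(np.abs(U.T @ U - V.T @ V - H0)))     # ~0: U^T U - V^T V is conserved
```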

10. Overview (repeated): conservation laws; finding conservation laws; have we found them all?

11-13. Structure of the Flow Fields

Flow fields: θ̇(t) = w(θ(t)) where w(θ) ∈ W(θ), with

  W(θ) := Span {∇ℰ^Y_X(θ) : ∀X, Y}

Proposition: h is conserved ⇔ ∀θ, ∇h(θ) ⊥ W(θ).

Question: determining W(θ). Chain rule:

  ∇ℰ^Y_X(θ) = (1/N) ∑_{i=1}^N ∂_θ g(θ, x_i)⊤ α_i  where  α_i = ∇ℓ(g(θ, x_i), y_i)

Hypothesis: Span_y ∇ℓ(z, y) is the whole space; example: ℓ(z, y) = ∥z − y∥².

Proposition: W(θ) = Span ⋃_x Im[∂_θ g(θ, x)⊤].
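The proposition can be probed numerically (an illustrative sketch, not from the talk): for a single linear neuron g(θ, x) = u⟨v, x⟩, the transposed Jacobian maps an output direction α to ((v⊤x)α, (α⊤u)x), and ∇h for h = ∥u∥² − ∥v∥² is orthogonal to every such element of W(θ).

```python
import numpy as np

rng = np.random.default_rng(2)

# Single linear neuron g(theta, x) = u <v, x>, theta = (u, v).
# d_theta g(theta, x)^T maps an output perturbation alpha to
# ((v.x) alpha, (alpha.u) x); W(theta) is spanned by these over all x.
# The proposition: h conserved  <=>  grad h(theta) is orthogonal to W(theta).
n, m = 3, 5
u = rng.standard_normal(n)
v = rng.standard_normal(m)

grad_h = np.concatenate([2 * u, -2 * v])   # gradient of h(u, v) = ||u||^2 - ||v||^2

dots = []
for _ in range(100):
    x = rng.standard_normal(m)
    alpha = rng.standard_normal(n)
    w = np.concatenate([(v @ x) * alpha, (u @ alpha) * x])  # element of W(theta)
    dots.append(grad_h @ w)

print(np.max(np.abs(dots)))   # ~0 up to floating point: grad h is orthogonal to W
```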

14-17. Minimal Parameterizations

Re-parameterization: g(θ, x) = f(φ(θ), x), where φ(θ) should "factor" the invariances.

- Linear networks: g(θ, x) = UV⊤x, φ(U, V) = UV⊤.
- ReLU networks: g(θ, x) = ∑_i u_i ReLU(⟨v_i, x⟩) = ∑_i 1_{⟨v_i, x⟩ ≥ 0} (u_i v_i⊤) x, φ(U, V) = (u_i v_i⊤)_i (valid only locally).

Chain rule: ∂_θ g(θ, x)⊤ = ∂φ(θ)⊤ ∂f(φ(θ), x)⊤, so

  W(θ) = W_g(θ) := Span ⋃_x Im[∂_θ g(θ, x)⊤] = ∂φ(θ)⊤ W_f(θ)  where  W_f(θ) := Span ⋃_x Im[∂f(φ(θ), x)⊤]

Definition: φ is minimal if W_f(θ) is the whole space, which is equivalent to W(θ) = Im(∂φ(θ)⊤): a finite-dimensional set of vector fields.

Theorem: for σ = Id, φ(U, V) = UV⊤, and for σ = ReLU, φ(U, V) = (u_i v_i⊤)_i are minimal (outside a set of measure 0 for ReLU).
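The chain-rule factorization can be verified directly on the simplest case (a sketch with random inputs, not from the talk): for φ(u, v) = uv⊤ and f(W, x) = Wx, the same element of W(θ) is obtained either directly from ∂_θ g(θ, x)⊤ or through ∂φ(θ)⊤ ∂f(φ(θ), x)⊤.

```python
import numpy as np

rng = np.random.default_rng(6)

# Factorization for the single linear neuron:
# g(theta, x) = u <v, x> = f(phi(theta), x) with phi(u, v) = u v^T, f(W, x) = W x.
# Then d_theta g^T = d phi^T  d_W f^T.
n, m = 3, 4
u, v, x = rng.standard_normal(n), rng.standard_normal(m), rng.standard_normal(m)
alpha = rng.standard_normal(n)     # output direction

# Direct route: g = u (v.x), so dg/du = (v.x) I and dg/dv = u x^T, hence
# d_theta g^T alpha = ((v.x) alpha, (alpha.u) x).
direct = np.concatenate([(v @ x) * alpha, (alpha @ u) * x])

# Factored route: d_W f(W, x)^T alpha = alpha x^T, then
# d phi(u, v)^T M = (M v, M^T u).
M = np.outer(alpha, x)
factored = np.concatenate([M @ v, M.T @ u])

print(np.max(np.abs(direct - factored)))   # ~0: both routes agree
```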

18-20. Constructing Conservation Laws

Consequence: for a minimal parameterization φ, W(θ) = Im(∂φ(θ)⊤), so

  h conserved ⇔ ∂φ(θ) ∇h(θ) = 0

Example: single neuron, φ(u, v) = uv⊤, so ∂φ(u, v)⊤ : M ↦ (Mv, M⊤u) and W(u, v) = Span_M {(Mv, M⊤u)}. Then

  h conserved ⇔ ∀M, ⟨∇_u h(u, v), Mv⟩ + ⟨∇_v h(u, v), M⊤u⟩ = 0
              ⇔ ∇_u h(u, v) v⊤ + u ∇_v h(u, v)⊤ = 0

Only solutions: h(u, v) = Φ(∥u∥² − ∥v∥²).

For a polynomial φ, restricting the search to polynomials h of fixed degree turns this condition into a finite-dimensional linear kernel computation.
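This kernel computation can be sketched in NumPy (my illustration; the authors' actual implementation uses SageMath and is linked in the conclusion): restrict to quadratic h(z) = z⊤Sz with z = (u, v), sample the linear constraints ∂φ(θ)∇h(θ) = 0 at random θ, and compute the nullspace.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(3)

# Quadratic conservation laws h(z) = z^T S z, z = (u, v), for the single
# neuron phi(u, v) = u v^T.  "h conserved <=> dphi(theta) grad h(theta) = 0"
# reads entrywise: grad_u h(u,v) v^T + u grad_v h(u,v)^T = 0 (an n x m matrix).
n, m = 3, 2
d = n + m
basis = list(combinations_with_replacement(range(d), 2))  # basis of symmetric S

rows = []
for _ in range(30):                  # random theta samples, enough for generic rank
    z = rng.standard_normal(d)
    u, v = z[:n], z[n:]
    for i in range(n):
        for j in range(m):
            row = []
            for a, b in basis:
                S = np.zeros((d, d))
                S[a, b] = S[b, a] = 1.0
                g = 2 * S @ z        # grad h for this basis element
                gu, gv = g[:n], g[n:]
                row.append(gu[i] * v[j] + u[i] * gv[j])
            rows.append(row)

_, sv, Vt = np.linalg.svd(np.array(rows))
null_dim = int(np.sum(sv < 1e-8 * sv[0]))

# Rebuild S from the null vector and normalize: up to scale it is
# diag(I_n, -I_m), i.e. h(u, v) = ||u||^2 - ||v||^2.
S = np.zeros((d, d))
for c, (a, b) in zip(Vt[-1], basis):
    S[a, b] = S[b, a] = c
S = S / S[0, 0]
print(null_dim)          # 1: a single independent quadratic law
print(np.round(S, 6))    # diag(1, 1, 1, -1, -1)
```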

21. Overview (repeated): conservation laws; finding conservation laws; have we found them all?

22-26. Have we found all the conservation laws?

Question: find a "minimal" surface Σ tangent to all the W(θ), i.e. with dim(Σ) = dim(W(θ)). Issue: in general, this is impossible!

Definition: Lie bracket

  [w_1, w_2](θ) := ∂w_1(θ) w_2(θ) − ∂w_2(θ) w_1(θ)

Following the flow θ̇ = w_1(θ) and then θ̇ = w_2(θ) commutes exactly when [w_1, w_2] = 0.

Theorem (Frobenius): if W(θ) = Span(w_i(θ))_i and ∀(i, j), [w_i, w_j](θ) ∈ W(θ), then there exists Σ with dim(Σ) = dim(W(θ)).

- Linear networks: [w_i, w_j](θ) ∉ W(θ).
- ReLU networks: [w_i, w_j](θ) ∈ W(θ).

Definition (Lie): the generated Lie algebra W_∞ is obtained by iterating

  W_0(θ) = W(θ),  W_{k+1} = Span([W_0, W_k] ⊕ W_k)
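The two membership claims can be tested numerically (an illustrative sketch, not from the talk), using the closed form of the fields w_M(U, V) = (MV, M⊤U) and of their brackets: for a single neuron (r = 1, the ReLU case after separability) the bracket stays in W(θ), while for a linear network with r = 2 it escapes.

```python
import numpy as np

rng = np.random.default_rng(4)

def bracket_in_span(n, m, r):
    """For the fields w_M(U, V) = (M V, M^T U) spanning W(theta), test whether
    the bracket [w_A, w_B](theta) = ((A B^T - B A^T) U, (A^T B - B^T A) V)
    stays inside W(theta) at a random theta.  Returns the relative residual of
    projecting the bracket onto W(theta): ~0 means the bracket lies in W."""
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((m, r))
    vec = lambda a, b: np.concatenate([a.ravel(), b.ravel()])
    # columns spanning W(theta): images of the elementary matrices E_ij
    cols = []
    for i in range(n):
        for j in range(m):
            E = np.zeros((n, m)); E[i, j] = 1.0
            cols.append(vec(E @ V, E.T @ U))
    W = np.stack(cols, axis=1)
    A, B = rng.standard_normal((2, n, m))
    br = vec((A @ B.T - B @ A.T) @ U, (A.T @ B - B.T @ A) @ V)
    resid = br - W @ np.linalg.lstsq(W, br, rcond=None)[0]
    return np.linalg.norm(resid) / np.linalg.norm(br)

print(bracket_in_span(3, 4, 1))   # single neuron (ReLU case): ~0, bracket in W
print(bracket_in_span(3, 4, 2))   # linear network, r = 2: nonzero, bracket escapes
```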

27-29. Number of Conservation Laws

Theorem: if dim(W_∞(θ)) = K is locally constant, then there are exactly d − K independent conservation laws.

ReLU networks, φ(U, V) = (u_i v_i⊤)_i: by separability, the analysis reduces to a single neuron φ(u, v) = uv⊤. Linear networks: φ(U, V) = UV⊤.

Consider φ : (U, V) ∈ ℝ^{n×r} × ℝ^{m×r} ↦ UV⊤, assuming (U; V) ∈ ℝ^{(n+m)×r} has full rank r.

Proposition: given W_0(θ) = Im(∂φ(θ)⊤), one has

  W_0 ⊊ W_1 = Span([W_0, W_0] ⊕ W_0),  then  W_1 = W_2 = Span([W_0, W_1] ⊕ W_1) = W_3 = … = W_∞

with the explicit formula dim(W_∞) = (n + m)r − r(r + 1)/2.

Proposition: the h_{k,k′}(U, V) = ⟨u_k, u_{k′}⟩ − ⟨v_k, v_{k′}⟩ define r(r + 1)/2 independent conservation laws.

Corollary: for ReLU and linear networks, there are no other conservation laws.
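The dimension formula can be checked at a random point (a numerical sketch, not from the talk): span W_0(θ) together with one round of Lie brackets (enough, since the proposition gives W_1 = W_∞), and compare the rank with (n + m)r − r(r + 1)/2.

```python
import numpy as np

rng = np.random.default_rng(5)

# dim W_infty(theta) for the linear-network parameterization phi(U, V) = U V^T,
# evaluated at a random full-rank theta.  The slides give
# dim(W_infty) = (n + m) r - r (r + 1) / 2, hence r (r + 1) / 2 conservation laws.
n, m, r = 4, 3, 2
d = (n + m) * r
U = rng.standard_normal((n, r))
V = rng.standard_normal((m, r))
vec = lambda a, b: np.concatenate([a.ravel(), b.ravel()])

Es = []                                   # elementary matrices: a basis of M's
for i in range(n):
    for j in range(m):
        E = np.zeros((n, m)); E[i, j] = 1.0
        Es.append(E)

# W_0(theta): values of the fields w_M(U, V) = (M V, M^T U)
cols = [vec(E @ V, E.T @ U) for E in Es]
# one round of brackets: [w_A, w_B](theta) = ((A B^T - B A^T) U, (A^T B - B^T A) V);
# by the proposition, W_1 = W_infty, so one round suffices
for A in Es:
    for B in Es:
        cols.append(vec((A @ B.T - B @ A.T) @ U, (A.T @ B - B.T @ A) @ V))

K = np.linalg.matrix_rank(np.stack(cols, axis=1))
print(K, (n + m) * r - r * (r + 1) // 2)  # both equal 11 here
print(d - K)                              # 3 = r(r+1)/2 conservation laws
```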

30-31. Conclusion

Deeper networks: there is no minimal parameterization valid for almost all θ.

- For some θ(0), new conservation laws appear.
- W_∞ is infinite dimensional; SageMath code is available to compute W_∞(θ).

Code: https://github.com/sibyllema/Conservation_laws