Slide 5
Proof of the Theorem (2)
[Proof]
Consider the function c (defined below).
This can be realized by a depth-2 NN using ReLU activations.
Once the sample S and the target variable y are fixed,
the problem reduces to solving a linear system. Choosing a and b
appropriately, we obtain the matrix of the form shown in the lemma.
Since A has full rank, the weights w are found by solving the linear system.
[End of proof]
Lemma. For any two interleaving sequences of n real numbers b_1 < x_1 < b_2 < x_2 < · · · < b_n < x_n, the n × n matrix A = [max{x_i − b_j, 0}]_{ij} has full rank. Its smallest eigenvalue is min_i (x_i − b_i).
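As a quick numerical sanity check of the lemma (a sketch with interleaving sequences of our own choosing, not from the source), one can build A and confirm that it is lower triangular and has full rank:

```python
import numpy as np

# Illustrative interleaving sequences (our choice):
# b_1 < x_1 < b_2 < x_2 < ... < b_n < x_n
n = 5
x = np.arange(1.0, n + 1.0)   # x = [1, 2, 3, 4, 5]
b = x - 0.5                   # b = [0.5, 1.5, 2.5, 3.5, 4.5]

# A_ij = max{x_i - b_j, 0}; for j > i we have b_j > x_i, so those entries vanish.
A = np.maximum(x[:, None] - b[None, :], 0.0)

assert np.allclose(A, np.tril(A))        # lower triangular
assert np.linalg.matrix_rank(A) == n     # full rank, as the lemma claims
```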
Proof. By its definition, the matrix A is lower triangular, that is, all entries with i < j vanish. A basic linear algebra fact states that a lower-triangular matrix has full rank if and only if all of the entries on the diagonal are nonzero. Since x_i > b_i, we have max{x_i − b_i, 0} > 0. Hence, A is invertible. The second claim follows directly from the fact that a lower-triangular matrix has all its eigenvalues on the main diagonal. This in turn follows from the first fact, since A − λI can have lower rank only if λ equals one of the diagonal values.
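The eigenvalue claim can also be verified numerically (again with made-up interleaving sequences): for a real lower-triangular matrix the eigenvalues are exactly the diagonal entries, so the smallest eigenvalue of A is min_i (x_i − b_i).

```python
import numpy as np

# Illustrative interleaving setup: b_i < x_i < b_{i+1}.
x = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([0.2, 1.5, 2.9, 3.1])
A = np.maximum(x[:, None] - b[None, :], 0.0)

# Lower triangular => eigenvalues are exactly the diagonal entries x_i - b_i.
eigvals = np.linalg.eigvals(A)
assert np.allclose(np.sort(eigvals), np.sort(x - b))
assert np.isclose(eigvals.min(), (x - b).min())   # smallest eigenvalue = min_i (x_i - b_i)
```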
Proof of Theorem 1. For weight vectors w, b ∈ R^n and a ∈ R^d, consider the function c : R^d → R,

c(x) = Σ_{j=1}^{n} w_j max{⟨a, x⟩ − b_j, 0}.

It is easy to see that c can be expressed by a depth-2 network with ReLU activations.
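As a minimal sketch (variable names are ours), c is exactly a one-hidden-layer ReLU network: each of the n hidden units computes max{⟨a, x⟩ − b_j, 0} using the shared input direction a and its own bias b_j, and the output layer takes the weighted sum with weights w:

```python
import numpy as np

def c(x, a, b, w):
    """Depth-2 ReLU network: c(x) = sum_j w_j * max{<a, x> - b_j, 0}."""
    hidden = np.maximum(np.dot(a, x) - b, 0.0)  # n hidden ReLU units, shared direction a
    return np.dot(w, hidden)                    # linear output layer

# Tiny example: x in R^2, n = 2 hidden units.
x = np.array([2.0, 3.0])
a = np.array([1.0, 0.0])      # d input weights, shared by all hidden units
b = np.array([0.0, 1.0])      # one bias per hidden unit
w = np.array([1.0, -1.0])     # output weights
print(c(x, a, b, w))          # prints 1.0
```

Note that the whole construction uses only 2n + d parameters: n output weights, n biases, and the d entries of a.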
Now, fix a sample S = {z_1, . . . , z_n} of size n and a target vector y ∈ R^n. To prove the theorem, we need to find weights a, b, w so that y_i = c(z_i) for all i ∈ {1, . . . , n}.

First, choose a and b such that with x_i = ⟨a, z_i⟩ we have the interleaving property b_1 < x_1 < b_2 < x_2 < · · · < b_n < x_n. This is possible since all z_i's are distinct. Next, consider the set of n equations in the n unknowns w,

y_i = c(z_i),  i ∈ {1, . . . , n}.
We have c(z_i) = (Aw)_i, where A = [max{x_i − b_j, 0}]_{ij} is the matrix we encountered in the lemma. We chose a and b so that the lemma applies and hence A has full rank. We can now solve the linear system y = Aw to find suitable weights w.
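The whole construction can be put together as a runnable sketch (random data and names are ours, not the source's): pick a generic direction a, sort the projections x_i = ⟨a, z_i⟩, choose interleaving biases b, and solve the triangular system y = Aw:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
Z = rng.normal(size=(n, d))            # n distinct samples z_i in R^d
y = rng.normal(size=n)                 # arbitrary target labels

a = rng.normal(size=d)                 # generic a: all <a, z_i> distinct almost surely
x = Z @ a
order = np.argsort(x)                  # relabel so that x_1 < ... < x_n
Z, y, x = Z[order], y[order], x[order]

b = np.empty(n)                        # interleaving biases b_1 < x_1 < ... < b_n < x_n
b[0] = x[0] - 1.0
b[1:] = (x[:-1] + x[1:]) / 2.0         # midpoints between consecutive x's

A = np.maximum(x[:, None] - b[None, :], 0.0)  # lower triangular, full rank by the lemma
w = np.linalg.solve(A, y)                     # solve y = Aw for the output weights

c = lambda z: w @ np.maximum(a @ z - b, 0.0)  # the depth-2 ReLU network
assert np.allclose([c(z) for z in Z], y)      # c interpolates every label exactly
```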
While the construction in the previous proof has inevitably high width given that the depth is 2, it is possible to trade width for depth. The construction is as follows. With the notation of the previous proof, and assuming w.l.o.g. that x_1, . . . , x_n ∈ [0, 1], partition the interval [0, 1] into …
5/18