Lemma. For any two interleaving sequences of real numbers $b_1 < x_1 < b_2 < x_2 < \cdots < b_n < x_n$, the $n \times n$ matrix $A = [\max\{x_i - b_j, 0\}]_{ij}$ has full rank. Its smallest eigenvalue is $\min_i (x_i - b_i)$.

Proof. By its definition, the matrix $A$ is lower triangular, that is, all entries with $i < j$ vanish: for $i < j$ we have $x_i < b_j$, so $\max\{x_i - b_j, 0\} = 0$. A basic linear algebra fact states that a lower-triangular matrix has full rank if and only if all of the entries on the diagonal are nonzero. Since $x_i > b_i$, we have $\max\{x_i - b_i, 0\} = x_i - b_i > 0$. Hence, $A$ is invertible. The second claim follows directly from the fact that a lower-triangular matrix has all its eigenvalues on the main diagonal. This in turn follows from the first fact, since $A - \lambda I$ can have lower rank only if $\lambda$ equals one of the diagonal values.

Proof of Theorem 1. For weight vectors $w, b \in \mathbb{R}^n$ and $a \in \mathbb{R}^d$, consider the function $c : \mathbb{R}^d \to \mathbb{R}$,
$$c(x) = \sum_{j=1}^{n} w_j \max\{\langle a, x \rangle - b_j, 0\}.$$
It is easy to see that $c$ can be expressed by a depth-2 network with ReLU activations. Now, fix a sample $S = \{z_1, \dots, z_n\}$ of size $n$ and a target vector $y \in \mathbb{R}^n$. To prove the theorem, we need to find weights $a, b, w$ so that $y_i = c(z_i)$ for all $i \in \{1, \dots, n\}$.

First, choose $a$ and $b$ such that with $x_i = \langle a, z_i \rangle$ we have the interleaving property $b_1 < x_1 < b_2 < \cdots < b_n < x_n$. This is possible since all $z_i$'s are distinct. Next, consider the set of $n$ equations in the $n$ unknowns $w$, $y_i = c(z_i)$, $i \in \{1, \dots, n\}$. We have $c(z_i) = (Aw)_i$, where $A = [\max\{x_i - b_j, 0\}]_{ij}$ is the matrix we encountered in the lemma. We chose $a$ and $b$ so that the lemma applies, and hence $A$ has full rank. We can now solve the linear system $y = Aw$ to find suitable weights $w$.
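As a sanity check, the construction can be carried out numerically. The following sketch is illustrative, not part of the proof: the sample points $z_i$, the labels $y$, the random projection direction $a$, and the midpoint choice of the thresholds $b_j$ are all assumptions made for the demonstration.

```python
# Numerical sketch of the lemma and the depth-2 ReLU construction
# (sample Z, labels y, direction a, and thresholds b are arbitrary choices).
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 3

# Distinct sample points z_1, ..., z_n and arbitrary real labels y.
Z = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# A random a makes the projections x_i = <a, z_i> distinct almost surely;
# sort the sample so that x_1 < x_2 < ... < x_n.
a = rng.standard_normal(d)
order = np.argsort(Z @ a)
Z, y = Z[order], y[order]
x = Z @ a

# Thresholds b interleaving the x_i: b_1 < x_1 < b_2 < ... < b_n < x_n
# (midpoints between consecutive projections; any interleaving choice works).
b = np.empty(n)
b[0] = x[0] - 1.0
b[1:] = (x[:-1] + x[1:]) / 2

# The matrix A = [max{x_i - b_j, 0}]_ij from the lemma.
A = np.maximum(x[:, None] - b[None, :], 0.0)

# Lemma: A is lower triangular with positive diagonal, hence full rank,
# and its smallest eigenvalue is min_i (x_i - b_i).
assert np.allclose(A, np.tril(A))
assert np.all(np.diag(A) > 0)
assert np.isclose(np.linalg.eigvals(A).real.min(), (x - b).min())

# Solve y = A w; then the depth-2 ReLU network
# c(z) = sum_j w_j max{<a, z> - b_j, 0} fits every label exactly.
w = np.linalg.solve(A, y)
c = lambda z: w @ np.maximum(z @ a - b, 0.0)
assert np.allclose([c(z) for z in Z], y)
```

Since $A$ is triangular, the system could also be solved by forward substitution; `np.linalg.solve` is used here only for brevity.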
While the construction in the previous proof has inevitably high width given that it has depth 2, it is possible to trade width for depth. The construction is as follows. With the notation of the previous proof, and assuming w.l.o.g. that $x_1, \dots, x_n \in [0, 1]$, partition the interval $[0, 1]$ into