
Deep, Skinny Neural Networks are not Universal Approximators

Proofs in learning theory typically draw on results from information theory or statistical learning to shed light on generalization or to provide convergence guarantees. While these are powerful tools, some properties of neural networks can be characterized with simpler methods. In Deep, Skinny Neural Networks are not Universal Approximators, Jesse Johnson proves, using standard set-theoretic topology, that feedforward networks whose hidden layers are no wider than the input dimension cannot approximate any function with a level set containing a bounded path component. Join us next Tuesday at 10:30am in the Mila Auditorium to discuss this paper!

Presented on March 19th, 2019.

Slides for today's proof are here: https://slides.com/breandan/skinny-nns
ICLR/OpenReview discussion: https://openreview.net/forum?id=ryGgSsAcFQ
The author has a great blog on topology which I particularly enjoyed: https://ldtopology.wordpress.com/

Breandan Considine



Transcript

  1. Notation:
     $\mathbb{R}^k$: parameter space; $\mathbb{R}^n$: input space; $\mathbb{R}^m$: output space
     $\varphi : \mathbb{R} \to \mathbb{R}$: activation function
     $n_1, \ldots, n_{\kappa-1}$: hidden layers
     $N_{\varphi, n_0, n_1, \ldots, n_{\kappa-1}, 1}$: family of functions defined by a feedforward neural network with $n_0$ inputs and $n_\kappa$ outputs
     $N^*_{\varphi,n}$: family of skinny nets, i.e. the union of all model families $N_{\varphi, n_0, n_1, \ldots, n_{\kappa-1}, 1}$ with $n_i \le n$ for all $i \in [0, \kappa)$
     $\hat{N}_n$: union of all non-singular functions in the families $N^*_{\varphi,n}$
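
To ground the notation, here is a minimal numpy sketch (my construction, not code from the paper) of one member of $N^*_{\varphi,n}$: every hidden width is at most the input dimension $n$, and the output is one-dimensional. The widths and the tanh activation are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def skinny_net(n=3, hidden_widths=(3, 3, 2), phi=np.tanh):
    """Sample one member of N*_{phi,n}: every hidden width is <= n,
    and the final layer maps to a single output."""
    assert all(w <= n for w in hidden_widths)
    dims = [n, *hidden_widths, 1]
    layers = [(rng.standard_normal((dims[i + 1], dims[i])),
               rng.standard_normal(dims[i + 1]))
              for i in range(len(dims) - 1)]

    def f(x):
        for i, (W, b) in enumerate(layers):
            x = W @ x + b
            if i < len(layers) - 1:   # no activation after the last layer
                x = phi(x)
        return x

    return f

f = skinny_net()
print(f(np.zeros(3)))  # a single scalar output
```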
  2. For a neural network with a one-dimensional output, if:
     - the hidden layers are all the same dimension,
     - the weight matrices are all non-singular, and
     - the activation function is one-to-one,
     then the composition of all the functions up to the last linear function will be one-to-one as well. The last linear function is a 1-D projection, so the preimage of any output point, taken in the last hidden layer, is a hyperplane. The preimage of this hyperplane under the one-to-one map may not be a hyperplane, but it will be an unbounded subset of the domain.
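
A small numpy sketch of this argument, under illustrative assumptions (equal hidden widths, random Gaussian weight matrices, which are non-singular with probability one, and leaky ReLU as the one-to-one activation): we invert the network layer by layer up to the last linear map, then walk along the hyperplane $\{h : w \cdot h = y - c\}$ to find other inputs on the same level set.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# One-to-one activation and its inverse (leaky ReLU, slope 0.1 for x < 0).
phi     = lambda x: np.where(x > 0, x, 0.1 * x)
phi_inv = lambda y: np.where(y > 0, y, y / 0.1)

# f_hat: all layers except the last linear map; each is x -> phi(Wx + b)
# with W square and (almost surely) non-singular.
Ws = [rng.standard_normal((n, n)) for _ in range(3)]
bs = [rng.standard_normal(n) for _ in range(3)]

def f_hat(x):
    for W, b in zip(Ws, bs):
        x = phi(W @ x + b)
    return x

def f_hat_inv(h):
    for W, b in zip(reversed(Ws), reversed(bs)):
        h = np.linalg.solve(W, phi_inv(h) - b)
    return h

x = rng.standard_normal(n)
print(np.allclose(f_hat_inv(f_hat(x)), x))   # True: f_hat is one-to-one

# Last layer: a 1-D projection h -> w.h + c. Its preimage of an output
# value y is the hyperplane {h : w.h = y - c}.
w, c = rng.standard_normal(n), rng.standard_normal()
y = w @ f_hat(x) + c

# Any other point h on that hyperplane pulls back, via f_hat_inv, to a
# different input with the same network output y.
h = f_hat(x) + np.cross(w, rng.standard_normal(n))  # cross product is orthogonal to w
x2 = f_hat_inv(h)
print(np.isclose(w @ f_hat(x2) + c, y))      # True: x2 lies on the same level set
```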
  3. Definition 1. We will say that a function $f : \mathbb{R}^n \to \mathbb{R}$ has unbounded level components if for every $y \in \mathbb{R}$, every path component of $f^{-1}(y)$ is unbounded. (For example, a non-constant linear $f$ has unbounded level components, since each level set is a hyperplane; $f(x) = \|x\|^2$ does not, since $f^{-1}(1)$ is a sphere.)
  4. Connected path: we can reach any point from every other point through a continuous path (a straight line through the set is not a component by itself). Disjoint components: pieces of the set that cannot be joined by any such path.
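
One way to see path components numerically (an illustrative sketch assuming SciPy, not anything from the paper): rasterize a level set on a grid and flood-fill neighbouring cells. For $f(x, y) = x^2 + y^2$ the level set $f^{-1}(1)$ is a circle, one bounded component; for the linear $g(x, y) = x + y$ it is a line, which runs off any window.

```python
import numpy as np
from scipy.ndimage import label

# Grid over [-2, 2]^2; mark cells whose value is close to the level y = 1.
xs = np.linspace(-2, 2, 400)
X, Y = np.meshgrid(xs, xs)

def components(mask):
    # 8-connected flood fill: each label is one discrete "path component".
    labels, count = label(mask, structure=np.ones((3, 3)))
    return count

def touches_boundary(mask):
    # A component hitting the window's edge would continue outside it.
    return bool(mask[0].any() or mask[-1].any() or mask[:, 0].any() or mask[:, -1].any())

circle = np.abs(X**2 + Y**2 - 1) < 0.05   # bounded component (a circle)
line   = np.abs(X + Y - 1)       < 0.05   # unbounded component (a line)

print(components(circle), touches_boundary(circle))  # 1 component, stays inside
print(components(line),   touches_boundary(line))    # 1 component, exits the window
```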
  5. Theorem 1. For any integer $n \ge 2$ and uniformly continuous activation function $\varphi : \mathbb{R} \to \mathbb{R}$ that can be approximated by one-to-one functions, the family of layered feedforward neural networks with input dimension $n$ in which each hidden layer has dimension at most $n$ cannot approximate any function with a level set containing a bounded path component.
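
The theorem suggests a quick experiment (a hedged sketch, assuming PyTorch; the target function, widths, and training setup are my choices, not the paper's): the bump $f(x) = \max(0, 1 - \|x\|^2)$ on $\mathbb{R}^2$ has bounded circular level components, so a net with hidden widths at most $n = 2$ should be unable to fit it, while a wider net can.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
xs = torch.rand(4096, 2) * 4 - 2                                  # inputs in [-2, 2]^2
ys = torch.clamp(1 - xs.pow(2).sum(dim=1, keepdim=True), min=0)   # bump with circular level sets

def train(width, steps=2000):
    # Hidden widths <= 2 make this a "skinny" net for input dimension n = 2.
    net = nn.Sequential(nn.Linear(2, width), nn.Tanh(),
                        nn.Linear(width, width), nn.Tanh(),
                        nn.Linear(width, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(xs) - ys) ** 2).mean()
        loss.backward()
        opt.step()
    return loss.item()

# The theorem predicts the skinny net hits a floor it cannot train past,
# while the wider net is free to fit the bump.
print("width 2 :", train(2))
print("width 16:", train(16))
```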
  6. The proof of Theorem 1 has three parts:
     1. If a function can be approximated by $N^*_{\varphi,n}$, then it can be approximated by the smaller model family $\hat{N}_n$ of non-singular layers (Lemma 2).
     2. Therefore it can be approximated by functions with unbounded level components (Lemma 4).
     3. Therefore it must also have unbounded level components, by contradiction (Lemma 5).
  7. Define the model family $\hat{N}_n$ of non-singular functions to be the union of all non-singular functions in the families $N^*_{\varphi,n}$ for all activation functions $\varphi$ and a fixed $n$. Lemma 2. If $g$ is approximated by $N^*_{\varphi,n}$ for some continuous activation function $\varphi$ that can be uniformly approximated by one-to-one functions, then it is approximated by $\hat{N}_n$.
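
As a concrete instance of such an activation: ReLU is not one-to-one, but the strictly increasing softplus family $s_\beta(x) = \log(1 + e^{\beta x})/\beta$ approximates it uniformly, with sup-norm gap exactly $\log(2)/\beta$. A quick numerical check (my illustration, not code from the paper):

```python
import numpy as np

x = np.linspace(-50, 50, 200001)
relu = np.maximum(x, 0)

for beta in (1, 10, 100):
    # Softplus with sharpness beta: strictly increasing, hence one-to-one.
    # np.logaddexp(0, beta * x) = log(1 + exp(beta * x)), computed stably.
    softplus = np.logaddexp(0, beta * x) / beta
    gap = np.max(np.abs(softplus - relu))
    print(beta, gap, np.log(2) / beta)  # gap matches log(2)/beta, attained at x = 0
```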
  8. Lemma 4. If $f$ is a function in $\hat{N}_n$, then every level set $f^{-1}(y)$ is homeomorphic to an open (possibly empty) subset of $\mathbb{R}^{n-1}$. This implies that $f$ has unbounded level components.
  9. Take $X = (0, \infty)$ as a metric space.
     - $[1, 2]$ is a closed, bounded and compact set in $X$.
     - $(0, 1]$ is a closed and bounded set in $X$ which is not compact (e.g. the open cover $(0, 1] \subseteq \bigcup_n (1/n, 2)$ has no finite subcover).
     - $[1, \infty)$ is a closed but unbounded set in $X$, and not compact.
     - $(1, \infty)$ is an unbounded set which is neither closed nor compact in $X$.
     - $(1, 2)$ is bounded but not closed in $X$, and it is not compact.
     No unbounded or non-closed set can be compact in any metric space.
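
The failing cover for $(0, 1]$ can be checked mechanically (a small illustration of the example above): any finite subfamily $(1/n, 2)$, $n \le N$, unions to just $(1/N, 2)$, which misses points of $(0, 1]$ arbitrarily close to $0$.

```python
from fractions import Fraction

def covered(point, N):
    """Is `point` in the union of (1/n, 2) for n = 1..N?  That union
    telescopes to (1/N, 2), so membership is a single comparison."""
    return Fraction(1, N) < point < 2

for N in (10, 1000, 10**6):
    witness = Fraction(1, 2 * N)     # lies in (0, 1] ...
    print(N, covered(witness, N))    # ... but never in the finite union
```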
  10. Proof. Recall that by definition $f$ is a non-singular function in $\hat{N}_n$, where $\varphi$ is continuous and one-to-one. Let $\hat{f} : \mathbb{R}^n \to \mathbb{R}^n$ be the function defined by all but the last layer of the network. Let $\bar{f} : \mathbb{R}^n \to \mathbb{R}$ be the function defined by the map from the last hidden layer to the final output layer, so that $f = \bar{f} \circ \hat{f}$. Because $f$ is non-singular, the linear functions are all one-to-one. Because $\varphi$ is continuous and one-to-one, so are all the non-linear functions. Thus the composition $\hat{f}$ is also one-to-one, and therefore a homeomorphism from $\mathbb{R}^n$ onto its image $I_{\hat{f}}$.
  11. Since $\mathbb{R}^n$ is homeomorphic to an open $n$-dimensional ball, $I_{\hat{f}}$ is an open subset of $\mathbb{R}^n$ (by invariance of domain), as indicated in the top row of Figure 1.
  12. The function $\bar{f}$ is the composition of a linear function to $\mathbb{R}$ with $\varphi$, which is one-to-one by assumption. So the preimage $\bar{f}^{-1}(y)$ of any $y \in \mathbb{R}$ is an $(n-1)$-dimensional plane in $\mathbb{R}^n$. The preimage $f^{-1}(y)$ is the preimage under $\hat{f}$ of this $(n-1)$-dimensional plane, or rather the preimage of the intersection $I_{\hat{f}} \cap \bar{f}^{-1}(y)$, as indicated in the bottom right/center of Figure 2. Since $I_{\hat{f}}$ is open as a subset of $\mathbb{R}^n$, the intersection is open as a subset of $\bar{f}^{-1}(y)$.
  13. Since $\hat{f}$ is one-to-one, its restriction to this preimage (shown on the bottom left of the figure) is a homeomorphism from $f^{-1}(y)$ to this open subset of the $(n-1)$-dimensional plane $\bar{f}^{-1}(y)$. Thus $f^{-1}(y)$ is homeomorphic to an open subset of $\mathbb{R}^{n-1}$.
  14. Finally, recall that the preimage of a closed set under a continuous function is closed, so $f^{-1}(y)$ is closed as a subset of $\mathbb{R}^n$. If it were also bounded, then it would be compact. However, the only compact, open subset of $\mathbb{R}^{n-1}$ is the empty set, so $f^{-1}(y)$ is either unbounded or empty. Since each path component of a subset of $\mathbb{R}^{n-1}$ is by definition non-empty, this proves that every level component of $f$ is unbounded. □