Slide 1

Slide 1 text

Deep, Skinny Neural Networks are not Universal Approximators
by Jesse Johnson
Presented by Breandan Considine

Slide 2

Slide 2 text

$\mathbb{R}^k$ : Parameter space
$\mathbb{R}^n$ : Input space
$\mathbb{R}^m$ : Output space
$\varphi : \mathbb{R} \to \mathbb{R}$ : Activation function
$n_1, \dots, n_{\kappa-1}$ : Hidden layer dimensions
$N_{\varphi,n_0,n_1,\dots,n_{\kappa-1},1}$ : Family of functions defined by a feed-forward neural network with $n_0$ inputs, hidden layers of dimensions $n_1, \dots, n_{\kappa-1}$, and one output
$N^*_{\varphi,n}$ : Family of skinny nets, i.e. the union of all model families $N_{\varphi,n_0,n_1,\dots,n_{\kappa-1},1}$ with $n_i \le n$ for all $i \in [0, \kappa)$
$\hat{N}_n$ : Union of all non-singular functions in the families $N^*_{\varphi,n}$
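
To make the notation concrete, here is a minimal NumPy sketch of a member of $N_{\varphi,n_0,\dots,n_{\kappa-1},1}$; the widths, weights, and the choice $\varphi = \tanh$ below are illustrative placeholders, not taken from the paper.

```python
import numpy as np

def feed_forward(x, weights, biases, phi):
    """Evaluate a layered feed-forward net f : R^{n_0} -> R^1.

    weights[i] has shape (n_{i+1}, n_i); the final layer is a plain
    linear map to the one-dimensional output.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = phi(W @ h + b)                   # hidden layer: affine map + activation
    return weights[-1] @ h + biases[-1]      # last layer: 1-D linear projection

# A "skinny" net in N*_{phi,2}: input dimension 2 and every hidden width <= 2.
rng = np.random.default_rng(0)
widths = [2, 2, 2, 1]  # n_0, n_1, n_2, and the single output
weights = [rng.standard_normal((widths[i + 1], widths[i])) for i in range(3)]
biases = [rng.standard_normal(widths[i + 1]) for i in range(3)]
print(feed_forward(np.array([0.5, -1.0]), weights, biases, np.tanh))
```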

Slide 3

Slide 3 text

For a neural network with a one-dimensional output, if:
- the hidden layers are all the same dimension,
- the weight matrices are all non-singular, and
- the activation function is one-to-one,

then the composition of all the functions up to the last linear function will be one-to-one as well. The last linear function is a 1-D projection, so the preimage of any output point in the last hidden layer is a hyperplane. The preimage of this hyperplane under the one-to-one map may not be a hyperplane, but it will be an unbounded subset of the domain.
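
As a quick numerical sanity check (not on the slide), assuming $\tanh$ as the one-to-one activation and randomly drawn square weight matrices, which are non-singular with probability one: distinct inputs should land on distinct points in the last hidden layer.

```python
import numpy as np

rng = np.random.default_rng(1)
Ws = [rng.standard_normal((2, 2)) for _ in range(3)]  # square, a.s. non-singular

def up_to_last_hidden(x):
    """Composition of all layers before the final 1-D linear projection."""
    h = x
    for W in Ws:
        h = np.tanh(W @ h)  # non-singular linear map, then one-to-one activation
    return h

xs = rng.standard_normal((300, 2))                    # 300 distinct random inputs
hs = np.array([up_to_last_hidden(x) for x in xs])
dists = np.linalg.norm(hs[:, None, :] - hs[None, :, :], axis=-1)
# All pairwise distances between images are positive: consistent with injectivity.
print(dists[np.triu_indices(300, k=1)].min() > 0)
```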

Slide 4

Slide 4 text

Definition 1. We will say that a function $f : \mathbb{R}^n \to \mathbb{R}$ has unbounded level components if for every $y \in \mathbb{R}$, every path component of $f^{-1}(y)$ is unbounded.
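
An illustration not on the slide, showing a function that fails the definition and one that satisfies it:

\[ f(x_1, x_2) = x_1^2 + x_2^2, \qquad f^{-1}(1) = \{x : \|x\| = 1\} \]

has a bounded level component (the unit circle), so $f$ does not have unbounded level components; whereas every level set of $g(x_1, x_2) = x_1 + x_2$ is a line, so $g$ does.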

Slide 5

Slide 5 text

Level set $f^{-1}(y)$: the set of inputs where the function equals $y$; geometrically, the intersection of the function's graph with the horizontal hyperplane at height $y$.

Slide 6

Slide 6 text

Connected path: we can reach any point from every other point through a continuous path (a straight line alone is not a component by itself). Disjoint components: pieces of the set that no continuous path joins.

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Theorem 1. For any integer $n \ge 2$ and any uniformly continuous activation function $\varphi : \mathbb{R} \to \mathbb{R}$ that can be approximated by one-to-one functions, the family of layered feed-forward neural networks with input dimension $n$ in which each hidden layer has dimension at most $n$ cannot approximate any function with a level set containing a bounded path component.
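
A concrete consequence, continuing the earlier example: the unit circle is a bounded path component of the level set

\[ f^{-1}(1), \qquad f(x_1, x_2) = x_1^2 + x_2^2, \]

so by Theorem 1 no network in $N^*_{\varphi,2}$ can approximate $f$ to arbitrary precision, no matter how many hidden layers of width at most 2 it stacks.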

Slide 10

Slide 10 text

The proof of Theorem 1 has three parts:
1. If a function can be approximated by $N^*_{\varphi,n}$ then it can be approximated by the smaller model family $\hat{N}_n$ of non-singular functions (Lemma 2).
2. Therefore it can be approximated by functions with unbounded level components (Lemma 4).
3. Therefore it must itself have unbounded level components, which yields the contradiction (Lemma 5).

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Define the model family $\hat{N}_n$ of non-singular functions to be the union of all non-singular functions in the families $N^*_{\varphi,n}$, for all activation functions $\varphi$ and a fixed $n$.

Lemma 2. If $g$ is approximated by $N^*_{\varphi,n}$ for some continuous activation function $\varphi$ that can be uniformly approximated by one-to-one functions, then $g$ is approximated by $\hat{N}_n$.
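
A hedged example of the hypothesis (not from the slides): ReLU is not one-to-one, but it is uniformly approximated by the one-to-one softplus family $\varphi_\beta(x) = \frac{1}{\beta}\log(1 + e^{\beta x})$, since $\sup_x |\varphi_\beta(x) - \mathrm{ReLU}(x)| = \frac{\log 2}{\beta} \to 0$. A numerical check:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softplus(x, beta):
    # Strictly increasing (hence one-to-one); numerically stable form of
    # log(1 + exp(beta * x)) / beta.
    return np.log1p(np.exp(-np.abs(beta * x))) / beta + relu(x)

x = np.linspace(-50, 50, 200001)
for beta in (1.0, 10.0, 100.0):
    gap = np.abs(softplus(x, beta) - relu(x)).max()
    print(f"beta={beta:6.1f}  sup|softplus - relu| = {gap:.5f}  "
          f"(log(2)/beta = {np.log(2) / beta:.5f})")
```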

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Lemma 4. If $f$ is a function in $\hat{N}_n$, then every level set $f^{-1}(y)$ is homeomorphic to an open (possibly empty) subset of $\mathbb{R}^{n-1}$. This implies that $f$ has unbounded level components.

Slide 16

Slide 16 text

Take $X = (0, \infty)$ as a metric space (with the metric inherited from $\mathbb{R}$).
- $[1, 2]$ is a closed, bounded, and compact set in $X$.
- $(0, 1]$ is a closed and bounded set in $X$ which is not compact (e.g. the open cover $(0, 1] \subseteq \bigcup_n (1/n, 2)$ has no finite subcover).
- $[1, \infty)$ is a closed but unbounded, and not compact, set in $X$.
- $(1, \infty)$ is an unbounded set which is neither closed nor compact in $X$.
- $(1, 2)$ is bounded but not closed in $X$, and it is not compact.
No unbounded set and no non-closed set can be compact in any metric space.

Slide 17

Slide 17 text

Proof. Recall that by definition $f$ is a non-singular function in $\hat{N}_n$, where $\varphi$ is continuous and one-to-one. Let $\hat{f} : \mathbb{R}^n \to \mathbb{R}^n$ be the function defined by all but the last layer of the network. Let $\bar{f} : \mathbb{R}^n \to \mathbb{R}$ be the function defined by the map from the last hidden layer to the final output layer, so that $f = \bar{f} \circ \hat{f}$. Because $f$ is non-singular, the linear functions are all one-to-one. Because $\varphi$ is continuous and one-to-one, so are all the non-linear functions. Thus the composition $\hat{f}$ is also one-to-one, and therefore a homeomorphism from $\mathbb{R}^n$ onto its image $I_{\hat{f}}$.

Slide 18

Slide 18 text

Since $\mathbb{R}^n$ is homeomorphic to an open $n$-dimensional ball, $I_{\hat{f}}$ is an open subset of $\mathbb{R}^n$, as indicated in the top row of Figure 1.

Slide 19

Slide 19 text

The function $\bar{f}$ is the composition of a linear function to $\mathbb{R}$ with $\varphi$, which is one-to-one by assumption. So the preimage $\bar{f}^{-1}(y)$ for any $y \in \mathbb{R}$ is an $(n-1)$-dimensional plane in $\mathbb{R}^n$. The preimage $f^{-1}(y)$ is the preimage under $\hat{f}$ of this $(n-1)$-dimensional plane, or rather the preimage of the intersection $I_{\hat{f}} \cap \bar{f}^{-1}(y)$, as indicated in the bottom right/center of Figure 2. Since $I_{\hat{f}}$ is open as a subset of $\mathbb{R}^n$, the intersection is open as a subset of $\bar{f}^{-1}(y)$.
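
Written out explicitly (with $\ell(z) = v \cdot z + c$ as the final linear map, names chosen here for illustration): since $\bar{f} = \varphi \circ \ell$ and $\varphi$ is one-to-one,

\[ \bar{f}^{-1}(y) = \ell^{-1}\big(\varphi^{-1}(y)\big) = \{\, z \in \mathbb{R}^n : v \cdot z = \varphi^{-1}(y) - c \,\}, \]

which is an $(n-1)$-dimensional plane whenever $v \neq 0$.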

Slide 20

Slide 20 text

Since $\hat{f}$ is one-to-one, its restriction to this preimage (shown on the bottom left of the figure) is a homeomorphism from $f^{-1}(y)$ to this open subset of the $(n-1)$-dimensional plane $\bar{f}^{-1}(y)$. Thus $f^{-1}(y)$ is homeomorphic to an open subset of $\mathbb{R}^{n-1}$.

Slide 21

Slide 21 text

Finally, recall that the preimage under a continuous function of a closed set is closed, so $f^{-1}(y)$ is closed as a subset of $\mathbb{R}^n$. If it were also bounded, then it would be compact. However, the only compact, open subset of $\mathbb{R}^{n-1}$ is the empty set, so $f^{-1}(y)$ is either unbounded or empty. Since each path component of a subset of $\mathbb{R}^{n-1}$ is by definition non-empty, this proves that every level component of $f$ is unbounded. $\square$
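
A worked instance under the lemma's hypotheses ($n = 2$, with functions chosen here for illustration): take $\hat{f}(x) = (\tanh x_1, \tanh x_2)$, so that $I_{\hat{f}} = (-1, 1)^2$ is open, and $\bar{f}(z) = z_1 - z_2$. Then

\[ f^{-1}(0) = \hat{f}^{-1}\big(\{z_1 = z_2\} \cap (-1, 1)^2\big) = \{x \in \mathbb{R}^2 : x_1 = x_2\}, \]

the diagonal of $\mathbb{R}^2$: closed, unbounded, and homeomorphic to an open subset of $\mathbb{R}^1$ (an open segment of the plane $\bar{f}^{-1}(0)$), exactly as the proof predicts.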