Deep, Skinny Neural Networks are not Universal Approximators

Proofs in learning theory often rely on results from information theory or statistical learning to shed light on generalization or to provide convergence guarantees. While these are powerful tools, some properties of neural networks can be characterized using simpler methods. In "Deep, Skinny Neural Networks are not Universal Approximators", Jesse Johnson proves, using standard set-theoretic topology, that feedforward nets whose maximum layer width is less than or equal to their input size cannot approximate functions with a level set containing a bounded path component. Join us next Tuesday at 10:30am in the Mila Auditorium to discuss this paper!

Presented on March 19th, 2019.

Slides for today's proof are here: https://slides.com/breandan/skinny-nns
ICLR/OpenReview discussion: https://openreview.net/forum?id=ryGgSsAcFQ
The author has a great blog on topology which I particularly enjoyed: https://ldtopology.wordpress.com/

Breandan Considine

July 18, 2023


  1. Notation:
     - $\mathbb{R}^k$: parameter space
     - $\mathbb{R}^n$: input space
     - $\mathbb{R}^m$: output space
     - $\varphi : \mathbb{R} \to \mathbb{R}$: activation function
     - $n_1, \dots, n_{\kappa-1}$: hidden layers
     - $N_{\varphi,n_0,n_1,\dots,n_{\kappa-1},1}$: family of functions defined by a FF neural network with $n_0$ inputs and $n_\kappa$ outputs
     - $N^*_{\varphi,n}$: family of skinny nets, i.e. the union of all model families $N_{\varphi,n_0,n_1,\dots,n_{\kappa-1},1}$ with $n_i \le n \;\forall i \in [0, \kappa)$
     - $\hat{N}_n$: union of all non-singular functions in families $N^*_{\varphi,n}$
  2. For a neural network with a one-dimensional output, if:
     - the hidden layers are all the same dimension,
     - the weight matrices are all non-singular, and
     - the activation function is one-to-one,

     then the composition of all the functions up to the last linear function will be one-to-one as well. The last linear function is a 1D projection, so the preimage of any point into the last hidden layer is a hyperplane. The preimage of this hyperplane in the one-to-one map may not be a hyperplane, but it will be an unbounded subset of the domain.
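The intuition on this slide can be checked numerically. The sketch below is only an illustration under assumptions that are not in the slides: a hypothetical 2-2-2-1 network with made-up non-singular weights and a leaky ReLU as the one-to-one activation. It inverts the hidden layers explicitly, then pulls points of the last layer's hyperplane back into the input space and observes that they stay on the level set while moving arbitrarily far from the origin.

```python
import math

# Hypothetical "skinny" network: two square, non-singular hidden layers (2 -> 2 -> 2)
# followed by a 1D linear projection. All weights are made up for illustration.
LAYERS = [([[2.0, 1.0], [1.0, 1.0]], [0.5, -1.0]),
          ([[1.0, -1.0], [2.0, 1.0]], [0.0, 0.25])]
W_OUT = [1.0, -2.0]  # last linear map R^2 -> R (assumed, no bias)

def leaky_relu(t, slope=0.1):
    # One-to-one activation; an assumed choice, the slides do not fix one.
    return t if t >= 0 else slope * t

def leaky_relu_inv(t, slope=0.1):
    return t if t >= 0 else t / slope

def affine(W, b, x):
    return [W[0][0] * x[0] + W[0][1] * x[1] + b[0],
            W[1][0] * x[0] + W[1][1] * x[1] + b[1]]

def affine_inv(W, b, y):
    # Closed-form inverse of a 2x2 affine map; requires a non-zero determinant.
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    assert det != 0, "weight matrix must be non-singular"
    u = [y[0] - b[0], y[1] - b[1]]
    return [(W[1][1] * u[0] - W[0][1] * u[1]) / det,
            (-W[1][0] * u[0] + W[0][0] * u[1]) / det]

def f_hat(x):
    # Everything up to (but excluding) the last linear function.
    for W, b in LAYERS:
        x = [leaky_relu(t) for t in affine(W, b, x)]
    return x

def f_hat_inv(h):
    # Undo the layers in reverse order; this exists because each piece is one-to-one.
    for W, b in reversed(LAYERS):
        h = affine_inv(W, b, [leaky_relu_inv(t) for t in h])
    return h

def f(x):
    h = f_hat(x)
    return W_OUT[0] * h[0] + W_OUT[1] * h[1]

def level_point(y, s):
    # h = (y + 2s, s) lies on the hyperplane {h : h0 - 2*h1 = y}; pull it back to the input.
    return f_hat_inv([y + 2.0 * s, s])

def norm(x):
    return math.hypot(x[0], x[1])

if __name__ == "__main__":
    for s in (1.0, 10.0, 100.0):
        p = level_point(0.3, s)
        print(s, f(p), norm(p))
```

Each printed point has the same output value (up to rounding) while its norm keeps growing, which is what an unbounded level set looks like for this particular toy network.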
  3. Definition 1. We will say that a function $f : \mathbb{R}^n \to \mathbb{R}$ has unbounded level components if for every $y \in \mathbb{R}$, every path component of $f^{-1}(y)$ is unbounded.
  4. Figure labels: "Connected path: we can reach any point from every other point through a continuous path (a straight line alone is not a component by itself)" and "Disjoint components".
  5. Theorem 1. For any integer $n \ge 2$ and uniformly continuous activation function $\varphi : \mathbb{R} \to \mathbb{R}$ that can be approximated by one-to-one functions, the family of layered feed-forward neural networks with input dimension $n$ in which each hidden layer has dimension at most $n$ cannot approximate any function with a level set containing a bounded path component.
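To make the hypothesis of Theorem 1 concrete, here is one function with a bounded level component. This example is chosen for illustration and does not come from the slides.

```latex
% Illustrative example (not from the slides): the squared norm on R^n, n >= 2.
\[
  g(x) = x_1^2 + \dots + x_n^2, \qquad
  g^{-1}(1) = \{\, x \in \mathbb{R}^n : \lVert x \rVert = 1 \,\} = S^{n-1}.
\]
% For n >= 2 the unit sphere S^{n-1} is path-connected and bounded, so
% g^{-1}(1) is a single bounded path component and g falls under Theorem 1.
```

Any function with a strict local minimum surrounded by closed contour lines gives a similar example.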
  6. The proof of Theorem 1 has three parts:
     1. If a function can be approximated by $N^*_{\varphi,n}$, then it can be approximated by the smaller model family of non-singular layers (Lemma 2).
     2. Therefore it can be approximated by functions with unbounded level components (Lemma 4).
     3. Therefore it must also have unbounded level components, by contradiction (Lemma 5).
  7. Define the model family of non-singular functions $\hat{N}_n$ to be the union of all non-singular functions in families $N^*_{\varphi,n}$ for all activation functions $\varphi$ and a fixed $n$.

     Lemma 2. If $g$ is approximated by $N^*_{\varphi,n}$ for some continuous activation function $\varphi$ that can be uniformly approximated by one-to-one functions, then it is approximated by $\hat{N}_n$.
  8. Lemma 4. If $f$ is a function in $\hat{N}_n$, then every level set $f^{-1}(y)$ is homeomorphic to an open (possibly empty) subset of $\mathbb{R}^{n-1}$. This implies that $f$ has unbounded level components.
  9. Take the metric space $X = (0, \infty)$.
     - $[1, 2]$ is a closed, bounded and compact set in $X$.
     - $(0, 1]$ is a closed and bounded set in $X$ which is not compact (e.g. $(0, 1] \subseteq \bigcup_n (1/n, 2)$ has no finite subcover).
     - $[1, \infty)$ is a closed, but unbounded and not compact set in $X$.
     - $(1, \infty)$ is an unbounded set which is neither closed nor compact in $X$.
     - $(1, 2)$ is bounded but not closed in $X$, and it is not compact.

     No set that is unbounded or not closed can be compact in any metric space.
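The open cover $(0, 1] \subseteq \bigcup_n (1/n, 2)$ used above can be probed with exact arithmetic. The following is a minimal sketch, with hypothetical helper names, showing that the first $N$ intervals always miss the point $1/N$. Since any finite subfamily is contained in the first $N$ intervals for a large enough $N$, no finite subfamily covers $(0, 1]$.

```python
from fractions import Fraction

def covered_by_first(N, x):
    """True if x lies in the open interval (1/n, 2) for some n in 1..N."""
    return any(Fraction(1, n) < x < 2 for n in range(1, N + 1))

def uncovered_witness(N):
    """A point of (0, 1] that none of the first N intervals contains."""
    # 1/n >= 1/N for every n <= N, so 1/N is never strictly greater than 1/n.
    return Fraction(1, N)

if __name__ == "__main__":
    for N in (1, 5, 50):
        w = uncovered_witness(N)
        print(N, w, covered_by_first(N, w), covered_by_first(N + 1, w))
```

The witness is missed by the first $N$ intervals but picked up by interval $N + 1$, so the full infinite family still covers it.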
  10. Proof. Recall that by definition $f$ is a non-singular function in $\hat{N}_n$, where $\varphi$ is continuous and one-to-one. Let $\hat{f} : \mathbb{R}^n \to \mathbb{R}^n$ be the function defined by all but the last layer of the network. Let $\bar{f} : \mathbb{R}^n \to \mathbb{R}$ be the function defined by the map from the last hidden layer to the final output layer, so that $f = \bar{f} \circ \hat{f}$. Because $f$ is non-singular, the linear functions are all one-to-one. Because $\varphi$ is continuous and one-to-one, so are all the non-linear functions. Thus the composition $\hat{f}$ is also one-to-one, and therefore a homeomorphism from $\mathbb{R}^n$ onto its image $I_{\hat{f}}$.
  11. Since $\mathbb{R}^n$ is homeomorphic to an open $n$-dimensional ball, $I_{\hat{f}}$ is an open subset of $\mathbb{R}^n$, as indicated in the top row of Figure 1.
  12. The function $\bar{f}$ is the composition of a linear function to $\mathbb{R}$ with $\varphi$, which is one-to-one by assumption. So the preimage $\bar{f}^{-1}(y)$ for any $y \in \mathbb{R}$ is an $(n-1)$-dimensional plane in $\mathbb{R}^n$. The preimage $f^{-1}(y)$ is the preimage in $\hat{f}$ of this $(n-1)$-dimensional plane, or rather the preimage of the intersection $I_{\hat{f}} \cap \bar{f}^{-1}(y)$, as indicated in the bottom right/center of Figure 2. Since $I_{\hat{f}}$ is open as a subset of $\mathbb{R}^n$, the intersection is open as a subset of $\bar{f}^{-1}(y)$.
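The hyperplane claim in this step can be written out explicitly. The symbols $w$ and $b$ below are assumed names for the last layer's weight vector and bias. They do not appear in the slides.

```latex
% Sketch under assumed notation: w (non-zero weight vector) and b (bias)
% are hypothetical names for the parameters of the last linear map.
\[
  \bar{f}(h) = \varphi(w \cdot h + b), \qquad w \in \mathbb{R}^n \setminus \{0\},\; b \in \mathbb{R}.
\]
% Since \varphi is one-to-one, \varphi(w \cdot h + b) = y has a solution only
% when y is in the range of \varphi, and then
\[
  \bar{f}^{-1}(y) = \{\, h \in \mathbb{R}^n : w \cdot h = \varphi^{-1}(y) - b \,\},
\]
% which is an (n-1)-dimensional affine hyperplane with normal vector w.
```

If $y$ lies outside the range of $\varphi$, the preimage is empty, which is consistent with the "possibly empty" qualifier in Lemma 4.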
  13. Since $\hat{f}$ is one-to-one, its restriction to this preimage (shown on the bottom left of the Figure) is a homeomorphism from $f^{-1}(y)$ to this open subset of the $(n-1)$-dimensional plane $\bar{f}^{-1}(y)$. Thus $f^{-1}(y)$ is homeomorphic to an open subset of $\mathbb{R}^{n-1}$.
  14. Finally, recall that the preimage in a continuous function of a closed set is closed, so $f^{-1}(y)$ is closed as a subset of $\mathbb{R}^n$. If it were also bounded, then it would be compact. However, the only compact, open subset of $\mathbb{R}^{n-1}$ is the empty set, so $f^{-1}(y)$ is either unbounded or empty. Since each path component of a subset of $\mathbb{R}^{n-1}$ is by definition non-empty, this proves that any level component of $f$ is unbounded. $\square$