Deep, Skinny Neural Networks are not Universal Approximators

Proofs in learning theory typically draw on results from information theory or statistical learning to shed light on generalization or to provide convergence guarantees. While these are powerful tools, some properties of neural networks can be characterized with simpler methods. In Deep, Skinny Neural Networks are not Universal Approximators, Jesse Johnson proves, using standard set-theoretic topology, that feedforward networks whose hidden layers are no wider than their input dimension cannot approximate any function with a level set containing a bounded path component. Join us next Tuesday at 10:30am in the Mila Auditorium to discuss this paper!

Presented on March 19th, 2019.

Slides for today's proof are here: https://slides.com/breandan/skinny-nns
ICLR/OpenReview discussion: https://openreview.net/forum?id=ryGgSsAcFQ
The author has a great blog on topology which I particularly enjoyed: https://ldtopology.wordpress.com/

Breandan Considine


Transcript

  1. Deep, Skinny
    Neural Networks
    are not Universal
    Approximators
    by Jesse Johnson
    Presented by Breandan Considine


  2. R^k : Parameter space
     R^n : Input space
     R^m : Output space
     φ : R → R, Activation function
     n_1, …, n_{κ−1} : Hidden layers
     N*_{φ,n} : Family of skinny nets, i.e. the union of all model families N_{φ,n_0,n_1,…,n_{κ−1},1} with n_i ≤ n ∀ i ∈ [0, κ)
     N_{φ,n_0,n_1,…,n_{κ−1},1} : Family of functions defined by a FF neural network with n_0 inputs and n_κ outputs
     N̂_n : Union of all non-singular functions in the families N*_{φ,n}

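To make the notation above concrete, here is a minimal sketch (my own illustration, not code from the paper; the helper name feedforward is mine) of a member of the family N_{φ,n_0,…,n_{κ−1},1}: a layered feed-forward network with n_0 inputs, hidden widths n_1, …, n_{κ−1}, a single output, and a pointwise activation φ. The skinny family N*_{φ,n} additionally requires every hidden width to be at most n.

    import numpy as np

    def feedforward(widths, phi=np.tanh, seed=0):
        """Build f : R^{widths[0]} -> R^{widths[-1]} from random affine layers and a pointwise activation."""
        rng = np.random.default_rng(seed)
        layers = [(rng.standard_normal((m, k)), rng.standard_normal(m))
                  for k, m in zip(widths[:-1], widths[1:])]
        def f(x):
            h = np.asarray(x, dtype=float)
            for i, (W, b) in enumerate(layers):
                h = W @ h + b
                if i < len(layers) - 1:   # activation on hidden layers only
                    h = phi(h)
            return h
        return f

    n = 2                                    # input dimension n_0 = n
    widths = [n, 2, 2, 1]                    # hidden widths n_1 = n_2 = 2 <= n, single output
    assert all(w <= n for w in widths[:-1])  # the defining constraint of the skinny family N*_{phi,n}
    f = feedforward(widths)
    print(f([0.3, -1.2]))                    # a one-dimensional output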

  3. For a neural network with a one-dimensional output, if:
     the hidden layers are all the same dimension
     the weight matrices are all non-singular
     the activation function is one-to-one
     Then the composition of all the functions up to the last linear function will be one-to-one as well. The last linear function is a 1D projection, so the preimage of any point in the last hidden layer is a hyperplane. The preimage of this hyperplane under the one-to-one map may not be a hyperplane, but it will be an unbounded subset of the domain.

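The one-to-one claim on this slide can be sanity-checked numerically. Below is a small sketch of mine (assuming tanh as the one-to-one activation and random, hence almost surely non-singular, weight matrices; the names f_hat and f_hat_inverse are mine): the map defined by everything before the final linear layer can be inverted layer by layer, so it is one-to-one.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 3
    Ws = [0.5 * rng.standard_normal((n, n)) for _ in range(3)]  # almost surely non-singular
    bs = [0.5 * rng.standard_normal(n) for _ in range(3)]

    def f_hat(x):
        """All layers before the final linear map: non-singular affine maps + one-to-one activation."""
        h = np.asarray(x, dtype=float)
        for W, b in zip(Ws, bs):
            h = np.tanh(W @ h + b)
        return h

    def f_hat_inverse(h):
        """Invert layer by layer: arctanh undoes tanh, solve undoes each non-singular affine map."""
        for W, b in zip(reversed(Ws), reversed(bs)):
            h = np.linalg.solve(W, np.arctanh(h) - b)
        return h

    x = rng.standard_normal(n)
    print(np.allclose(f_hat_inverse(f_hat(x)), x))  # True: the pre-output composition is one-to-one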

  4. Definition 1. We will say that a function f : R^n → R has unbounded level components if for every y ∈ R, every path component of f^{−1}(y) is unbounded.


  5. Level set: the intersection between a hyperplane and the graph of the function


  6. Path-connected: we can reach any point from every other point through a continuous path (a straight line segment inside the set is not a component by itself)
     Disjoint components
     Path-connected: we can reach any point from every other point

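For a discretized picture of path components, one can label connected pixels of a set sampled on a grid. The following small sketch (an illustration of mine, not from the slides) uses scipy.ndimage.label: a thin band around the unit circle forms a single path component, while the set |x² − 1| < ε splits into two disjoint strips.

    import numpy as np
    from scipy import ndimage

    xs = np.linspace(-2, 2, 401)
    X, Y = np.meshgrid(xs, xs)
    eight = np.ones((3, 3))  # 8-connectivity, so diagonal neighbours count as connected

    annulus = np.abs(X**2 + Y**2 - 1) < 0.1   # a thin band around the unit circle
    strips  = np.abs(X**2 - 1) < 0.1          # two vertical bands near x = -1 and x = +1

    print(ndimage.label(annulus, structure=eight)[1])  # 1 path component
    print(ndimage.label(strips,  structure=eight)[1])  # 2 disjoint components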

  7. Theorem 1. For any integer n ≥ 2 and any uniformly continuous activation function φ : R → R that can be approximated by one-to-one functions, the family of layered feed-forward neural networks with input dimension n in which each hidden layer has dimension at most n cannot approximate any function with a level set containing a bounded path component.

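Here is a rough numerical illustration of the theorem (my own sketch, not an experiment from the paper; the helpers skinny and components_touch_edge are mine): sample a random width-2 tanh network on a 2-D window and look at an approximate level set. Its components should all run off the edge of the window, whereas the target g(x, y) = x² + y² has a bounded level component, the unit circle, which does not.

    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng(0)
    Ws = [rng.standard_normal((2, 2)) for _ in range(3)]    # hidden widths = 2 = n
    bs = [rng.standard_normal(2) for _ in range(3)]
    w_out, b_out = rng.standard_normal(2), rng.standard_normal()

    def skinny(X, Y):
        """A random skinny network R^2 -> R with a one-to-one activation, evaluated on a grid."""
        H = np.stack([X, Y])
        for W, b in zip(Ws, bs):
            H = np.tanh(np.tensordot(W, H, axes=1) + b[:, None, None])
        return np.tensordot(w_out, H, axes=1) + b_out

    def components_touch_edge(mask):
        """True if every path component of the mask reaches the boundary of the window."""
        labels, k = ndimage.label(mask, structure=np.ones((3, 3)))
        edge = np.unique(np.concatenate([labels[0], labels[-1], labels[:, 0], labels[:, -1]]))
        return all(lab in edge for lab in range(1, k + 1))

    xs = np.linspace(-4, 4, 401)
    X, Y = np.meshgrid(xs, xs)
    F, G = skinny(X, Y), X**2 + Y**2

    print(components_touch_edge(np.abs(F - F[200, 200]) < 0.1))  # expect True: level components escape the window
    print(components_touch_edge(np.abs(G - 1.0) < 0.1))          # False: the unit circle is a bounded component

This is only a finite-window sanity check, of course; the slides that follow give the actual argument.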

  8. The proof of Theorem 1 has three parts:
     1. If a function can be approximated by N*_{φ,n}, then it can be approximated by the smaller model family of non-singular networks N̂_n (Lemma 2)
     2. Therefore it can be approximated by functions with unbounded level components (Lemma 4)
     3. Therefore it must itself have unbounded level components (by contradiction) (Lemma 5)


  9. Define the model family of non-singular functions N̂_n to be the union of all non-singular functions in the families N*_{φ,n} for all activation functions φ and a fixed n.
     Lemma 2. If g is approximated by N*_{φ,n} for some continuous activation function φ that can be uniformly approximated by one-to-one functions, then it is approximated by N̂_n.


  10. Lemma 4. If f is a function in N̂_n then every level set f^{−1}(y) is homeomorphic to an open (possibly empty) subset of R^{n−1}. This implies that f has unbounded level components.


  11. Take X = (0, ∞) with the usual metric.
      [1, 2] is a closed, bounded and compact set in X.
      (0, 1] is a closed and bounded set in X, which is not compact (e.g. the open cover (0, 1] ⊆ ⋃_n (1/n, 2) has no finite subcover).
      [1, ∞) is a closed, but unbounded and not compact set in X.
      (1, ∞) is an unbounded set which is neither closed nor compact in X.
      (1, 2) is neither closed nor bounded in X, and it is not compact.
      No unbounded set or non-closed set can be compact in any metric space.

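As a tiny computational aside (my own illustration, not part of the slides), the failure of compactness for (0, 1] can be seen from the cover {(1/n, 2)}: any finite subfamily, say with indices up to N, misses the point 1/(N + 1).

    def covered_by_first_N(x, N):
        """Is x inside some interval (1/n, 2) with n <= N?"""
        return any(1 / n < x < 2 for n in range(1, N + 1))

    for N in (1, 5, 50):
        witness = 1 / (N + 1)                        # a point of (0, 1] that escapes the finite subfamily
        print(N, covered_by_first_N(witness, N))     # always False, so no finite subcover exists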

  12. Proof. Recall that by definition f is a non-singular function in N̂_n, where φ is continuous and one-to-one. Let f̂ : R^n → R^n be the function defined by all but the last layer of the network. Let f̄ : R^n → R be the function defined by the map from the last hidden layer to the final output layer, so that f = f̄ ∘ f̂. Because f is non-singular, the linear functions are all one-to-one. Because φ is continuous and one-to-one, so are all the non-linear functions. Thus the composition f̂ is also one-to-one, and therefore a homeomorphism from R^n onto its image I_f̂.


  13. Since R^n is homeomorphic to an open n-dimensional ball, I_f̂ is an open subset of R^n, as indicated in the top row of Figure 1.


  14. The function f̄ is the composition of a linear function to R with φ, which is one-to-one by assumption. So the preimage f̄^{−1}(y) for any y ∈ R is an (n−1)-dimensional plane in R^n. The preimage f^{−1}(y) is the preimage under f̂ of this (n−1)-dimensional plane, or rather the preimage of the intersection I_f̂ ∩ f̄^{−1}(y), as indicated in the bottom right/center of Figure 2. Since I_f̂ is open as a subset of R^n, the intersection is open as a subset of f̄^{−1}(y).


  15. Since f̂ is one-to-one, its restriction to this preimage (shown on the bottom left of the Figure) is a homeomorphism from f^{−1}(y) to this open subset of the (n−1)-dimensional plane f̄^{−1}(y). Thus f^{−1}(y) is homeomorphic to an open subset of R^{n−1}.


  16. Finally, recall that the preimage of a closed set under a continuous function is closed, so f^{−1}(y) is closed as a subset of R^n. If it were also bounded, then it would be compact. However, the only compact, open subset of R^{n−1} is the empty set, so f^{−1}(y) is either unbounded or empty. Since each path component of a subset of R^{n−1} is by definition non-empty, this proves that any level component of f is unbounded. □
