Attention Neural Net Model Fundamentals

Neural networks have regained popularity over the last decade because they are demonstrating real-world value in a range of applications (e.g. targeted advertising, recommender engines, Siri, self-driving cars, facial recognition). Several model types are currently being explored in the field, with recurrent neural networks (RNNs) and convolutional neural networks (CNNs) receiving the most focus. The attention model, a recently developed RNN variant, has started to play a larger role in both natural language processing and image analysis research.

This talk will cover the fundamentals of the attention model structure and how it's applied to visual and speech analysis. I will provide an overview of the model's functionality and math, including a high-level differentiation between soft and hard attention types. The goal is to give you enough of an understanding of what the model is, how it works, and where to apply it.

Melanie Warrick

November 12, 2015

Transcript

  1. Attention Models
    Melanie Warrick
    @nyghtowl

  2. @nyghtowl
    Overview
    - Attention
    - Soft vs Hard
    - Hard Attention for Computer Vision
    - Learning Rule
    - Example Performance

  3. @nyghtowl
    Attention ~ Selective

  4. @nyghtowl
    Attention Mechanism
    input focus
    sequential | context
    weights
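    A minimal NumPy sketch of the weighting idea on this slide (the names soft_attention, values, and query are illustrative, not from the deck): score each input position against the current context, turn the scores into weights with a softmax, and focus the input into a single weighted sum.

        import numpy as np

        def soft_attention(values, query):
            # values: (n, d) array of input vectors; query: (d,) current context/state
            scores = values @ query                  # one relevance score per input position
            weights = np.exp(scores - scores.max())  # softmax turns scores into attention weights
            weights /= weights.sum()
            focus = weights @ values                 # focus = weighted sum of the input
            return focus, weights

        # toy usage: 4 input positions with 3-dimensional features
        focus, weights = soft_attention(np.random.randn(4, 3), np.random.randn(3))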

  5. @nyghtowl
    Attention Techniques
    - Spotlight: varying resolution
    - Zoom-lens: adds changing filter size

  6. @nyghtowl
    Where to look?
    Attention Decision

  7. @nyghtowl
    Model Types
    Soft
    - reads all of the input; output is a weighted average (expectation) over every position
    - trained with the standard loss derivative (fully differentiable)
    Hard
    - samples input positions; output is estimated from the sampled positions
    - trained with a policy gradient & variance reduction
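    A rough sketch of the contrast, assuming the attention weights have already been computed as in the earlier sketch (attend, values, and weights are illustrative names): soft attention returns the expectation over every position, while hard attention samples one position from the same distribution and needs a policy-gradient estimator to train.

        import numpy as np

        def attend(values, weights, hard=False, rng=None):
            # weights: attention distribution (sums to 1) over the n input positions
            rng = rng or np.random.default_rng()
            if hard:
                # hard: sample a single position; gradients require REINFORCE
                # plus variance reduction, since sampling is not differentiable
                idx = rng.choice(len(weights), p=weights)
                return values[idx]
            # soft: expectation (weighted average) over every position;
            # fully differentiable, so the standard loss derivative applies
            return weights @ values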

  8. @nyghtowl
    Soft vs Hard Focus Examples
    (image panels labeled Soft and Hard)

  9. @nyghtowl
    Soft Attention
    - Value: context aware
    - Challenge: scale limitations

  10. @nyghtowl
    Hard Attention
    - Value: # computations stays constrained as data size grows
    - Challenge: context & training time

  11. @nyghtowl
    Model Variations
    Soft
    - NTM Neural Turing Machine
    - Memory Network
    - DRAW Deep Recurrent Attentive Writer (“Differentiable”)
    - Stack-Augmented Recurrent Nets
    Hard
    - RAM Recurrent Attention Model
    - DRAM Deep Recurrent Attention Model
    - RL-NTM Reinforcement Learning Neural Turing Machine

  12. @nyghtowl
    Applications
    - Memory: reading / writing
    - Language generation
    - Picture generation
    - Classifying image objects
    - Image search
    - Describing images / videos

  13. @nyghtowl
    Hard Model & Computer Vision

  14. @nyghtowl
    Convolutional Neural Nets

  15. @nyghtowl
    Linear Complexity Growth

  16. @nyghtowl
    Constrained Computations

  17. @nyghtowl
    Recurrent Neural Nets

  18. @nyghtowl
    General Goal
    - min error | max reward
    - reward can be sparse & delayed

  19. @nyghtowl
    Deep Recurrent Attention Model

  20. @nyghtowl
    REINFORCE Learning Rule
    weights change in proportion to the reward obtained for the chosen glimpse
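    Roughly, in the notation of Williams (1992, listed in the references below): Δw = α (R − b) ∂ ln π(glimpse | state; w) / ∂w, i.e. the weights move in the direction that makes a rewarded glimpse more likely, with a baseline b subtracted to reduce variance. A toy sketch of one such update (the names are illustrative):

        import numpy as np

        def reinforce_update(w, grad_logp, reward, baseline, lr=0.01):
            # grad_logp: d ln pi(glimpse | state; w) / dw for the glimpse actually taken
            # (reward - baseline) reduces variance without biasing the estimate
            return w + lr * (reward - baseline) * grad_logp

        # toy usage: one weight vector, one sampled glimpse, reward 1.0 vs baseline 0.6
        w = reinforce_update(np.zeros(3), np.array([0.2, -0.1, 0.4]), reward=1.0, baseline=0.6)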

  21. @nyghtowl
    Performance Comparison
    SVHN - Street View House Numbers dataset

  22. @nyghtowl
    Performance Comparison
    DRAM vs CNN - Computation Complexity

  23. @nyghtowl
    Last Points
    - adaptive selection & context
    - constrained computations
    - accuracy

  24. @nyghtowl
    References
    ● Neural Turing Machines http://arxiv.org/pdf/1410.5401v2.pdf (Graves et al., 2014)
    ● Reinforcement Learning NTM http://arxiv.org/pdf/1505.00521v1.pdf (Zaremba et al., 2015)
    ● End-To-End Memory Network http://arxiv.org/pdf/1503.08895v4.pdf (Sukhbaatar et al., 2015)
    ● Recurrent Models of Visual Attention http://arxiv.org/pdf/1406.6247v1.pdf (Mnih et al., 2014)
    ● Multiple Object Recognition with Visual Attention http://arxiv.org/pdf/1412.7755v2.pdf (Ba et al., 2014)
    ● Show, Attend and Tell http://arxiv.org/pdf/1502.03044v2.pdf (Xu et al., 2015)
    ● DRAW http://arxiv.org/pdf/1502.04623v2.pdf (Gregor et al., 2015)
    ● Neural Machine Translation by Jointly Learning to Align and Translate http://arxiv.org/pdf/1409.0473v6.pdf (Bahdanau et al., 2014)
    ● Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets http://arxiv.org/pdf/1503.01007v4.pdf (Joulin et al., 2015)
    ● Deep Learning Theory & Applications: https://www.youtube.com/watch?v=aUTHdgh1OjI
    ● The Unreasonable Effectiveness of Recurrent Neural Networks https://karpathy.github.io/2015/05/21/rnn-effectiveness/
    ● Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf (Williams, 1992)

  25. @nyghtowl
    References
    ● Spatial Transformer Networks http://arxiv.org/pdf/1506.02025v1.pdf (Jaderberg et al., 2015)
    ● Recurrent Spatial Transformer Networks http://arxiv.org/pdf/1509.05329v1.pdf (Sønderby et al., 2015)
    ● Spatial Transformer Networks Video https://youtu.be/yGFVO2B8gok
    ● Learning Stochastic Feedforward Neural Networks http://www.cs.toronto.edu/~tang/papers/sfnn.pdf (Tang & Salakhutdinov, 2013)
    ● Learning Stochastic Recurrent Networks http://arxiv.org/pdf/1411.7610v3.pdf (Bayer & Osendorfer, 2015)
    ● Learning Generative Models with Visual Attention http://www.cs.toronto.edu/~tang/papers/sfnn.pdf (Tang et al., 2014)

  26. @nyghtowl
    Special Thanks
    ● Mark Ettinger
    ● Rewon Child
    ● Diogo Almeida
    ● Stanislav Nikolov
    ● Adam Gibson
    ● Tarin Ziyaee
    ● Charlie Tang
    ● Dave Kammeyer

  27. @nyghtowl
    References: Images
    ● http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
    ● http://deeplearning.net/tutorial/lenet.html
    ● https://stats.stackexchange.com/questions/114385/what-is-the-difference-between-convolutional-neural-networks-restricted-boltzma
    ● http://myndset.com/2011/12/15/making-the-switch-where-to-find-the-money-for-your-digital-marketing-strategy/
    ● http://blog.archerhotel.com/spyglass-rooftop-bar-nyc-making-manhattan-look-twice/
    ● http://www.serps-invaders.com/blog/how-to-find-broken-links-on-your-site/
    ● http://arxiv.org/pdf/1502.04623v2.pdf
    ● https://en.wikipedia.org/wiki/Attention
    ● http://web.media.mit.edu/~lieber/Teaching/Context/

  28. @nyghtowl
    Attention Models
    Melanie Warrick
    skymind.io (company)
    gitter.im/deeplearning4j/deeplearning4j

  29. @nyghtowl
    Artificial Neural Nets
    Input → Hidden → Output
    Run until error stops improving = converge
    Loss function computed on the output; output unit k computes y_k = f(Σ_{j=1..M} W_kj x_j)
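    A minimal sketch of that forward pass and loss, under assumed choices the slide does not specify (tanh hidden units, linear outputs, squared error):

        import numpy as np

        def forward(x, W_hidden, W_out):
            h = np.tanh(W_hidden @ x)   # hidden layer: nonlinearity over weighted inputs
            y = W_out @ h               # output unit k: y_k = sum_j W_kj * h_j
            return y

        def loss(y, target):
            return 0.5 * np.sum((y - target) ** 2)   # squared-error loss to minimize

        # toy usage: 4 inputs -> 5 hidden units -> 2 outputs
        x = np.random.randn(4)
        y = forward(x, np.random.randn(5, 4), np.random.randn(2, 5))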
