
Attention Neural Net Model Fundamentals

Neural networks have regained popularity over the last decade because they are demonstrating real-world value in a range of applications (e.g. targeted advertising, recommender engines, Siri, self-driving cars, facial recognition). Several model types are actively explored in the field, with recurrent neural networks (RNNs) and convolutional neural networks (CNNs) receiving the most attention. The attention model, a recently developed RNN variant, has started to play a larger role in both natural language processing and image analysis research.

This talk will cover the fundamentals of the attention model structure and how it is applied to visual and speech analysis. I will provide an overview of the model's functionality and math, including a high-level differentiation between soft and hard attention. The goal is to give you enough of an understanding of what the model is, how it works, and where to apply it.

Melanie Warrick

November 12, 2015

Transcript

  1. @nyghtowl Overview
     - Attention
     - Soft vs Hard
     - Hard Attention for Computer Vision
     - Learning Rule
     - Example Performance
  2. @nyghtowl Model Types
     Soft
     - reads all input & takes a weighted average of all expected output
     - standard loss derivative
     Hard
     - samples input & takes a weighted average of estimated output
     - policy gradient & variance reduction
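To make the soft vs. hard distinction concrete, here is a minimal NumPy sketch of soft attention: every input position gets a softmax weight and the output is the weighted average, so the whole operation stays differentiable and trains with a standard loss derivative. The names (`soft_attention`, `keys`, `values`, `query`) are illustrative, not from the slides; hard attention would instead sample a single position from the same weights and train with a policy gradient.

```python
import numpy as np

def soft_attention(query, keys, values):
    """Differentiable soft attention: score every input position,
    normalize with softmax, return the weighted average of the values."""
    scores = keys @ query                     # one score per input position
    weights = np.exp(scores - scores.max())   # softmax for numerical stability
    weights /= weights.sum()
    # weighted average over all inputs -- nothing is sampled or discarded,
    # so gradients flow through every position (soft attention).
    # Hard attention would instead sample one index from `weights`
    # and use only that value, which is not differentiable.
    return weights @ values, weights

# toy example: 4 input positions with 3-dimensional features
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 3))
values = rng.normal(size=(4, 3))
query = rng.normal(size=3)
context, weights = soft_attention(query, keys, values)
print("attention weights:", weights)   # every position gets some weight
print("context vector:", context)
```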
  3. @nyghtowl Model Variations
     Soft ("Differentiable")
     - NTM: Neural Turing Machine
     - Memory Network
     - DRAW: Deep Recurrent Attention Writer
     - Stack-Augmented Recurrent Nets
     Hard
     - RAM: Recurrent Attention Model
     - DRAM: Deep Recurrent Attention Model
     - RL-NTM: Reinforce Neural Turing Machine
  4. @nyghtowl Applications
     - Memory
     - Reading / Writing
     - Language generation
     - Picture generation
     - Classifying image objects
     - Image search
     - Describing images / videos
  5. @nyghtowl General Goal
     - min error | max reward
     - reward can be sparse & delayed
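The "max reward" framing matters for hard attention: the reward (e.g. whether the final classification was correct) is sparse and arrives only after the episode ends, so there is no ordinary loss derivative to follow. A common learning rule is REINFORCE (Williams, 1992, cited on the references slide) with a baseline for variance reduction; the sketch below is an illustrative toy version under those assumptions, not the exact rule from the talk.

```python
import numpy as np

def reinforce_update(log_prob_grads, rewards, baseline, learning_rate=0.01):
    """One REINFORCE step: scale each sampled action's log-probability
    gradient by (reward - baseline). Subtracting the baseline reduces
    variance without biasing the gradient estimate."""
    update = np.zeros_like(log_prob_grads[0])
    for grad, reward in zip(log_prob_grads, rewards):
        update += learning_rate * (reward - baseline) * grad
    return update / len(rewards)

# toy batch of 8 episodes: gradients of log pi(action | state)
# with respect to 5 policy parameters, plus sparse 0/1 rewards
rng = np.random.default_rng(1)
log_prob_grads = [rng.normal(size=5) for _ in range(8)]
rewards = rng.integers(0, 2, size=8).astype(float)
baseline = rewards.mean()              # simple mean-reward baseline
delta_theta = reinforce_update(log_prob_grads, rewards, baseline)
print("parameter update:", delta_theta)
```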
  6. @nyghtowl References
     • Neural Turing Machines http://arxiv.org/pdf/1410.5401v2.pdf (Graves et al., 2014)
     • Reinforcement Learning NTM http://arxiv.org/pdf/1505.00521v1.pdf (Zaremba et al., 2015)
     • End-To-End Memory Network http://arxiv.org/pdf/1503.08895v4.pdf (Sukhbaatar et al., 2015)
     • Recurrent Models of Visual Attention http://arxiv.org/pdf/1406.6247v1.pdf (Mnih et al., 2014)
     • Multiple Object Recognition with Visual Attention http://arxiv.org/pdf/1412.7755v2.pdf (Ba et al., 2014)
     • Show, Attend and Tell http://arxiv.org/pdf/1502.03044v2.pdf (Xu et al., 2015)
     • DRAW http://arxiv.org/pdf/1502.04623v2.pdf (Gregor et al., 2015)
     • Neural Machine Translation by Jointly Learning to Align and Translate http://arxiv.org/pdf/1409.0473v6.pdf (Bahdanau et al., 2014)
     • Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets http://arxiv.org/pdf/1503.01007v4.pdf (Joulin et al., 2015)
     • Deep Learning Theory & Applications https://www.youtube.com/watch?v=aUTHdgh1OjI
     • The Unreasonable Effectiveness of Recurrent Neural Networks https://karpathy.github.io/2015/05/21/rnn-effectiveness/
     • Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf (Williams, 1992)
  7. @nyghtowl References
     • Spatial Transformer Networks http://arxiv.org/pdf/1506.02025v1.pdf (Jaderberg et al., 2015)
     • Recurrent Spatial Transformer Networks http://arxiv.org/pdf/1509.05329v1.pdf (Sønderby et al., 2015)
     • Spatial Transformer Networks Video https://youtu.be/yGFVO2B8gok
     • Learning Stochastic Feedforward Neural Networks http://www.cs.toronto.edu/~tang/papers/sfnn.pdf (Tang & Salakhutdinov, 2013)
     • Learning Stochastic Recurrent Networks http://arxiv.org/pdf/1411.7610v3.pdf (Bayer & Osendorfer, 2015)
     • Learning Generative Models with Visual Attention http://www.cs.toronto.edu/~tang/papers/sfnn.pdf (Tang et al., 2014)
  8. @nyghtowl Special Thanks
     • Mark Ettinger
     • Rewon Child
     • Diogo Almeida
     • Stanislav Nikolov
     • Adam Gibson
     • Tarin Ziyaee
     • Charlie Tang
     • Dave Kammeyer
  9. @nyghtowl References: Images
     • http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
     • http://deeplearning.net/tutorial/lenet.html
     • https://stats.stackexchange.com/questions/114385/what-is-the-difference-between-convolutional-neural-networks-restricted-boltzma
     • http://myndset.com/2011/12/15/making-the-switch-where-to-find-the-money-for-your-digital-marketing-strategy/
     • http://blog.archerhotel.com/spyglass-rooftop-bar-nyc-making-manhattan-look-twice/
     • http://www.serps-invaders.com/blog/how-to-find-broken-links-on-your-site/
     • http://arxiv.org/pdf/1502.04623v2.pdf
     • https://en.wikipedia.org/wiki/Attention
     • http://web.media.mit.edu/~lieber/Teaching/Context/
  10. @nyghtowl Artificial Neural Nets
      - layers: input, hidden, output
      - run until the error stops improving = converge
      - loss function
      - output: y_k = Σ_j W_kj x_j  (sum over j = 1..M inputs)
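The last slide's diagram boils down to a short training loop: each unit's output is a weighted sum of its inputs passed through a nonlinearity (y_k = Σ_j W_kj x_j), a loss function measures the error, and gradient steps repeat until the error stops improving. The following is a minimal NumPy sketch of that loop under those assumptions (single hidden layer, sigmoid units, squared-error loss); the data and hyperparameters are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(42)

# toy data: learn XOR with one hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(2, 4))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

prev_loss, lr = np.inf, 1.0
for step in range(20000):
    # forward pass: each unit computes f(sum_j W_kj * x_j)
    h = sigmoid(X @ W1)
    y = sigmoid(h @ W2)
    loss = 0.5 * np.mean((y - t) ** 2)    # squared-error loss function

    # stop when the error stops improving (convergence)
    if prev_loss - loss < 1e-9:
        break
    prev_loss = loss

    # backpropagation: standard loss derivatives for each weight matrix
    dy = (y - t) * y * (1 - y) / len(X)
    dW2 = h.T @ dy
    dh = (dy @ W2.T) * h * (1 - h)
    dW1 = X.T @ dh
    W1 -= lr * dW1
    W2 -= lr * dW2

print(f"stopped after {step} steps, loss = {loss:.4f}")
print("predictions:", y.ravel().round(2))
```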