Attention Neural Net Model Fundamentals

Neural networks have regained popularity over the last decade because they are demonstrating real-world value in a range of applications (e.g., targeted advertising, recommender engines, Siri, self-driving cars, facial recognition). Several model types are actively explored in the field, with recurrent neural networks (RNNs) and convolutional neural networks (CNNs) drawing the most focus. The attention model, a recently developed RNN variant, has started to play a larger role in both natural language processing and image analysis research.

This talk will cover the fundamentals of the attention model structure and how it's applied to visual and speech analysis. I will provide an overview of the model functionality and math, including a high-level differentiation between soft and hard types. The goal is to give you enough of an understanding of what the model is, how it works, and where to apply it.

Melanie Warrick

November 12, 2015

Transcript

  1. Attention Models Melanie Warrick @nyghtowl

  2. @nyghtowl Overview - Attention - Soft vs Hard - Hard Attention for Computer Vision - Learning Rule - Example Performance

  3. @nyghtowl Attention ~ Selective

  4. @nyghtowl Attention Mechanism input focus sequential | context weights

  5. @nyghtowl Attention Techniques: Spotlight - varying resolution; Zoom-lens - adds changing filter size

  6. @nyghtowl Where to look? Attention Decision

  7. @nyghtowl Model Types: Soft - reads all input & outputs a weighted average over all of it - trained with a standard loss derivative; Hard - samples the input & estimates the output from those samples - trained with a policy gradient & variance reduction (see the code sketch after the transcript)

  8. @nyghtowl Soft vs Hard Focus Examples Soft Hard

  9. @nyghtowl Soft Attention - Value: context aware; Challenge: scale limitations

  10. @nyghtowl Hard Attention - Value: data size & # of computations; Challenge: context & training time

  11. @nyghtowl Model Variations - Soft ("differentiable"): NTM (Neural Turing Machine), Memory Network, DRAW (Deep Recurrent Attentive Writer), Stack-Augmented Recurrent Nets; Hard: RAM (Recurrent Attention Model), DRAM (Deep Recurrent Attention Model), RL-NTM (Reinforcement Learning Neural Turing Machine)

  12. @nyghtowl Applications - Memory - Reading / Writing - Language generation - Picture generation - Classifying image objects - Image search - Describing images / videos

  13. @nyghtowl Hard Model & Computer Vision

  14. @nyghtowl Convolutional Neural Nets

  15. @nyghtowl Linear Complexity Growth

  16. @nyghtowl Constrained Computations

  17. @nyghtowl Recurrent Neural Nets

  18. @nyghtowl General Goal - min error | max reward - reward can be sparse & delayed

  19. @nyghtowl Deep Recurrent Attention Model

  20. @nyghtowl REINFORCE Learning Rule - weight change = reward change given glimpse (written out in the code sketches after the transcript)

  21. @nyghtowl Performance Comparison - SVHN (Street View House Numbers dataset)

  22. @nyghtowl Performance Comparison DRAM vs CNN - Computation Complexity

  23. @nyghtowl Last Points - adaptive selection & context - constrained computations - accuracy

  24. @nyghtowl References
    • Neural Turing Machines http://arxiv.org/pdf/1410.5401v2.pdf (Graves et al., 2014)
    • Reinforcement Learning NTM http://arxiv.org/pdf/1505.00521v1.pdf (Zaremba et al., 2015)
    • End-To-End Memory Networks http://arxiv.org/pdf/1503.08895v4.pdf (Sukhbaatar et al., 2015)
    • Recurrent Models of Visual Attention http://arxiv.org/pdf/1406.6247v1.pdf (Mnih et al., 2014)
    • Multiple Object Recognition with Visual Attention http://arxiv.org/pdf/1412.7755v2.pdf (Ba et al., 2014)
    • Show, Attend and Tell http://arxiv.org/pdf/1502.03044v2.pdf (Xu et al., 2015)
    • DRAW http://arxiv.org/pdf/1502.04623v2.pdf (Gregor et al., 2015)
    • Neural Machine Translation by Jointly Learning to Align and Translate http://arxiv.org/pdf/1409.0473v6.pdf (Bahdanau et al., 2014)
    • Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets http://arxiv.org/pdf/1503.01007v4.pdf (Joulin et al., 2015)
    • Deep Learning Theory & Applications https://www.youtube.com/watch?v=aUTHdgh1OjI
    • The Unreasonable Effectiveness of Recurrent Neural Networks https://karpathy.github.io/2015/05/21/rnn-effectiveness/
    • Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf (Williams, 1992)

  25. @nyghtowl References
    • Spatial Transformer Networks http://arxiv.org/pdf/1506.02025v1.pdf (Jaderberg et al., 2015)
    • Recurrent Spatial Transformer Networks http://arxiv.org/pdf/1509.05329v1.pdf (Sønderby et al., 2015)
    • Spatial Transformer Networks Video https://youtu.be/yGFVO2B8gok
    • Learning Stochastic Feedforward Neural Networks http://www.cs.toronto.edu/~tang/papers/sfnn.pdf (Tang & Salakhutdinov, 2013)
    • Learning Stochastic Recurrent Networks http://arxiv.org/pdf/1411.7610v3.pdf (Bayer & Osendorfer, 2015)
    • Learning Generative Models with Visual Attention http://www.cs.toronto.edu/~tang/papers/sfnn.pdf (Tang et al., 2014)

  26. @nyghtowl Special Thanks • Mark Ettinger • Rewon Child • Diogo Almeida • Stanislav Nikolov • Adam Gibson • Tarin Ziyaee • Charlie Tang • Dave Kammeyer

  27. @nyghtowl References: Images
    • http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
    • http://deeplearning.net/tutorial/lenet.html
    • https://stats.stackexchange.com/questions/114385/what-is-the-difference-between-convolutional-neural-networks-restricted-boltzma
    • http://myndset.com/2011/12/15/making-the-switch-where-to-find-the-money-for-your-digital-marketing-strategy/
    • http://blog.archerhotel.com/spyglass-rooftop-bar-nyc-making-manhattan-look-twice/
    • http://www.serps-invaders.com/blog/how-to-find-broken-links-on-your-site/
    • http://arxiv.org/pdf/1502.04623v2.pdf
    • https://en.wikipedia.org/wiki/Attention
    • http://web.media.mit.edu/~lieber/Teaching/Context/

  28. @nyghtowl Attention Models Melanie Warrick skymind.io (company) gitter.im/deeplearning4j/deeplearning4j

  29. @nyghtowl Artificial Neural Nets - Input → Hidden → Output - run until error stops improving = converge - Loss Function - output: y_k = Σ_{j=1..M} W_kj x_j (a minimal version appears in the code sketches below)

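Code Sketches

Slide 7's soft vs. hard split, as a minimal NumPy sketch. This is an illustration under assumptions, not code from the talk: the feature matrix, the relevance scores, and the names attention_weights, soft_attention, and hard_attention are all made up here. Soft attention reads every input position and returns a softmax-weighted average, so a standard loss derivative can flow through it; hard attention samples one position from the same distribution, which is why training needs a policy gradient plus variance reduction.

    import numpy as np

    def attention_weights(scores):
        # softmax over relevance scores -> a distribution over input positions
        e = np.exp(scores - scores.max())
        return e / e.sum()

    def soft_attention(features, scores):
        # soft: read all input, output the weighted average of every feature vector
        # (differentiable, so a standard loss derivative works)
        w = attention_weights(scores)      # shape (n,)
        return w @ features                # shape (d,)

    def hard_attention(features, scores, rng):
        # hard: sample a single position to attend to and return only that vector
        # (the sampling step is non-differentiable, hence policy gradients)
        w = attention_weights(scores)
        idx = rng.choice(len(w), p=w)
        return features[idx], idx

    rng = np.random.default_rng(0)
    features = rng.normal(size=(5, 3))                # 5 input positions, 3-dim features
    scores = np.array([0.1, 2.0, 0.3, -1.0, 0.5])     # relevance score per position
    print(soft_attention(features, scores))           # blend of all positions
    print(hard_attention(features, scores, rng))      # one sampled position
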
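The REINFORCE learning rule from slide 20 ("weight change = reward change given glimpse") is usually written as Δw = α (R - b) ∂ log π(glimpse | state; w) / ∂w: the reward-weighted gradient of the log-probability of the sampled glimpse, with a baseline b as the variance-reduction term (Williams, 1992). Below is a toy NumPy version for a linear glimpse policy over a few candidate locations; the policy form, sizes, and names are assumptions for illustration, not the DRAM implementation.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def reinforce_step(w, state, reward, baseline, rng, lr=0.1):
        # one REINFORCE update for a linear glimpse policy pi(glimpse | state; w)
        # w: (num_locations, d) weights, state: (d,) feature vector
        logits = w @ state                          # score for each candidate glimpse location
        probs = softmax(logits)
        glimpse = rng.choice(len(probs), p=probs)   # sample where to look
        # grad of log pi(glimpse) w.r.t. logits is (one_hot - probs);
        # chain rule through logits = w @ state gives an outer product with state
        dlogp = -probs
        dlogp[glimpse] += 1.0
        grad_w = np.outer(dlogp, state)
        # weight change = learning rate * (reward - baseline) * grad of log-probability
        return w + lr * (reward - baseline) * grad_w, glimpse

    rng = np.random.default_rng(1)
    w = rng.normal(size=(4, 3))      # 4 candidate glimpse locations, 3-dim state
    state = rng.normal(size=3)
    w, glimpse = reinforce_step(w, state, reward=1.0, baseline=0.5, rng=rng)
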
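Slide 29's forward pass and "run until the error stops improving" loop, in bare NumPy. The layer sizes, tanh hidden units, and squared-error loss are assumptions added for the example; the slide itself only shows the linear output y_k = Σ_j W_kj x_j.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(16, 4))            # 16 examples, 4 input features
    Y = rng.normal(size=(16, 2))            # 2 output targets per example
    W1 = 0.1 * rng.normal(size=(4, 8))      # input -> hidden weights
    W2 = 0.1 * rng.normal(size=(8, 2))      # hidden -> output weights
    lr, prev_loss = 0.01, np.inf

    for _ in range(10000):
        h = np.tanh(X @ W1)                 # hidden layer
        out = h @ W2                        # output y_k = sum_j W_kj h_j
        loss = ((out - Y) ** 2).mean()      # loss function (mean squared error)
        if prev_loss - loss < 1e-8:         # error stopped improving = converged
            break
        prev_loss = loss
        d_out = 2 * (out - Y) / Y.size      # gradient of loss w.r.t. output
        grad_W2 = h.T @ d_out
        grad_W1 = X.T @ ((d_out @ W2.T) * (1 - h ** 2))
        W2 -= lr * grad_W2
        W1 -= lr * grad_W1

    print(loss)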