Slide 1

Attention Models Melanie Warrick @nyghtowl

Slide 2

@nyghtowl Overview - Attention - Soft vs Hard - Hard Attention for Computer Vision - Learning Rule - Example Performance

Slide 3

@nyghtowl Attention ~ Selective

Slide 4

@nyghtowl Attention Mechanism - sequential input focus | context weights
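
A minimal sketch of the mechanism, to make "context weights" concrete (illustrative only; the dot-product scoring and the variable names are assumptions, not from the deck):

import numpy as np

def soft_attention(query, inputs):
    # score each input position against the current query/state
    scores = inputs @ query
    # softmax the scores into context weights that sum to 1
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # the focus/context is the weighted average of the inputs
    return weights @ inputs, weights

# toy usage: 5 input vectors of size 4
rng = np.random.default_rng(0)
context, weights = soft_attention(rng.normal(size=4), rng.normal(size=(5, 4)))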

Slide 5

@nyghtowl Attention Techniques - Spotlight: varying resolution - Zoom-lens: adds a changing filter size
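
A rough glimpse-sensor sketch in the spirit of these two ideas: "spotlight" windows around a location, widened at each scale while the sampled resolution drops, as in the zoom-lens description. Illustrative only; the patch sizes, naive subsampling, and function name are assumptions:

import numpy as np

def glimpse(image, center, base=8, scales=3):
    # extract square windows around `center`, doubling the window at each
    # scale and subsampling back to base x base, so wider windows are seen
    # at coarser resolution
    y, x = center
    patches = []
    for s in range(scales):
        half = (base * 2 ** s) // 2
        patch = image[max(0, y - half):y + half, max(0, x - half):x + half]
        step = max(1, patch.shape[0] // base)
        patches.append(patch[::step, ::step][:base, :base])
    return patches

# toy usage: three patches around the middle of a 64x64 image
patches = glimpse(np.random.default_rng(0).normal(size=(64, 64)), center=(32, 32))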

Slide 6

@nyghtowl Where to look? Attention Decision

Slide 7

@nyghtowl Model Types - Soft: reads all of the input & takes a weighted average for the expected output; trained with the standard loss derivative - Hard: samples the input & takes a weighted average for an estimated output; trained with a policy gradient & variance reduction
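
A side-by-side sketch of the two types (illustrative; the inputs, scores, and names are assumptions, not from the deck): soft attention blends every location with its weight, while hard attention samples one location from the same weights, which is why it needs the policy-gradient training noted above.

import numpy as np

def attend(inputs, scores, hard=False, rng=None):
    # softmax over attention scores -> one weight per input location
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    if hard:
        # hard: sample a single location; the choice is non-differentiable
        rng = rng or np.random.default_rng(0)
        return inputs[rng.choice(len(inputs), p=weights)]
    # soft: differentiable weighted average over all locations
    return weights @ inputs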

Slide 8

@nyghtowl Soft vs Hard Focus Examples (image panels labeled Soft and Hard)

Slide 9

@nyghtowl Soft Attention - Value: context aware - Challenge: scale limitations

Slide 10

@nyghtowl Hard Attention - Value: data size & # of computations - Challenge: context & training time

Slide 11

@nyghtowl Model Variations Soft - NTM Neural Turing Machine - Memory Network - DRAW Deep Recurrent Attentive Writer (“Differentiable”) - Stack-Augmented Recurrent Nets Hard - RAM Recurrent Attention Model - DRAM Deep Recurrent Attention Model - RL-NTM Reinforcement Learning Neural Turing Machine

Slide 12

@nyghtowl - Memory - Reading / Writing - Language generation - Picture generation - Classifying image objects - Image search - Describing images / videos Applications

Slide 13

@nyghtowl Hard Model & Computer Vision

Slide 14

@nyghtowl Convolutional Neural Nets

Slide 15

@nyghtowl Linear Complexity Growth

Slide 16

@nyghtowl Constrained Computations

Slide 17

@nyghtowl Recurrent Neural Nets

Slide 18

@nyghtowl General Goal - min error | max reward - reward can be sparse & delayed

Slide 19

@nyghtowl Deep Recurrent Attention Model

Slide 20

@nyghtowl REINFORCE Learning Rule - weight change follows the reward change given the glimpse
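
For reference, the update this slide paraphrases, in the form given by Williams (1992), listed in the references:

    \Delta w_{ij} = \alpha_{ij} (r - b_{ij}) e_{ij}, \qquad e_{ij} = \frac{\partial \ln g_i}{\partial w_{ij}}

where g_i is the probability of the sampled action (here, where to glimpse), r is the received reward, b_{ij} is a baseline used for variance reduction, and \alpha_{ij} is the learning rate.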

Slide 21

@nyghtowl Performance Comparison SVHN - Street View House Numbers dataset

Slide 22

@nyghtowl Performance Comparison DRAM vs CNN - Computation Complexity

Slide 23

@nyghtowl Last Points - adaptive selection & context - constrained computations - accuracy

Slide 24

@nyghtowl References
● Neural Turing Machines http://arxiv.org/pdf/1410.5401v2.pdf (Graves et al., 2014)
● Reinforcement Learning NTM http://arxiv.org/pdf/1505.00521v1.pdf (Zaremba et al., 2015)
● End-To-End Memory Networks http://arxiv.org/pdf/1503.08895v4.pdf (Sukhbaatar et al., 2015)
● Recurrent Models of Visual Attention http://arxiv.org/pdf/1406.6247v1.pdf (Mnih et al., 2014)
● Multiple Object Recognition with Visual Attention http://arxiv.org/pdf/1412.7755v2.pdf (Ba et al., 2014)
● Show, Attend and Tell http://arxiv.org/pdf/1502.03044v2.pdf (Xu et al., 2015)
● DRAW http://arxiv.org/pdf/1502.04623v2.pdf (Gregor et al., 2015)
● Neural Machine Translation by Jointly Learning to Align and Translate http://arxiv.org/pdf/1409.0473v6.pdf (Bahdanau et al., 2014)
● Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets http://arxiv.org/pdf/1503.01007v4.pdf (Joulin et al., 2015)
● Deep Learning Theory & Applications https://www.youtube.com/watch?v=aUTHdgh1OjI
● The Unreasonable Effectiveness of Recurrent Neural Networks https://karpathy.github.io/2015/05/21/rnn-effectiveness/
● Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf (Williams, 1992)

Slide 25

@nyghtowl References
● Spatial Transformer Networks http://arxiv.org/pdf/1506.02025v1.pdf (Jaderberg et al., 2015)
● Recurrent Spatial Transformer Networks http://arxiv.org/pdf/1509.05329v1.pdf (Sønderby et al., 2015)
● Spatial Transformer Networks Video https://youtu.be/yGFVO2B8gok
● Learning Stochastic Feedforward Neural Networks http://www.cs.toronto.edu/~tang/papers/sfnn.pdf (Tang & Salakhutdinov, 2013)
● Learning Stochastic Recurrent Networks http://arxiv.org/pdf/1411.7610v3.pdf (Bayer & Osendorfer, 2015)
● Learning Generative Models with Visual Attention http://www.cs.toronto.edu/~tang/papers/sfnn.pdf (Tang et al., 2014)

Slide 26

@nyghtowl Special Thanks ● Mark Ettinger ● Rewon Child ● Diogo Almeida ● Stanislav Nikolov ● Adam Gibson ● Tarin Ziyaee ● Charlie Tang ● Dave Kammeyer

Slide 27

@nyghtowl References: Images
● http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
● http://deeplearning.net/tutorial/lenet.html
● https://stats.stackexchange.com/questions/114385/what-is-the-difference-between-convolutional-neural-networks-restricted-boltzma
● http://myndset.com/2011/12/15/making-the-switch-where-to-find-the-money-for-your-digital-marketing-strategy/
● http://blog.archerhotel.com/spyglass-rooftop-bar-nyc-making-manhattan-look-twice/
● http://www.serps-invaders.com/blog/how-to-find-broken-links-on-your-site/
● http://arxiv.org/pdf/1502.04623v2.pdf
● https://en.wikipedia.org/wiki/Attention
● http://web.media.mit.edu/~lieber/Teaching/Context/

Slide 28

@nyghtowl Attention Models Melanie Warrick skymind.io (company) gitter.im/deeplearning4j/deeplearning4j

Slide 29

@nyghtowl Artificial Neural Nets - input, hidden & output layers - run until error stops improving = converge - loss function on the output (diagram labels: inputs x_j, weights W_kj, outputs y_k)
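
A single-layer sketch of the diagram's notation and the "run until error stops improving" loop (illustrative only; the tanh activation, squared-error loss, and thresholds are assumptions, not from the deck):

import numpy as np

def forward(W, x):
    # y_k = f(sum_j W_kj * x_j), with f = tanh
    return np.tanh(W @ x)

def train(W, x, target, lr=0.1, tol=1e-6, max_steps=10_000):
    # repeat gradient steps on the loss until it stops improving (converges)
    prev_loss = np.inf
    for _ in range(max_steps):
        y = forward(W, x)
        err = y - target
        loss = 0.5 * np.sum(err ** 2)
        if prev_loss - loss < tol:
            break
        prev_loss = loss
        W -= lr * np.outer(err * (1 - y ** 2), x)   # dloss/dW for a tanh output layer
    return W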