Slide 1

Long Short-Term Memory
ABECON, 11.09.2015
Leszek Rybicki

Slide 2

Outline
1. Memory Game
2. Recurrent Neural Networks
3. Memory Game
4. Long Short-Term Memory Networks
5. Memory Game
6. ...one more thing

Slide 3

Memory Game
Imagine that you are an intern.
DAY 1: +5 new tasks
DAY 2: -3 tasks done, +2 new
DAY 3: +3 new tasks, -1 done
DAY 4: did -3 tasks, +4 new tasks
DAY 5: +1 new task, did -5!
What is the name of the intern?

Slide 4

Feedforward Network: XOR
0 xor 0 = 0
1 xor 0 = 1
0 xor 1 = 1
1 xor 1 = 0
(diagram: a two-input, two-hidden, one-output network with weights 0.6, 0.4, 0.6, 0.4, 0.5, -0.5 and biases -1, -1)

Slide 5

Feedforward Network in a box
hidden = σ(W1 · x + [-1, -1])
output = σ(W2 · hidden)
(diagram: the same network boxed as x → W1 → hidden → W2 → output, with the weights from the previous slide)
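To make the boxed view concrete, here is a minimal NumPy sketch of such a network. The step activation and the weight values are illustrative assumptions chosen so that the forward pass actually computes XOR; they are not the exact numbers from the slide.

```python
import numpy as np

def step(z):
    # Hard threshold activation: 1 where z > 0, else 0.
    return (z > 0).astype(float)

# Illustrative weights (not the slide's values): hidden unit 1 computes OR,
# hidden unit 2 computes AND, and the output computes "OR and not AND" = XOR.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])   # OR fires above 0.5, AND above 1.5
W2 = np.array([1.0, -1.0])    # output: OR minus AND
b2 = -0.5

def forward(x):
    hidden = step(W1 @ x + b1)     # hidden = f(W1 · x + bias)
    return step(W2 @ hidden + b2)  # output = f(W2 · hidden + bias)

for x in ([0, 0], [1, 0], [0, 1], [1, 1]):
    print(x, "->", forward(np.array(x, dtype=float)))  # 0, 1, 1, 0
```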

Slide 6

Recurrent Network: XOR
[0, 0] → [?, 0]
[1, 0] → [?, 1]
[0, 1] → [?, 1]
[1, 1] → [?, 0]
(diagram: the boxed network with its hidden state from step t-1 fed back in as an extra input)

Slide 7

Recurrent Network: unfolded
(diagram: the t-1 feedback loop unrolled through time; inputs x_t, x_{t+1}, x_{t+2}, x_{t+3} feed a chain of hidden states h_t, h_{t+1}, h_{t+2}, h_{t+3})
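A minimal sketch of one unrolled step, assuming the usual tanh recurrence; the matrix names and sizes below are illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 2)) * 0.1  # input -> hidden (sizes are arbitrary)
W_hh = rng.normal(size=(3, 3)) * 0.1  # hidden -> hidden: the t-1 feedback loop
b_h = np.zeros(3)

def rnn_step(x_t, h_prev):
    # h_t = tanh(W_xh · x_t + W_hh · h_{t-1} + b)
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Unfolding just means running the same step over the whole sequence,
# threading the hidden state from each step into the next.
h = np.zeros(3)
for x_t in [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    h = rnn_step(x_t, h)
```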

Slide 8

Trouble with recurrent networks:
- short attention span
- not good with distractions
- unstable when training
- training takes a long time
- short attention span
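One way to see the training instability (a sketch under assumptions, not from the talk): backpropagation through time multiplies one Jacobian per step, so over long sequences the gradient shrinks or grows exponentially.

```python
import numpy as np

# Ignoring the tanh' factors, the gradient through T recurrent steps is a
# product of T copies of the recurrent weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3)) * 0.3  # spectral radius well below 1
grad = np.eye(3)
for _ in range(50):
    grad = grad @ W
print(np.abs(grad).max())  # a vanishingly small number: the learning signal is gone
```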

Slide 9

No content

Slide 10

Jürgen Schmidhuber: "YOU AGAIN"

"To avoid long time lag problems of gradient-based approaches we may simply randomly initialize all network weights until the resulting net happens to classify all training sequences correctly. In fact, recently we discovered that simple weight guessing solves many of the problems faster than the algorithms proposed therein. This does not mean that weight guessing is a good algorithm. It just means that the problems are very simple."
-- Jürgen Schmidhuber, LONG SHORT-TERM MEMORY, Neural Computation 9(8):1735-1780, 1997

Slide 11

LSTM Unit
(diagram: an LSTM cell unrolled between steps t and t+1; inputs x_t, h_{t-1}, C_{t-1}, outputs h_t, C_t, built from tanh layers and pointwise ✕ and + operations)

Slide 12

Symbols
- tanh: neural network layer
- ✕, +: pointwise operations: [a,b,c] ✕ [x,y,z] = [ax, by, cz]; [a,b,c] + [x,y,z] = [a+x, b+y, c+z]
- concatenate two vectors
- clone vector
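The two pointwise operations from the legend, checked on concrete vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])  # [a, b, c]
x = np.array([4.0, 5.0, 6.0])  # [x, y, z]
print(a * x)                   # pointwise ✕: [ 4. 10. 18.]
print(a + x)                   # pointwise +: [5. 7. 9.]
print(np.concatenate([a, x]))  # concatenate two vectors
```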

Slide 13

Cell state
(diagram: the cell state path from C_{t-1} to C_t running across the top of the unit, touched only by one pointwise ✕ and one pointwise +)
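In the standard notation (as in the Understanding LSTM Networks post linked at the end), the cell state changes only through those two pointwise operations; the gate activations f_t and i_t and the candidate C̃_t are introduced on the next slides:

```latex
% Cell state update: scale down the old state, then add new candidate content.
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```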

Slide 14

Forget gate
(diagram: a gate layer reads h_{t-1} and x_t; its output multiplies the old cell state pointwise: [a,b,c] ✕ [x,y,z] = [ax, by, cz])
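In the standard notation, the forget gate is a sigmoid layer over the concatenation [h_{t-1}, x_t]; its output in (0, 1) scales each coordinate of the old cell state:

```latex
% Forget gate: decide, per coordinate, how much of C_{t-1} to keep.
f_t = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right)
```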

Slide 15

Update gate
(diagram: a gate layer and a tanh layer read h_{t-1} and x_t; their pointwise product is added to the cell state: [a,b,c] + [x,y,z] = [a+x, b+y, c+z])
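In the standard notation, the update gate pairs a sigmoid layer (which entries to write) with a tanh layer (what values to write), completing the cell-state update shown two slides back:

```latex
% Update gate and candidate values.
i_t = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right)
\qquad
\tilde{C}_t = \tanh\left( W_C \cdot [h_{t-1}, x_t] + b_C \right)
```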

Slide 16

Select gate
(diagram: a gate layer reads h_{t-1} and x_t; the new cell state C_t passes through tanh and is multiplied pointwise by the gate to give h_t)
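The select gate (more commonly called the output gate) filters a tanh-squashed copy of the new cell state to produce the hidden state:

```latex
% Select (output) gate.
o_t = \sigma\left( W_o \cdot [h_{t-1}, x_t] + b_o \right)
\qquad
h_t = o_t \odot \tanh(C_t)
```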

Slide 17

LSTM unit
(diagram: the complete unit again: forget, update, and select gates arranged around the cell state, mapping x_t, h_{t-1}, C_{t-1} to h_t and C_t)
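Putting the three gates together, a minimal NumPy sketch of one LSTM step following the standard equations above; the sizes, initialization, and names are illustrative, not code from the talk:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hid = 2, 4  # arbitrary sizes for illustration
rng = np.random.default_rng(0)

def init(rows, cols):
    return rng.normal(size=(rows, cols)) * 0.1

# One weight matrix and bias per gate; each reads the concatenation [h_{t-1}, x_t].
W_f, b_f = init(n_hid, n_hid + n_in), np.ones(n_hid)   # forget gate (bias 1 is a common init)
W_i, b_i = init(n_hid, n_hid + n_in), np.zeros(n_hid)  # update gate
W_C, b_C = init(n_hid, n_hid + n_in), np.zeros(n_hid)  # candidate values
W_o, b_o = init(n_hid, n_hid + n_in), np.zeros(n_hid)  # select (output) gate

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    f = sigmoid(W_f @ z + b_f)          # forget gate
    i = sigmoid(W_i @ z + b_i)          # update gate
    C_tilde = np.tanh(W_C @ z + b_C)    # candidate cell content
    C = f * C_prev + i * C_tilde        # new cell state
    o = sigmoid(W_o @ z + b_o)          # select gate
    h = o * np.tanh(C)                  # new hidden state
    return h, C

h, C = np.zeros(n_hid), np.zeros(n_hid)
for x_t in [np.array([0.0, 1.0]), np.array([1.0, 1.0])]:
    h, C = lstm_step(x_t, h, C)
```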

Slide 18

Memory Game
READ ROMAJI ONLY: J た Ü ロ 駅 R ऌ G E N 止

Slide 19

End-to-end people detection in crowded scenes
Russell Stewart, Mykhaylo Andriluka
● Mechanical Turk
● GoogLeNet
● LSTM
● Hungarian loss (bipartite matching between predictions and ground truth)

Slide 20

Read more
- Understanding LSTM Networks: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- Prof. Schmidhuber: http://people.idsia.ch/~juergen/
- LSTM in the browser with Synaptic.js: http://synaptic.juancazala.com/#/dsr