
Long Term Short Term Memory

Leszek Rybicki
September 11, 2015


Transcript

  1. Outline
     1. Memory Game
     2. Recurrent Neural Networks
     3. Memory Game
     4. Long-Term Short-Term Memory Networks
     5. Memory Game
     6. ...one more thing
  2. Memory Game
     Imagine that you are an intern.
     DAY 1: +5 new tasks
     DAY 2: -3 tasks done, +2 new
     DAY 3: +3 new tasks, -1 done
     DAY 4: did -3 tasks, +4 new tasks
     DAY 5: +1 new task, did -5!
     What is the name of the intern?
  3. Feedforward Network: XOR
     0 xor 0 = 0
     1 xor 0 = 1
     0 xor 1 = 1
     1 xor 1 = 0
     (diagram: network with weights 0.6, 0.4, 0.6, 0.4, 0.5, -0.5 and biases -1, -1)
  4. Feedforward Network in a box
     hidden = σ(W1 · x + [-1, -1])
     output = σ(W2 · hidden)
     (diagram: the same network as a box, weights 0.6, 0.4, 0.6, 0.4, 0.5, -0.5)
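The two-layer formulas on this slide can be sketched in numpy. The weights below are illustrative assumptions, not the slide's values: the first hidden unit acts as OR, the second as AND, and the output fires for "OR but not AND", which is XOR.

```python
import numpy as np

def step(v):
    # hard threshold activation (stand-in for the slide's activation)
    return (v > 0).astype(float)

W1 = np.array([[1.0, 1.0],    # hidden unit 1: OR
               [1.0, 1.0]])   # hidden unit 2: AND
b1 = np.array([-0.5, -1.5])   # thresholds for OR and AND
W2 = np.array([1.0, -1.0])    # output: OR and not AND
b2 = -0.5

def xor_net(x):
    hidden = step(W1 @ x + b1)     # hidden = f(W1 · x + b1)
    return step(W2 @ hidden + b2)  # output = f(W2 · hidden + b2)

for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(a, "xor", b, "=", int(xor_net(np.array([a, b], dtype=float))))
```

The same structure works with sigmoid activations and trained weights; the hard threshold just makes the logic easy to check by hand.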
  5. Recurrent Network: XOR
     [0, 0] = [?, 0]
     [1, 0] = [?, 1]
     [0, 1] = [?, 1]
     [1, 1] = [?, 0]
     (diagram: the hidden layer feeds back into itself with a delay of t-1)
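The slide's task, stated in code: the output at time t is x_t XOR x_{t-1}, so the network must carry the previous input in its state. This sketch uses explicit state rather than learned weights, just to make the required memory visible.

```python
def rnn_xor(sequence):
    """Output at each step: previous input XOR current input."""
    state = None          # h_{t-1}: nothing seen before the first input
    outputs = []
    for x in sequence:
        if state is None:
            outputs.append(None)   # the "?" on the slide: undefined at t=0
        else:
            outputs.append(state ^ x)
        state = x          # the state simply remembers the last input
    return outputs

print(rnn_xor([1, 0, 1, 1]))
```

A trained recurrent network has to discover this "remember the last input" behaviour in its hidden units, which is exactly where the problems on the next slide come from.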
  6. Trouble with recurrent networks
     - short attention span
     - not good with distractions
     - unstable when training
     - training takes a long time
     - short attention span
  7. Jürgen Schmidhuber: "YOU AGAIN"
     "To avoid long time lag problems of gradient-based approaches we may simply randomly initialize all network weights until the resulting net happens to classify all training sequences correctly. In fact, recently we discovered that simple weight guessing solves many of the problems faster than the algorithms proposed therein. This does not mean that weight guessing is a good algorithm. It just means that the problems are very simple."
     -- Jürgen Schmidhuber, "Long Short-Term Memory", Neural Computation 9(8):1735-1780, 1997
  8. LSTM Unit
     (diagram: one LSTM cell, taking x_t, h_{t-1}, and C_{t-1} and producing h_t and C_t, built from sigmoid/tanh layers and pointwise ✕ and + operations)
  9. Symbols
     - neural network layer (sigmoid or tanh)
     - pointwise operation: ✕ combines [a,b,c] and [x,y,z] into [ax, by, cz]; + combines them into [a+x, b+y, c+z]
     - concatenate two vectors
     - clone (copy) a vector
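The legend's vector operations, written out in numpy (a small illustrative sketch; the variable names are not from the slides):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])   # [a, b, c]
x = np.array([4.0, 5.0, 6.0])   # [x, y, z]

pointwise_mul = a * x                # ✕ : [ax, by, cz]
pointwise_add = a + x                # + : [a+x, b+y, c+z]
concatenated = np.concatenate([a, x])  # [h_{t-1}, x_t]-style concatenation
cloned = a.copy()                    # clone: same values, separate vector
```

These four operations are all the plumbing the gate diagrams on the following slides need.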
  10. Cell state
      (diagram: the cell state C_{t-1} runs along the top of the unit, touched only by a pointwise ✕ and a pointwise + before leaving as C_t)
  11. Forget gate
      (diagram: a sigmoid layer over [h_{t-1}, x_t] multiplies the cell state pointwise, [a,b,c] ✕ [x,y,z] = [ax, by, cz], scaling each entry of C_{t-1} toward zero or keeping it)
  12. Update gate
      (diagram: a sigmoid gate selects which entries of the tanh candidate vector are added to the cell state pointwise, [a,b,c] + [x,y,z] = [a+x, b+y, c+z])
  13. Select gate
      (diagram: a sigmoid gate over [h_{t-1}, x_t] selects which parts of tanh(C_t) become the output h_t)
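The three gates from slides 11-13 compose into one cell update. Below is a minimal numpy sketch of a single LSTM step using the standard equations; the weight shapes and random initialisation are illustrative assumptions, not values from the talk.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, h_prev, C_prev, W, b):
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate (slide 11)
    i = sigmoid(W["i"] @ z + b["i"])         # update gate (slide 12)
    C_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate cell state
    C_t = f * C_prev + i * C_tilde           # new cell state (slide 10)
    o = sigmoid(W["o"] @ z + b["o"])         # select gate (slide 13)
    h_t = o * np.tanh(C_t)                   # new hidden state
    return h_t, C_t

# tiny usage example with random weights
rng = np.random.default_rng(0)
n_x, n_h = 3, 4
W = {k: rng.normal(size=(n_h, n_h + n_x)) for k in "fiCo"}
b = {k: np.zeros(n_h) for k in "fiCo"}
h, C = np.zeros(n_h), np.zeros(n_h)
h, C = lstm_step(rng.normal(size=n_x), h, C, W, b)
```

Note that the cell state C_t is only ever scaled (by the forget gate) and incremented (by the gated candidate), which is what lets gradients flow across long time lags.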
  14. End-to-end people detection in crowded scenes
      Russell Stewart, Mykhaylo Andriluka
      • Mechanical Turk
      • GoogLeNet
      • LSTM
      • Hungarian loss algorithm