Decoupled Neural Interfaces using Synthetic Gradients

Deep Learning Reading Group presentation on the paper "Decoupled Neural Interfaces using Synthetic Gradients" by Max Jaderberg et al.

Tribhuvanesh Orekondy

September 05, 2016

Transcript

  1. Decoupled Neural Interfaces Using Synthetic Gradients
     Max Jaderberg et al., Google DeepMind
     Tribhuvanesh Orekondy, MPI-INF D2 Deep Learning Reading Group, 5-Sep-2016
  2. Decoupled Neural Interfaces (DNI)
     [Paper figure: module A sends its activation h_A forward to module B; a synthetic gradient model M_{A→B} immediately returns a predicted error gradient δ̂_{A→B} to A, so A can update before the true gradient from B arrives. Legend: forward connection (update locked), forward connection (not update locked), error gradient, synthetic error gradient.]
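In equations, my reading of the core idea (paper notation; the context input c shown in the figure, used by cDNI to condition on labels, is omitted here): A sends h_A forward, M_B immediately returns a predicted gradient that A updates with, and M_B itself is later regressed toward the target gradient that eventually becomes available.

```latex
% Sketch of the DNI update rule, paraphrased from the paper; \alpha is a learning rate.
\hat{\delta}_A = M_B(h_A)
    % synthetic gradient: a prediction of dL/dh_A
\theta_A \leftarrow \theta_A - \alpha \, \hat{\delta}_A \frac{\partial h_A}{\partial \theta_A}
    % A updates without waiting for B's backward pass
L_{M_B} = \left\| M_B(h_A) - \delta_A \right\|_2^2
    % M_B is trained toward the true (or bootstrapped) gradient \delta_A
```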
  3. Feed-forward Neural Nets
     [Paper figure: a feed-forward network drawn as a chain of layers f_i → f_{i+1} → f_{i+2} → …, the setting the DNI construction is applied to.]
  4. Feed-forward Neural Nets
     [Paper figure: the same chain with a synthetic gradient model M_{i+1} inserted after f_i, so f_i receives a predicted gradient δ̂_i instead of waiting for backpropagation from the layers above.]
  5. Feed-forward Neural Nets
     [Paper figure: the decoupled update schedule. M_{i+1} predicts a synthetic gradient δ̂_i from activation h_i, which updates f_i immediately; one step later f_{i+1} and M_{i+1} are updated, then f_{i+2} and M_{i+2}, and so on.]
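A minimal sketch of this schedule in PyTorch, under my own assumptions (toy layer sizes, linear synthetic gradient models, Adam, and a bootstrapped regression target for each M); it illustrates the update ordering on the slide rather than reproducing the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy fully connected net: f[i] is a layer, M[i] predicts the gradient
# w.r.t. the output of f[i] (sizes and optimizers are assumptions).
sizes = [784, 256, 256, 10]
f = nn.ModuleList(
    [nn.Sequential(nn.Linear(a, b), nn.ReLU())
     for a, b in zip(sizes[:-2], sizes[1:-1])]
    + [nn.Linear(sizes[-2], sizes[-1])]
)
M = nn.ModuleList(nn.Linear(s, s) for s in sizes[1:-1])
opt = {m: torch.optim.Adam(m.parameters(), lr=1e-4) for m in [*f, *M]}

def step(m):
    opt[m].step()
    opt[m].zero_grad()

def train_step(x, y):
    h = x
    for i in range(len(f) - 1):
        h_in = h.detach().requires_grad_(True)
        h_out = f[i](h_in)
        syn = M[i](h_out.detach()).detach()   # synthetic gradient for h_out
        h_out.backward(syn)                   # "Update f_i" without waiting for the layers above
        step(f[i])
        if i > 0:
            # h_in.grad (from the backward above) is the bootstrapped target
            # for M[i-1], whose prediction already updated f[i-1] last step.
            F.mse_loss(M[i - 1](h_in.detach()), h_in.grad.detach()).backward()
            step(M[i - 1])
        h = h_out
    # Last layer sees the true loss; its input gradient trains the top M.
    h_in = h.detach().requires_grad_(True)
    F.cross_entropy(f[-1](h_in), y).backward()
    step(f[-1])
    F.mse_loss(M[-1](h_in.detach()), h_in.grad.detach()).backward()
    step(M[-1])
```

As I read the paper, each M can be trained either toward the true gradient or toward the synthetic gradient from the layer above backpropagated one step; the sketch uses the bootstrapped version so that no layer ever waits for a full backward pass.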
  6. Feed-forward Neural Nets
     [Paper plots: DNI, cDNI, and backprop compared on networks with 3–6 layers.]
  7. Feed-forward Neural Nets
     [Paper plots: DNI and cDNI compared in the "Update Decoupled" and "Forwards and Update Decoupled" settings.]
  8. Recurrent Neural Nets
     [Paper figure: an RNN unrolled over time with per-step losses L_t, L_{t+1}, …; at each truncation boundary (every three steps here) a synthetic gradient δ̂ stands in for the gradient of all future losses, so the core f is updated every few steps instead of waiting for the full sequence.]
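A minimal sketch of the RNN variant, again under my own assumptions (GRU core, per-step classification loss, linear synthetic gradient model over the hidden state): at the end of each truncated window, M's prediction stands in for the gradient of all future losses, and M is regressed toward the bootstrapped gradient that flows back to the window boundary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

rnn = nn.GRUCell(10, 64)                      # assumed core and sizes
readout = nn.Linear(64, 10)
M = nn.Linear(64, 64)                         # synthetic gradient model for the hidden state
opt = torch.optim.Adam([*rnn.parameters(), *readout.parameters(), *M.parameters()], lr=1e-4)

def window(xs, ys, h0):
    """One truncated-BPTT window; h0 is a detached state with requires_grad=True."""
    h, loss = h0, 0.0
    for x, y in zip(xs, ys):                  # forward through the window, accumulating losses
        h = rnn(x, h)
        loss = loss + F.cross_entropy(readout(h), y)
    syn = M(h.detach()).detach()              # stands in for the gradient of all future losses
    opt.zero_grad()
    (loss + (syn * h).sum()).backward()       # window losses plus the injected future gradient
    # h0.grad is now the bootstrapped gradient at the window boundary; it is
    # the regression target for M evaluated at the previous window's final state.
    F.mse_loss(M(h0.detach()), h0.grad.detach()).backward()
    opt.step()
    return h.detach().requires_grad_(True)    # carry the state into the next window
```

A driver would keep h = torch.zeros(batch, 64, requires_grad=True) and call window on consecutive chunks of the sequence, so the core is updated every few steps while credit from beyond the truncation horizon arrives through M.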
  9. Recurrent Neural Nets
     Task            Input         Target
     Copy            a b c .       a b c .
     Repeat Copy     a b c 2 .     a b c a b c .
     Penn Treebank   a b c d e f   b c d e f
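For concreteness, hypothetical generators for the two toy tasks in the table (symbol set, termination marker, and encoding are my choices and may differ from the paper's setup; Penn Treebank is simply next-character prediction on the corpus):

```python
import random
import string

SYMBOLS = list(string.ascii_lowercase[:8])        # assumed symbol alphabet

def copy_example(n):
    """Input 'a b c .' -> target 'a b c .'"""
    seq = random.choices(SYMBOLS, k=n)
    return seq + ['.'], seq + ['.']

def repeat_copy_example(n, r):
    """Input 'a b c 2 .' -> target 'a b c a b c .'"""
    seq = random.choices(SYMBOLS, k=n)
    return seq + [str(r), '.'], seq * r + ['.']
```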
  10. Remarks
     • Experiments over DNI hidden layer size
     • Could deep models become polynomially/exponentially deeper?
     • Overhead of training models (in run-time)
     • Module = "Linear Transform + ReLU + BatchNorm"; why not other variations? (see the sketch below)
     • Theory: why does it work?
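A sketch of the layer module that remark refers to, with the operations in the order the slide lists them (I have not verified whether the paper places BatchNorm before or after the ReLU):

```python
import torch.nn as nn

def fc_module(n_in, n_out):
    # "Linear Transform + ReLU + BatchNorm", in the order the slide lists them
    return nn.Sequential(
        nn.Linear(n_in, n_out),
        nn.ReLU(),
        nn.BatchNorm1d(n_out),
    )
```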