Decoupled Neural Interfaces using Synthetic Gradients

Deep Learning Reading Group presentation on the paper "Decoupled Neural Interfaces using Synthetic Gradients" by Max Jaderberg et al.

Tribhuvanesh Orekondy

September 05, 2016

Transcript

  1. Decoupled Neural Interfaces Using Synthetic Gradients Max Jaderberg et al.

    Google DeepMind. Tribhuvanesh Orekondy, MPI-INF D2 Deep Learning Reading Group, 5-Sep-2016. [Title-slide figure from the paper: modules fA and fB decoupled by a synthetic gradient model MB; omitted.]
  2. Problem

  3. Problem

  4. Approach: produce approximate "synthetic" gradients

  5. Approach
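
The core idea fits in a few lines. Below is a minimal sketch (PyTorch; names are illustrative, not the paper's code) of a synthetic gradient model M: a small trainable map from a layer's activation h to an estimate of dL/dh. The simplest variant in the paper is a single linear layer, and zero-initialising it means the synthetic gradients start at zero, so training begins as if no gradient were injected.

```python
import torch.nn as nn

# Sketch of a synthetic gradient model M (illustrative naming): maps an
# activation h to an estimate of dL/dh. Zero-initialised so the first
# synthetic gradients are zero.
class SyntheticGradient(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.predict = nn.Linear(dim, dim)
        nn.init.zeros_(self.predict.weight)
        nn.init.zeros_(self.predict.bias)

    def forward(self, h):
        return self.predict(h)  # estimated gradient of the loss w.r.t. h
```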

  6. Decoupled Neural Interfaces (DNI)

    [Figure from the paper: modules A and B decoupled by a synthetic gradient model MB at the interface activation hA. Legend: forward connection (update locked); forward connection (not update locked); error gradient; synthetic error gradient.]
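
To make the legend concrete, here is a hedged sketch of one decoupled interface A → B (fA, MB, the optimisers, and the eventual true gradient grad_true are all assumed names). A updates immediately from MB's prediction instead of waiting for B's backward pass; MB itself is regressed onto the true gradient once it becomes available.

```python
# One training step at a decoupled interface A -> B (a sketch under the
# assumptions named above, not the paper's implementation).
hA = fA(x)                       # forward through module A
g_hat = MB(hA)                   # synthetic error gradient for hA
optA.zero_grad()
hA.backward(g_hat.detach())      # update A without waiting on B
optA.step()

# ...once B has run and a true gradient dL/dhA exists, fit MB to it:
loss_M = ((MB(hA.detach()) - grad_true.detach()) ** 2).mean()
optM.zero_grad(); loss_M.backward(); optM.step()
```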
  7. 1. Feed-forward Networks 2. Recurrent Neural Networks

  8. Feed-forward Neural Nets

    [Figure from the paper: a feed-forward stack f1 … fN with a synthetic gradient interface inserted between layers fi and fi+1; omitted.]
  9. Feed-forward Neural Nets

    [Figure from the paper: each layer fi paired with a synthetic gradient model Mi+1 that predicts the error gradient at its output; omitted.]
  10. Feed-forward Neural Nets

    [Figure from the paper: the resulting update schedule. Each layer fi is updated immediately from Mi+1's synthetic gradient: update fi; then update fi+1 & Mi+1; then update fi+2 & Mi+2.]
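
Putting that schedule into a loop, assuming `layers`, per-layer optimisers `opts`, synthetic gradient models `sg[i]` with optimisers `sg_opts`, and a loss `criterion` are all defined (every name is illustrative): each layer steps as soon as its activation exists, and each sg model is regressed onto the gradient arriving from the layer above, which is itself synthetic, so the targets bootstrap much like TD targets.

```python
# Per-layer decoupled updates for a feed-forward net (a sketch under
# the assumptions named above, not the paper's implementation).
h = x
for i, layer in enumerate(layers):
    h_in = h.detach().requires_grad_()      # decouple from layers below
    h = layer(h_in)
    if i + 1 < len(layers):
        g_hat = sg[i + 1](h)                # synthetic gradient for h
        opts[i].zero_grad()
        h.backward(g_hat.detach())          # "update f_i" immediately
        opts[i].step()
    else:
        loss = criterion(h, y)              # last layer sees the true loss
        opts[i].zero_grad(); loss.backward(); opts[i].step()
    if i > 0:
        # The gradient that reached h_in is the regression target for
        # the interface model sg[i] sitting below this layer.
        sg_loss = ((sg[i](h_in.detach()) - h_in.grad.detach()) ** 2).mean()
        sg_opts[i].zero_grad(); sg_loss.backward(); sg_opts[i].step()
```

In the cDNI variant on the later slides, each sg[i] would additionally take the label y as input.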
  11. Feed-forward Neural Nets

  12. Feed-forward Neural Nets

  13. Feed-forward Neural Nets

    [Plots from the paper: learning curves for DNI, cDNI, and backprop on networks of 3 to 6 layers; omitted.]
  14. Feed-forward Neural Nets

  15. Bonus - “Complete Unlock”

    [Figure from the paper: layers f1 … f4 with loss L; each interface carries a synthetic input model Ii and a synthetic gradient model Mi (I2, M2, I3, M3, I4, M4), removing forward locking as well.]
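
A sketch of what "complete unlock" adds, under the same illustrative naming: each interface also gets a synthetic input model I_i, trained to predict the layer's incoming activation, so f_i is forward-unlocked as well and can run before the real activation arrives.

```python
# Completely unlocked step for layer f_i (a sketch; I_i, M_next, and the
# training of I_i against the real activation are illustrative
# assumptions).
x_hat = I_i(x0)     # synthetic input: predicted from the raw data x0
h = f_i(x_hat)      # forward-unlocked: no need to wait for the layer below
g_hat = M_next(h)   # update-unlocked: synthetic gradient as before
```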
  16. Feed-forward Neural Nets

    [Plots from the paper comparing "update decoupled" and "forwards and update decoupled" training for DNI and cDNI; omitted.]
  17. 1. Feed-forward Networks 2. Recurrent Neural Networks

  18. Recurrent Neural Nets

  19. Recurrent Neural Nets

    [Figure from the paper: truncated BPTT unrolled over losses Lt … Lt+6; at each truncation boundary (e.g. t+3) a synthetic gradient δ̂t+3 replaces the gradient from the future when updating f.]
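
For RNNs the same trick extends truncated BPTT: at the end of each truncation window, the synthetic gradient of the final hidden state stands in for the gradient from the unseen future. A hedged sketch, assuming `rnn` (returning new state and output), `sg`, `opt`, and `criterion` are defined:

```python
# One truncation window of length T (a sketch, not the paper's code).
def train_window(h0, xs, ys):
    s0 = h0.detach().requires_grad_()       # window's initial state
    state, total_loss = s0, 0.0
    for x, y in zip(xs, ys):                # T real BPTT steps
        state, out = rnn(x, state)
        total_loss = total_loss + criterion(out, y)
    # Surrogate term whose gradient w.r.t. the final state equals
    # sg(state): injects the bootstrapped gradient from the future.
    surrogate = (state * sg(state).detach()).sum()
    opt.zero_grad()
    (total_loss + surrogate).backward()
    opt.step()
    # s0.grad is the regression target for the *previous* window's sg
    # prediction (a bootstrapped, TD-style target); fitting not shown.
    return state.detach(), s0.grad
```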
  20. Recurrent Neural Nets

    Task           Input        Target
    Copy           a b c .      a b c .
    Repeat Copy    a b c 2 .    a b c a b c .
    Penn Treebank  a b c d e f  b c d e f
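
For reference, minimal generators for the two toy tasks in this table (the space-separated, '.'-terminated format follows the slide; the exact tokenisation is illustrative):

```python
import random, string

# Toy task generators matching the table above ('.' ends the input).
def copy_task(n):
    s = ' '.join(random.choices(string.ascii_lowercase, k=n))
    return s + ' .', s + ' .'                      # input, target

def repeat_copy_task(n, r):
    s = ' '.join(random.choices(string.ascii_lowercase, k=n))
    return f'{s} {r} .', ' '.join([s] * r) + ' .'  # target repeats s r times
```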
  21. Recurrent Neural Nets

    [Plots from the paper: results on the Copy and Repeat Copy tasks; omitted.]

  22. Recurrent Neural Nets

  23. Bonus - “Multi-Network System”

  24. Remarks

    • Experiments over DNI hidden layer size
    • Could deep models become polynomially/exponentially deeper?
    • Overhead of training the models (in run-time)
    • Module = “Linear Transform + ReLU + BatchNorm”: why not other variations? (see the sketch after this list)
    • Theory: why does it work?
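
The layer module the remarks refer to, as a sketch (the linear → ReLU → BatchNorm ordering follows the slide's wording and should be treated as an assumption):

```python
import torch.nn as nn

# Per-layer module as stated on the slide:
# "Linear Transform + ReLU + BatchNorm".
def make_layer(d_in, d_out):
    return nn.Sequential(
        nn.Linear(d_in, d_out),
        nn.ReLU(),
        nn.BatchNorm1d(d_out),
    )
```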
  25. References

    • Paper: https://arxiv.org/abs/1608.05343
    • DeepMind article: https://deepmind.com/blog#decoupled-neural-interfaces-using-synthetic-gradients
    • Figures are re-used from the article and paper