Google DeepMind
Tribhuvanesh Orekondy, MPI-INF
D2 Deep Learning Reading Group, 5-Sep-2016

[Figure: (a) two modules/networks A and B communicating through a synthetic gradient model M_{A→B}, which is trained on the activation h_A; (b)-(c) a network decomposed into modules f_i, f_{i+1}, f_{i+2}, each paired with a synthetic gradient model M_{i+1}, M_{i+2} that predicts the error gradient for h_i from the module's output. Legend: forward connection (update locked), forward connection (not update locked), error gradient, synthetic error gradient. Decoupled update sequence: update f_i; update f_{i+1} & M_{i+1}; update f_{i+2} & M_{i+2}.]
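The decoupled update sequence above (each module f_i updates immediately using a synthetic gradient, while the synthetic gradient model M is trained to regress the true gradient once it becomes available) can be sketched with two toy linear modules. This is a minimal illustrative sketch under stated assumptions, not the paper's implementation; the names (`W1`, `W2`, `M2`, `lr`) and the identity-map toy task are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d, lr = 4, 0.01
W1 = rng.normal(size=(d, d)) * 0.1   # module f_i (toy linear module, assumption)
W2 = rng.normal(size=(d, d)) * 0.1   # module f_{i+1}
M2 = np.zeros((d, d))                # synthetic gradient model: h1 -> estimate of dL/dh1

for step in range(200):
    x = rng.normal(size=(1, d))
    y = x                            # toy target: identity map (assumption)

    # Forward through f_i; update it immediately with the synthetic gradient
    # from M2, without waiting for the backward pass (not update locked).
    h1 = x @ W1
    g_syn = h1 @ M2                  # synthetic estimate of dL/dh1
    W1 -= lr * x.T @ g_syn

    # Forward through f_{i+1}; compute the true loss gradient (squared error).
    h2 = h1 @ W2
    g_h2 = 2 * (h2 - y)              # dL/dh2
    g_true = g_h2 @ W2.T             # true dL/dh1 (computed before updating W2)
    W2 -= lr * h1.T @ g_h2

    # Train M2 by regressing its prediction onto the true gradient.
    M2 -= lr * h1.T @ (g_syn - g_true)

final_err = float(np.mean((x @ W1 @ W2 - y) ** 2))
print(f"final toy error: {final_err:.3f}")
```

The key design point this illustrates is the removal of update locking: `W1` is modified before any true gradient has flowed back to it, and `M2` is the only component that ever sees the true gradient for `h1`.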
Discussion / open questions:
• What happens as models become polynomially/exponentially deeper?
• Run-time overhead of training models with synthetic gradient modules
• Module = "Linear Transform + ReLU + BatchNorm": why not other variations?
• Theory: why does it work?