
Backpropagation Cheat Sheet

jojonki
December 28, 2019


Transcript

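  1. z = x + y (+ node: inputs x, y; output z)

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial x} = \frac{\partial L}{\partial z}\cdot 1, \qquad \frac{\partial L}{\partial y} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial y} = \frac{\partial L}{\partial z}\cdot 1$$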
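  2. z = x × y (× node: inputs x, y; output z)

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial x} = \frac{\partial L}{\partial z}\, y, \qquad \frac{\partial L}{\partial y} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial y} = \frac{\partial L}{\partial z}\, x$$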
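The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not code from the deck: the function names `add_backward` and `mul_backward` and the numeric values are assumptions chosen for the example.

```python
def add_backward(dz):
    """z = x + y: the upstream gradient dL/dz reaches both inputs times 1."""
    return dz * 1.0, dz * 1.0


def mul_backward(x, y, dz):
    """z = x * y: each input receives dL/dz times the *other* input."""
    return dz * y, dz * x


# Hypothetical values, chosen only for illustration.
x, y, dz = 3.0, 4.0, 2.0
add_dx, add_dy = add_backward(dz)
mul_dx, mul_dy = mul_backward(x, y, dz)
print(add_dx, add_dy)  # 2.0 2.0
print(mul_dx, mul_dy)  # 8.0 6.0
```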
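  3. y = 1/x (/ node: input x; output y)

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial x} = -\frac{\partial L}{\partial y}\,\frac{1}{x^2}$$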
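  4. y = log(x) (log node: input x; output y)

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial x} = \frac{\partial L}{\partial y}\,\frac{1}{x}$$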
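  5. y = exp(x) (exp node: input x; output y)

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial x} = \frac{\partial L}{\partial y}\,\exp(x)$$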
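One way to sanity-check the reciprocal, log, and exp rules is to compare each against a central finite difference. The sketch below assumes an upstream gradient of 1; the helper `numerical_grad` and the test point `x = 2.0` are illustrative choices, not part of the deck.

```python
import math


def reciprocal_backward(x, dy):
    """y = 1/x: dL/dx = -dL/dy * 1/x^2."""
    return -dy / (x * x)


def log_backward(x, dy):
    """y = log(x): dL/dx = dL/dy * 1/x."""
    return dy / x


def exp_backward(x, dy):
    """y = exp(x): dL/dx = dL/dy * exp(x)."""
    return dy * math.exp(x)


def numerical_grad(f, x, eps=1e-6):
    """Central finite difference (hypothetical helper for checking only)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


x = 2.0  # arbitrary positive test point
checks = {
    "reciprocal": (reciprocal_backward(x, 1.0), numerical_grad(lambda v: 1.0 / v, x)),
    "log": (log_backward(x, 1.0), numerical_grad(math.log, x)),
    "exp": (exp_backward(x, 1.0), numerical_grad(math.exp, x)),
}
for name, (analytic, numeric) in checks.items():
    print(name, analytic, numeric)
```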
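  6. Sigmoid: z = 1 / (1 + exp(−x)) (Sigmoid node: input x; output z)

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial x} = \frac{\partial L}{\partial z}\, z(1 - z)$$
    Derivative of the sigmoid:
    $$\left\{\frac{1}{1 + \exp(-x)}\right\}' = \left\{(1 + \exp(-x))^{-1}\right\}' = -\frac{-\exp(-x)}{(1 + \exp(-x))^2} = z^2\,\frac{1 - z}{z} = z(1 - z)$$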
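A short sketch of the sigmoid rule follows. It reuses the forward output z in the backward pass, which is what the z(1 − z) form makes possible. The function names and the test point are assumptions for illustration.

```python
import math


def sigmoid(x):
    """Forward pass: z = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_backward(z, dz):
    """Backward pass using the cached output z: dL/dx = dL/dz * z * (1 - z)."""
    return dz * z * (1.0 - z)


x = 0.5  # arbitrary test point
z = sigmoid(x)
analytic = sigmoid_backward(z, 1.0)

# Central finite difference as an independent check.
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(analytic, numeric)
```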
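  7. Repeat / Sum

    Repeat node: the forward pass copies $x$ (length $D$) $N$ times to give $z$ (shape $N \times D$, rows $z_1, \dots, z_N$). The backward pass sums the $N$ upstream gradients:
    $$\frac{\partial L}{\partial x} = \sum_{i=1}^{N} \frac{\partial L}{\partial z_i}$$
    Sum node: the forward pass sums $x$ (shape $N \times D$) over its $N$ rows to give $z$ (length $D$). The backward pass copies $\frac{\partial L}{\partial z}$ to each of the $N$ rows.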
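Repeat and Sum are mirror images: the backward pass of one is the forward pass of the other. A minimal NumPy sketch, assuming the N axis is axis 0 (the function names and sizes are illustrative):

```python
import numpy as np


def repeat_forward(x, n):
    """Copy x of shape (D,) n times -> z of shape (N, D)."""
    return np.repeat(x[np.newaxis, :], n, axis=0)


def repeat_backward(dz):
    """Sum the N upstream gradients: (N, D) -> (D,)."""
    return dz.sum(axis=0)


def sum_forward(x):
    """Sum x of shape (N, D) over its N rows -> z of shape (D,)."""
    return x.sum(axis=0)


def sum_backward(dz, n):
    """Copy the upstream gradient to every row: (D,) -> (N, D)."""
    return np.repeat(dz[np.newaxis, :], n, axis=0)


N, D = 4, 3  # hypothetical sizes
x = np.arange(D, dtype=float)
z = repeat_forward(x, N)
dx = repeat_backward(np.ones((N, D)))      # each entry collects N ones
dX = sum_backward(np.array([1.0, 2.0, 3.0]), N)
print(z.shape, dx.shape, dX.shape)         # (4, 3) (3,) (4, 3)
```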
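  8. y = xW (1/3): gradient with respect to x

    $$y = (x_1, x_2)\begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix} = (x_1 w_{11} + x_2 w_{21},\; x_1 w_{12} + x_2 w_{22},\; x_1 w_{13} + x_2 w_{23})$$
    For $y_1 = x_1 w_{11} + x_2 w_{21}$, the + node passes $\frac{\partial L}{\partial y_1}$ to both × nodes. Each × node then sends $\frac{\partial L}{\partial y_1} w_{11}$ to $x_1$, $\frac{\partial L}{\partial y_1} x_1$ to $w_{11}$, $\frac{\partial L}{\partial y_1} w_{21}$ to $x_2$, and $\frac{\partial L}{\partial y_1} x_2$ to $w_{21}$.
    $$\frac{\partial L}{\partial x_i} = \sum_j \frac{\partial L}{\partial y_j} w_{ij} = \left(\frac{\partial L}{\partial y_1}, \frac{\partial L}{\partial y_2}, \frac{\partial L}{\partial y_3}\right)\begin{pmatrix} w_{i1} \\ w_{i2} \\ w_{i3} \end{pmatrix}$$
    $$\frac{\partial L}{\partial x} = \left(\frac{\partial L}{\partial y_1}, \frac{\partial L}{\partial y_2}, \frac{\partial L}{\partial y_3}\right)\begin{pmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{pmatrix} = \frac{\partial L}{\partial y} W^T$$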
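  9. y = xW (2/3): gradient with respect to W

    $$y = (x_1, x_2)\begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix} = (x_1 w_{11} + x_2 w_{21},\; x_1 w_{12} + x_2 w_{22},\; x_1 w_{13} + x_2 w_{23})$$
    For $y_1 = x_1 w_{11} + x_2 w_{21}$, the + node passes $\frac{\partial L}{\partial y_1}$ to both × nodes. Each × node then sends $\frac{\partial L}{\partial y_1} w_{11}$ to $x_1$, $\frac{\partial L}{\partial y_1} x_1$ to $w_{11}$, $\frac{\partial L}{\partial y_1} w_{21}$ to $x_2$, and $\frac{\partial L}{\partial y_1} x_2$ to $w_{21}$.
    $$\begin{pmatrix} \frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{12}} & \frac{\partial L}{\partial w_{13}} \\ \frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{22}} & \frac{\partial L}{\partial w_{23}} \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\left(\frac{\partial L}{\partial y_1}, \frac{\partial L}{\partial y_2}, \frac{\partial L}{\partial y_3}\right)$$
    $$\frac{\partial L}{\partial W} = x^T \frac{\partial L}{\partial y}$$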
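  10. y = xW (3/3): summary (× node: inputs x, W; output y)

    $$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial W} = x^T \frac{\partial L}{\partial y}, \qquad \frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial x} = \frac{\partial L}{\partial y} W^T$$
    Shapes: $x$ is $(N, D)$, $W$ is $(D, H)$, $y$ is $(N, H)$. Each gradient has the same shape as the quantity it belongs to.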
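The shape bookkeeping is easiest to see in code. The NumPy sketch below implements the two formulas and checks the W gradient against finite differences. The sizes, the random inputs, and the surrogate loss L = sum(y * dy) (chosen so that dL/dy equals dy) are assumptions made for this example.

```python
import numpy as np


def matmul_backward(x, W, dy):
    """Backward pass of y = x @ W given the upstream gradient dy = dL/dy."""
    dx = dy @ W.T   # (N, H) @ (H, D) -> (N, D), same shape as x
    dW = x.T @ dy   # (D, N) @ (N, H) -> (D, H), same shape as W
    return dx, dW


rng = np.random.default_rng(0)
N, D, H = 4, 2, 3  # hypothetical sizes
x = rng.standard_normal((N, D))
W = rng.standard_normal((D, H))
dy = rng.standard_normal((N, H))
dx, dW = matmul_backward(x, W, dy)


def loss(W_):
    """Surrogate loss whose gradient with respect to y is exactly dy."""
    return np.sum((x @ W_) * dy)


# Finite-difference check of dL/dW, one entry at a time.
eps = 1e-6
num_dW = np.zeros_like(W)
for i in range(D):
    for j in range(H):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        num_dW[i, j] = (loss(W_plus) - loss(W_minus)) / (2 * eps)

print(dx.shape, dW.shape)  # (4, 2) (2, 3)
```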
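  11. Softmax with Loss

    Forward pass, for $i = 1, 2, 3$:
    $$S = \sum_i \exp(a_i), \qquad y_i = \frac{\exp(a_i)}{S}, \qquad L = -\sum_i t_i \log y_i$$
    Backward pass, node by node from $L$ back to $a_i$:
    Gradient at $\sum_i t_i \log y_i$ (after the × (−1) node): $-1$, which the + node passes to every branch.
    Gradient at $\log y_i$ (after the × $t_i$ node): $-t_i$.
    Gradient at $y_i$ (after the log node): $-\frac{t_i}{y_i}$.
    Gradient at $\frac{1}{S}$: each × node contributes $-\frac{t_i}{y_i}\exp(a_i) = -S t_i$, for a total of $-S(t_1 + t_2 + t_3)$.
    Gradient at $S$ (after the / node): $\frac{t_1 + t_2 + t_3}{S} = \frac{1}{S}$, since the targets sum to 1. The + node passes $\frac{1}{S}$ to every $\exp(a_i)$.
    Gradient at $\exp(a_i)$ from the × node: $-\frac{t_i}{y_i}\cdot\frac{1}{S} = -\frac{t_i}{\exp(a_i)}$.
    Gradient at $a_i$ (after the exp node):
    $$\frac{\partial L}{\partial a_i} = \left(\frac{1}{S} - \frac{t_i}{\exp(a_i)}\right)\exp(a_i) = \frac{\exp(a_i)}{S} - t_i = y_i - t_i$$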
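The final result y − t can be verified numerically. The sketch below is a minimal NumPy version under two assumptions not stated on the slide: the targets t sum to 1 (a one-hot vector here), and the softmax subtracts max(a) before exponentiating for numerical stability, which leaves y unchanged. The input values are illustrative.

```python
import numpy as np


def softmax(a):
    """y_i = exp(a_i) / S. The max shift is a stability trick, not on the slide."""
    e = np.exp(a - np.max(a))
    return e / e.sum()


def cross_entropy(y, t):
    """L = -sum_i t_i log y_i."""
    return -np.sum(t * np.log(y))


def softmax_with_loss_backward(y, t):
    """dL/da = y - t (valid when the targets t sum to 1)."""
    return y - t


a = np.array([0.3, 2.9, 4.0])  # hypothetical scores
t = np.array([0.0, 0.0, 1.0])  # one-hot target
y = softmax(a)
analytic = softmax_with_loss_backward(y, t)

# Finite-difference check of dL/da.
eps = 1e-6
numeric = np.zeros_like(a)
for i in range(a.size):
    a_plus, a_minus = a.copy(), a.copy()
    a_plus[i] += eps
    a_minus[i] -= eps
    numeric[i] = (cross_entropy(softmax(a_plus), t)
                  - cross_entropy(softmax(a_minus), t)) / (2 * eps)

print(analytic)
print(numeric)
```

Because both y and t sum to 1, the entries of the gradient sum to zero, which makes a quick extra check.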