Backpropagation Cheat Sheet

jojonki
December 28, 2019

Transcript

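  1. Add node: $z = x + y$

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial x} = \frac{\partial L}{\partial z}\cdot 1, \qquad \frac{\partial L}{\partial y} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial y} = \frac{\partial L}{\partial z}\cdot 1$$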
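  2. Multiply node: $z = x \times y$

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial x} = \frac{\partial L}{\partial z}\, y, \qquad \frac{\partial L}{\partial y} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial y} = \frac{\partial L}{\partial z}\, x$$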
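The add and multiply rules above can be sketched in a few lines of plain Python. The function names (`add_backward`, `mul_backward`, `numeric_grad`) are hypothetical and not from the slides; `dz` stands for the upstream gradient $\partial L / \partial z$, and the finite-difference helper is only a sanity check.

```python
def add_backward(dz):
    """Backward pass of z = x + y: the upstream gradient passes through unchanged."""
    return dz, dz  # (dL/dx, dL/dy)


def mul_backward(x, y, dz):
    """Backward pass of z = x * y: each input receives dz times the other input."""
    return dz * y, dz * x  # (dL/dx, dL/dy)


def numeric_grad(f, v, eps=1e-6):
    """Central-difference estimate of df/dv, used only as a sanity check."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)


x, y, dz = 3.0, -2.0, 0.5  # dz plays the role of dL/dz (illustrative values)
dx_add, dy_add = add_backward(dz)
dx_mul, dy_mul = mul_backward(x, y, dz)

# The analytic gradient should agree with the numerical estimate.
assert abs(dz * numeric_grad(lambda v: v * y, x) - dx_mul) < 1e-6
```

Note that the multiply node needs its forward inputs at backward time, while the add node does not.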
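  3. Reciprocal node: $y = 1/x$

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial x} = -\frac{\partial L}{\partial y}\,\frac{1}{x^2}$$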
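  4. Log node: $y = \log(x)$

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial x} = \frac{\partial L}{\partial y}\,\frac{1}{x}$$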
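  5. Exp node: $y = \exp(x)$

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial x} = \frac{\partial L}{\partial y}\,\exp(x)$$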
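As a rough check of the reciprocal, log, and exp rules, the sketch below implements each backward pass and compares it with a central-difference estimate. All function names are hypothetical, and `dy` stands for the upstream gradient $\partial L / \partial y$.

```python
import math


def reciprocal_backward(x, dy):
    """Backward pass of y = 1/x: dL/dx = -dL/dy * 1/x^2."""
    return -dy / (x * x)


def log_backward(x, dy):
    """Backward pass of y = log(x): dL/dx = dL/dy * 1/x."""
    return dy / x


def exp_backward(x, dy):
    """Backward pass of y = exp(x): dL/dx = dL/dy * exp(x)."""
    return dy * math.exp(x)


def numeric_grad(f, v, eps=1e-6):
    """Central-difference estimate of df/dv, used only as a sanity check."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)


x, dy = 2.0, 3.0  # illustrative values
checks = [
    (reciprocal_backward(x, dy), dy * numeric_grad(lambda v: 1.0 / v, x)),
    (log_backward(x, dy), dy * numeric_grad(math.log, x)),
    (exp_backward(x, dy), dy * numeric_grad(math.exp, x)),
]
for analytic, numeric in checks:
    assert abs(analytic - numeric) < 1e-6
```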
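  6. Sigmoid node: $z = \dfrac{1}{1 + \exp(-x)}$

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial x} = \frac{\partial L}{\partial z}\, z(1 - z)$$

    Derivative of the sigmoid:

    $$\left\{\frac{1}{1 + \exp(-x)}\right\}' = \left\{(1 + \exp(-x))^{-1}\right\}' = -\frac{-\exp(-x)}{(1 + \exp(-x))^2} = z^2\,\frac{1 - z}{z} = z(1 - z)$$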
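The sigmoid rule is convenient because the backward pass only needs the forward output $z$. A minimal sketch, with hypothetical function names and an illustrative input:

```python
import math


def sigmoid(x):
    """Forward pass: z = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_backward(z, dz):
    """Backward pass: dL/dx = dL/dz * z * (1 - z), using the cached output z."""
    return dz * z * (1.0 - z)


def numeric_grad(f, v, eps=1e-6):
    """Central-difference estimate of df/dv, used only as a sanity check."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)


x = 0.7                      # illustrative input
z = sigmoid(x)               # cache the forward output
dx = sigmoid_backward(z, 1.0)  # upstream gradient dL/dz = 1
assert abs(dx - numeric_grad(sigmoid, x)) < 1e-6
```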
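  7. Repeat / Sum

    Repeat node: an input $x$ of shape $(D)$ is copied $N$ times to give $z$ of shape $(N, D)$. The backward pass sums the $N$ upstream gradients:

    $$\frac{\partial L}{\partial x} = \sum_{i=1}^{N} \frac{\partial L}{\partial z_i}$$

    Sum node: an input $x$ of shape $(N, D)$ is summed over its $N$ rows to give $z$ of shape $(D)$. The backward pass copies the upstream gradient $\partial L / \partial z$ to each of the $N$ rows.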
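Repeat and Sum are mirror images: the backward pass of one is the forward pass of the other. The sketch below illustrates this with nested Python lists; the function names are hypothetical, and rows are assumed to index the $N$ copies.

```python
def repeat_forward(x, n):
    """Copy a length-D vector n times, giving an (n, D) list of rows."""
    return [list(x) for _ in range(n)]


def repeat_backward(dz):
    """Sum the (N, D) upstream gradient over its N rows, giving length D."""
    return [sum(col) for col in zip(*dz)]


def sum_forward(x):
    """Sum an (N, D) input over its N rows, giving length D."""
    return [sum(col) for col in zip(*x)]


def sum_backward(dz, n):
    """Copy the length-D upstream gradient to each of the n rows."""
    return [list(dz) for _ in range(n)]


dz_repeat = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # N = 3, D = 2
dx_repeat = repeat_backward(dz_repeat)            # [9.0, 12.0]
dx_sum = sum_backward([1.0, 2.0], 3)              # three copies of [1.0, 2.0]
```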
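  8. $y = xW$ (1/3): gradient with respect to $x$

    $$y = (x_1, x_2)\begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix} = (x_1 w_{11} + x_2 w_{21},\; x_1 w_{12} + x_2 w_{22},\; x_1 w_{13} + x_2 w_{23})$$

    For $y_1 = x_1 w_{11} + x_2 w_{21}$, the add node passes $\frac{\partial L}{\partial y_1}$ to both multiply nodes, which send $\frac{\partial L}{\partial y_1} w_{11}$ to $x_1$, $\frac{\partial L}{\partial y_1} x_1$ to $w_{11}$, $\frac{\partial L}{\partial y_1} w_{21}$ to $x_2$, and $\frac{\partial L}{\partial y_1} x_2$ to $w_{21}$.

    $$\frac{\partial L}{\partial x_i} = \sum_j \frac{\partial L}{\partial y_j}\, w_{ij} = \left(\frac{\partial L}{\partial y_1}, \frac{\partial L}{\partial y_2}, \frac{\partial L}{\partial y_3}\right)\begin{pmatrix} w_{i1} \\ w_{i2} \\ w_{i3} \end{pmatrix}$$

    $$\frac{\partial L}{\partial x} = \left(\frac{\partial L}{\partial y_1}, \frac{\partial L}{\partial y_2}, \frac{\partial L}{\partial y_3}\right)\begin{pmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{pmatrix} = \frac{\partial L}{\partial y}\, W^{T}$$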
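  9. $y = xW$ (2/3): gradient with respect to $W$

    $$y = (x_1, x_2)\begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix} = (x_1 w_{11} + x_2 w_{21},\; x_1 w_{12} + x_2 w_{22},\; x_1 w_{13} + x_2 w_{23})$$

    For $y_1 = x_1 w_{11} + x_2 w_{21}$, the multiply nodes send $\frac{\partial L}{\partial y_1} x_1$ to $w_{11}$ and $\frac{\partial L}{\partial y_1} x_2$ to $w_{21}$. Collecting all six entries:

    $$\begin{pmatrix} \frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{12}} & \frac{\partial L}{\partial w_{13}} \\ \frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{22}} & \frac{\partial L}{\partial w_{23}} \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\left(\frac{\partial L}{\partial y_1}, \frac{\partial L}{\partial y_2}, \frac{\partial L}{\partial y_3}\right)$$

    $$\frac{\partial L}{\partial W} = x^{T}\, \frac{\partial L}{\partial y}$$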
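  10. $y = xW$ (3/3): summary with shapes

    With $x$ of shape $(N, D)$, $W$ of shape $(D, H)$, and $y$ (and therefore $\frac{\partial L}{\partial y}$) of shape $(N, H)$:

    $$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial W} = x^{T}\, \frac{\partial L}{\partial y} \qquad (D, H)$$

    $$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial x} = \frac{\partial L}{\partial y}\, W^{T} \qquad (N, D)$$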
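The two matrix formulas can be checked on the slide's own shapes ($D = 2$, $H = 3$) with a batch of $N = 1$. This is a minimal pure-Python sketch: `matmul`, `transpose`, and `matmul_backward` are hypothetical helpers written for illustration, and the numbers are made up.

```python
def matmul(a, b):
    """Plain nested-list matrix product of an (n, k) and a (k, m) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Swap rows and columns of a nested-list matrix."""
    return [list(row) for row in zip(*m)]


def matmul_backward(x, w, dy):
    """Backward pass of y = xW given the upstream gradient dy = dL/dy."""
    dx = matmul(dy, transpose(w))  # (N, H) @ (H, D) -> (N, D)
    dw = matmul(transpose(x), dy)  # (D, N) @ (N, H) -> (D, H)
    return dx, dw


x = [[1.0, 2.0]]                          # N = 1, D = 2
w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # D = 2, H = 3
dy = [[1.0, 0.0, -1.0]]                   # N = 1, H = 3
dx, dw = matmul_backward(x, w, dy)
# dx has the shape of x, dw has the shape of w.
```

A quick way to remember the formulas is that each gradient must have the same shape as the quantity it belongs to, which fixes where the transpose goes.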
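  11. Softmax with Loss

    Forward pass:

    $$S = \sum_i \exp(a_i), \qquad y_i = \frac{\exp(a_i)}{S}, \qquad L = -\sum_i t_i \log y_i$$

    Backward pass, following the computational graph from $L$ back to each $a_i$. The final multiplication by $-1$ sends $-1$ into the sum node, which passes it to every branch. The multiplication by $t_i$ and the log node then give

    $$\frac{\partial L}{\partial \log y_i} = -t_i, \qquad \frac{\partial L}{\partial y_i} = -\frac{t_i}{y_i}$$

    Each multiply node $y_i = \exp(a_i) \times \frac{1}{S}$ sends $-\frac{t_i}{y_i}\exp(a_i) = -S\,t_i$ to the $\frac{1}{S}$ branch and $-\frac{t_i}{y_i}\cdot\frac{1}{S} = -\frac{t_i}{\exp(a_i)}$ to the $\exp(a_i)$ branch. The $\frac{1}{S}$ branch collects $-S(t_1 + t_2 + t_3)$, and the reciprocal node turns this into

    $$\frac{\partial L}{\partial S} = \frac{t_1 + t_2 + t_3}{S} = \frac{1}{S}$$

    because the targets sum to 1. The add node passes $\frac{1}{S}$ to every $\exp(a_i)$, so each exp node receives $\frac{1}{S} - \frac{t_i}{\exp(a_i)}$ and returns

    $$\frac{\partial L}{\partial a_i} = \left(\frac{1}{S} - \frac{t_i}{\exp(a_i)}\right)\exp(a_i) = \frac{\exp(a_i)}{S} - t_i = y_i - t_i$$

    that is, $y_1 - t_1$, $y_2 - t_2$, and $y_3 - t_3$.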
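The result $\partial L / \partial a_i = y_i - t_i$ can be verified numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the logits and one-hot target are made up, and subtracting `max(a)` before exponentiating is a common stability trick that is not on the slide (it leaves $y$ unchanged).

```python
import math


def softmax(a):
    """y_i = exp(a_i) / S with S = sum_i exp(a_i)."""
    m = max(a)  # assumption: shift by max(a) for stability; y is unchanged
    exps = [math.exp(v - m) for v in a]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(y, t):
    """L = -sum_i t_i * log(y_i)."""
    return -sum(ti * math.log(yi) for ti, yi in zip(t, y))


def softmax_with_loss_backward(y, t):
    """dL/da_i = y_i - t_i, valid when the targets t sum to 1."""
    return [yi - ti for yi, ti in zip(y, t)]


def numeric_grad(a, t, i, eps=1e-6):
    """Central-difference estimate of dL/da_i, used only as a sanity check."""
    hi = list(a)
    lo = list(a)
    hi[i] += eps
    lo[i] -= eps
    return (cross_entropy(softmax(hi), t) - cross_entropy(softmax(lo), t)) / (2 * eps)


a = [1.0, 2.0, 3.0]   # illustrative logits
t = [0.0, 0.0, 1.0]   # one-hot target, sums to 1
y = softmax(a)
grad = softmax_with_loss_backward(y, t)
for i in range(len(a)):
    assert abs(grad[i] - numeric_grad(a, t, i)) < 1e-6
```

Because $\sum_i y_i = \sum_i t_i = 1$, the gradient entries sum to zero, which makes for a cheap extra check.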