
ゼロつく ("Deep Learning from Scratch") Study Group: Backpropagation

higa-pan
February 18, 2024


Slides prepared for an internal company study group. They give an overview of backpropagation and illustrate how it works, using computational graphs.



Transcript

1. Table of contents
   • What is backpropagation?
   • Why are gradients (≈ derivatives) needed?
   • A neural network is a function
   • The optimal parameters cannot be found easily (there is no analytical solution)
   • Computing derivatives straight from the definition is slow
   • Computational graphs
   • The chain rule (the formula for differentiating composite functions)
   • Computing a gradient in practice

2. What is backpropagation?
   • A technique for efficiently computing the gradient of a neural network's error function.
   • Instead of naively computing every parameter's derivative numerically, it gains speed by using the chain rule (the differentiation formula for composite functions).
   • Introducing computational graphs makes it visually intuitive and easier to program (it is also what PyTorch and TensorFlow use).

3. We want to bring the error function's value close to 0
   When the error function's value is small, the output is close to the target data, so the neural network can be regarded as computing what we expect.
   [Figure: a network that takes inputs x1, x2, x3, passes them through hidden units j1, j2 (weights w_j1x1 ... w_j2x3), and produces output y (weights w_yj1, w_yj2)]
   Error function: E(y, t) = (1/2)(y − t)², which measures how far the output deviates from the target data t.

4. A neural network is a function
   The network below can be expressed as a single formula. Writing h for the activation function of j1 and j2, and σ for the activation function of y:
   E = (1/2)(σ(w_yj1 · h(x1·w_j1x1 + x2·w_j1x2 + x3·w_j1x3) + w_yj2 · h(x1·w_j2x1 + x2·w_j2x2 + x3·w_j2x3)) − t)²
   [Figure: the same network diagram, with the error function E(y, t) = (1/2)(y − t)² measuring the deviation from the target data t]

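   The single-formula view translates directly into code. Here is a minimal sketch in NumPy, assuming sigmoid for both activations h and σ (the deck leaves them abstract); all names are illustrative:

       import numpy as np

       def sigmoid(a):
           return 1.0 / (1.0 + np.exp(-a))

       def network_error(x, w_j1, w_j2, w_y, t, h=sigmoid, sigma=sigmoid):
           # The whole 3-2-1 network written as one expression, E = 1/2 (y - t)^2.
           j1 = h(np.dot(x, w_j1))   # x1*w_j1x1 + x2*w_j1x2 + x3*w_j1x3, then h
           j2 = h(np.dot(x, w_j2))
           y = sigma(w_y[0] * j1 + w_y[1] * j2)
           return 0.5 * (y - t) ** 2
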
5. The parameters cannot be found the easy way
   For a simple function, the value minimizing the error (the minimum) could be found with a formula or by drawing the graph. For example, E(x, t) = (x1 − t)² + (x2 − t)² takes its minimum at x1 = x2 = t.
   For a neural network the computation is hard: what is the w that minimizes
   (1/2)(σ(w_yj1·h(x1·w_j1x1 + x2·w_j1x2 + x3·w_j1x3) + w_yj2·h(x1·w_j2x1 + x2·w_j2x2 + x3·w_j2x3)) − t)² ?
   (All the harder because activation functions sit in between.)
   [Plot created with https://www.geogebra.org/3d?lang=ja]

6. Using the gradient to bring the error toward 0 little by little
   Differentiate the error function E to find the direction in which its value decreases (the gradient), and change w little by little in that direction.
   → Eventually the error approaches 0.
   → So it suffices to search for the minimum.
   [Figure: the function to minimize; the slope of the tangent line is the gradient; w is moved toward where the gradient becomes 0; at the minimum the gradient is 0]
   ※ The gradient is also 0 at maxima and stationary points.

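   As a concrete illustration of "change w a little at a time in the downhill direction", here is a minimal gradient-descent sketch on a toy one-dimensional error function (the function and learning rate are illustrative, not from the deck):

       def gradient_descent(grad_fn, w, lr=0.1, steps=100):
           # Repeatedly step w against the gradient so the error shrinks.
           for _ in range(steps):
               w -= lr * grad_fn(w)
           return w

       # Toy example: E(w) = (w - 3)^2 has gradient 2*(w - 3) and its minimum at w = 3.
       w_star = gradient_descent(lambda w: 2 * (w - 3), w=0.0)
       print(w_star)  # ≈ 3.0
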
7. Computing the gradient, method ①: straight from the definition
   The formula for computing the gradient:
   df/dw = (f(w + h) − f(w)) / h
   where h is very small (e.g. 1e-4).
   For example, for the parameter w_j1x1:
   df/dw_j1x1 = (f(w_j1x1 + h) − f(w_j1x1)) / h = (E′ − E) / h
   [Figure: compute f(w_j1x1) by running the network with w_j1x1 unchanged, giving E(y, t) = (1/2)(y − t)²; compute f(w_j1x1 + h) by running it again with h added to w_j1x1, giving E′ = (1/2)(y′ − t)²]

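   The definition on this slide translates almost verbatim into code; a sketch (the test function is illustrative):

       def numerical_diff(f, w, h=1e-4):
           # Derivative straight from the definition (forward difference).
           return (f(w + h) - f(w)) / h

       # Example: d/dw of w**2 at w = 5 is 10; the forward difference gives ~10.0001.
       print(numerical_diff(lambda w: w ** 2, 5.0))
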
8. The difficulty of computing gradients straight from the definition
   To obtain the gradient of one parameter, the whole computation must be redone from scratch:
   • by definition, the forward pass is computed 2 times (f(w) and f(w + h));
   • for better accuracy (the central difference), 2 forward passes are still needed (f(w + h) and f(w − h)).
   → With W parameters, and each forward pass itself taking on the order of W operations, the total operation count is O(W²).
   [Figure: the two network diagrams from the previous slide, one pass with w_j1x1 unchanged giving E, one with h added to w_j1x1 giving E′]

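   A sketch of where the O(W²) comes from: the by-definition gradient needs fresh forward passes for every single parameter (two per parameter with the central difference), so the loop below runs 2·W forward passes, each itself costing on the order of W operations:

       import numpy as np

       def numerical_gradient_naive(f, w, h=1e-4):
           # 2 full forward passes of f per element of w: 2 * w.size passes in total.
           grad = np.zeros_like(w)
           for idx in np.ndindex(w.shape):
               orig = w[idx]
               w[idx] = orig + h
               fxh1 = f(w)              # full forward pass
               w[idx] = orig - h
               fxh2 = f(w)              # another full forward pass
               grad[idx] = (fxh1 - fxh2) / (2 * h)
               w[idx] = orig            # restore the parameter
           return grad
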
9. Computational graphs ③
   [Figure: a computational graph for a grocery bill: apple price × apple count, mandarin-orange price × mandarin-orange count, their sum × consumption tax = total]
   When the price of one apple changes, the mandarin-orange part and the consumption-tax part of the calculation are unaffected.
   ← This shows that the effect of a computation is local:
   • as long as the apples' subtotal is known, the rest is OK;
   • even if the logic computing an item changes, the rest is OK;
   • as long as the groceries' subtotal is known, the tax step is OK.

10. Thinking about backpropagation with computational graphs ②
    Simplification: regard the graph as a composite function.
    [Graph: apple price x × apple count y → subtotal t; then t × consumption tax z → E]
    Since the apples' subtotal is all we need, the whole apple part is bundled into t.

11. Thinking about backpropagation with computational graphs ③
    [Graph: t × z (consumption tax) → E; backward, 1 enters at E, z flows toward t, and t flows toward z]
    We want the change in E per change in t → this is exactly computing dE/dt.
    dE/dt = d(t·z)/dt = z · d(t)/dt = z · 1 = z
    dE/dz = d(t·z)/dz = t · d(z)/dz = t · 1 = t
    When t grows by 1, E grows by z; when z grows by 1, E grows by t.

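    These local derivatives are easy to confirm numerically; a quick central-difference check with made-up values for t and z (the slide's concrete numbers did not survive extraction):

        t, z, h = 200.0, 1.1, 1e-4   # illustrative values

        dE_dt = ((t + h) * z - (t - h) * z) / (2 * h)
        dE_dz = (t * (z + h) - t * (z - h)) / (2 * h)
        print(dE_dt, dE_dz)  # ≈ z (1.1) and ≈ t (200.0)
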
12. Thinking about backpropagation with computational graphs ④
    Next, treat the apple part the same way. We want the change in x·y per change in x and per change in y → this is exactly computing d(x·y)/dx and d(x·y)/dy.
    d(x·y)/dx = y · d(x)/dx = y · 1 = y
    d(x·y)/dy = x · d(y)/dy = x · 1 = x
    When x grows by 1, x·y grows by y; when y grows by 1, x·y grows by x.
    [Graph: apple price x × apple count y; backward, 1 enters, y flows toward x and x flows toward y]

13. Thinking about backpropagation with computational graphs ⑤
    How do we connect the two computational graphs in the backward direction? → Take the product (the chain rule, i.e. the differentiation formula for composite functions).
    [Graph: 1 enters at E; the tax node sends 1·z toward t and 1·t toward z; the apple node then sends 1·z·y toward x and 1·z·x toward y]

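    This "multiply the incoming value by the local derivative and pass it on" rule is exactly what a multiply node looks like as a layer. A minimal sketch, modeled on the layer interface used in Deep Learning from Scratch (a sketch, not the deck's own code):

        class MulLayer:
            # A single multiply node: remembers its inputs so backward can swap them.
            def __init__(self):
                self.x = None
                self.y = None

            def forward(self, x, y):
                self.x, self.y = x, y
                return x * y

            def backward(self, dout):
                # Chain rule: incoming derivative times the local derivative.
                dx = dout * self.y   # d(x*y)/dx = y
                dy = dout * self.x   # d(x*y)/dy = x
                return dx, dy
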
14. Thinking about backpropagation with computational graphs ⑥
    Whatever propagates backward from beyond E is simply passed on as-is; how it was computed further on does not matter.
    [Graph: if 100 enters from beyond E, the values become 100·z, 100·t, 100·z·y, 100·z·x; generalizing to an arbitrary δ, they become δ·z, δ·t, δ·z·y, δ·z·x]

15. Thinking about backpropagation with computational graphs ⑦
    The correspondence with the chain rule is as follows:
    dE/dx = (dE/dt) · (dt/dx) = z · y
    dE/dt = d(t·z)/dt = z · d(t)/dt = z · 1 = z
    dt/dx = d(x·y)/dx = y · d(x)/dx = y · 1 = y
    [Graph: the same backward flow, with δ entering at E]

16. Computational graphs: summary
    • A computational graph represents input values, operations, and operation results as a graph.
    • The overall computation can be viewed as an assembly of local operations (localization of the computation).
    • It expresses how results computed so far are carried forward, which meshes well with the chain rule, making backpropagation easy to implement.
    [Graph: the apple/consumption-tax example with its backward values δ, δ·z, δ·t, δ·z·y, δ·z·x, and the chain-rule correspondence dE/dx = (dE/dt)·(dt/dx) = z·y, with dE/dt = z and dt/dx = y]

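    Putting the pieces together: the whole apple-and-tax graph, built from the MulLayer sketched above. The concrete prices are illustrative (the slide's numbers did not survive extraction):

        apple_price, apple_count, tax = 100.0, 2.0, 1.1   # illustrative values

        apple_node = MulLayer()
        tax_node = MulLayer()

        # Forward: subtotal t = x * y, then total E = t * z.
        t = apple_node.forward(apple_price, apple_count)
        E = tax_node.forward(t, tax)
        print(E)  # 220.0

        # Backward: seed with dE/dE = 1 and apply the chain rule node by node.
        dt, dz = tax_node.backward(1.0)
        dx, dy = apple_node.backward(dt)
        print(dx, dy, dz)  # 2.2 (= z*y), 110.0 (= z*x), 200.0 (= t)
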
17. Representing the neural network as a computational graph: forward pass ①
    [Figure: the 3-2-1 network redrawn as a computational graph: each input is multiplied by its weight (x1·w_j1x1, x2·w_j1x2, x3·w_j1x3, and likewise for j2), the products are summed and passed through h to give j1 and j2, then j1·w_yj1 + j2·w_yj2 passes through σ to give y]

18. Representing the neural network as a computational graph: forward pass ②
    [Figure: the same computational graph; this slide steps the forward pass one stage further]

19. Representing the neural network as a computational graph: forward pass ③
    [Figure: the same computational graph; this slide steps the forward pass one stage further]

20. Representing the neural network as a computational graph: forward pass ④
    [Figure: the same computational graph, with the forward pass complete at the output y]

21. Representing the neural network as a computational graph: backward pass ①
    [Figure: the same computational graph; the backward pass starts at the output, where δ flows into σ and comes out as δ·σ′]
    σ′ = ∂σ/∂(w_yj1·j1 + w_yj2·j2)

22. Representing the neural network as a computational graph: backward pass ②
    [Figure: δ·σ′ reaches the sum w_yj1·j1 + w_yj2·j2]
    A + node propagates δ onward unchanged (δ · d(x + y)/dx = δ · 1).
    A × node propagates the product with the other input (δ · d(x·y)/dx = δ · y).

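    The + rule on this slide is the counterpart of the MulLayer sketched earlier; a minimal sketch of an add node:

        class AddLayer:
            # A single add node: backward copies the incoming derivative to both inputs.
            def forward(self, x, y):
                return x + y

            def backward(self, dout):
                # d(x+y)/dx = d(x+y)/dy = 1, so dout passes through unchanged.
                return dout, dout
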
23. Representing the neural network as a computational graph: backward pass ③
    [Figure: applying the two rules, δ·σ′ fans out: δ·σ′·j1 and δ·σ′·j2 flow to the weights w_yj1 and w_yj2, while δ·σ′·w_yj1 and δ·σ′·w_yj2 flow back toward j1 and j2]
    A + node propagates δ onward unchanged (δ · d(x + y)/dx = δ · 1); a × node propagates the product with the other input (δ · d(x·y)/dx = δ · y).

24. Representing the neural network as a computational graph: backward pass ④
    [Figure: the backward values pass through the hidden activations, becoming δ·σ′·w_yj1·h′ and δ·σ′·w_yj2·h′]
    h′ = ∂h/∂(x1·w_j1x1 + x2·w_j1x2 + x3·w_j1x3) on the j1 side,
    h′ = ∂h/∂(x1·w_j2x1 + x2·w_j2x2 + x3·w_j2x3) on the j2 side.
    A + node propagates δ onward unchanged (δ · d(x + y)/dx = δ · 1); a × node propagates the product with the other input (δ · d(x·y)/dx = δ · y).

25. Representing the neural network as a computational graph: backward pass ⑤
    [Figure: δ·σ′·w_yj1·h′ and δ·σ′·w_yj2·h′ arrive at the weighted sums of the inputs feeding j1 and j2]
    A + node propagates δ onward unchanged (δ · d(x + y)/dx = δ · 1); a × node propagates the product with the other input (δ · d(x·y)/dx = δ · y).

26. Representing the neural network as a computational graph: backward pass ⑥
    [Figure: the gradient reaches every weight:
    δ·σ′·w_yj1·h′·x1, δ·σ′·w_yj1·h′·x2, δ·σ′·w_yj1·h′·x3 for w_j1x1, w_j1x2, w_j1x3;
    δ·σ′·w_yj2·h′·x1, δ·σ′·w_yj2·h′·x2, δ·σ′·w_yj2·h′·x3 for w_j2x1, w_j2x2, w_j2x3]
    A + node propagates δ onward unchanged (δ · d(x + y)/dx = δ · 1); a × node propagates the product with the other input (δ · d(x·y)/dx = δ · y).

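    The whole backward pass of slides 21–26 fits in a few lines of NumPy. A sketch, assuming sigmoid for both h and σ (so h′ = j(1 − j) and σ′ = y(1 − y)) and seeding with δ = y − t, the derivative of E = (1/2)(y − t)²; weights and inputs are illustrative:

        import numpy as np

        def sigmoid(a):
            return 1.0 / (1.0 + np.exp(-a))

        x = np.array([0.5, 0.1, 0.9])   # inputs x1, x2, x3 (illustrative)
        w_j = np.random.randn(2, 3)     # rows: (w_j1x1..w_j1x3), (w_j2x1..w_j2x3)
        w_y = np.random.randn(2)        # w_yj1, w_yj2
        t = 1.0

        # Forward pass.
        a_j = w_j @ x                   # weighted sums flowing into j1, j2
        j = sigmoid(a_j)                # hidden activations (h = sigmoid assumed)
        y = sigmoid(w_y @ j)            # output (σ = sigmoid assumed)

        # Backward pass: exactly the products shown on the slides.
        delta = y - t                   # δ = dE/dy for E = 1/2 (y - t)^2
        sigma_prime = y * (1 - y)       # σ′
        h_prime = j * (1 - j)           # h′, one value per hidden unit
        dw_y = delta * sigma_prime * j                            # δ·σ′·j1, δ·σ′·j2
        dw_j = np.outer(delta * sigma_prime * w_y * h_prime, x)   # δ·σ′·w_yj·h′·x
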
27. The difference in computation speed
    Computing the gradient with the central difference:

        def numerical_gradient(f, x):
            h = 1e-4  # 0.0001
            grad = np.zeros_like(x)
            it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
            while not it.finished:
                idx = it.multi_index
                tmp_val = x[idx]
                x[idx] = tmp_val + h
                fxh1 = f(x)  # f(x+h)
                x[idx] = tmp_val - h
                fxh2 = f(x)  # f(x-h)
                grad[idx] = (fxh1 - fxh2) / (2*h)
                x[idx] = tmp_val  # restore the value
                it.iternext()
            return grad

    Computing the gradient with backpropagation:

        def gradient(self, x, t):
            # forward
            self.loss(x, t)
            # backward: seed the chain rule with dE/dE = 1
            dout = 1
            dout = self.lastlayer.backward(dout)
            layers = list(self.layers.values())
            layers.reverse()
            for layer in layers:
                dout = layer.backward(dout)
            grads = {}
            grads["W1"] = self.layers["Affine1"].dW
            grads["b1"] = self.layers["Affine1"].db
            grads["W2"] = self.layers["Affine2"].dW
            grads["b2"] = self.layers["Affine2"].db
            return grads

    [numerical_gradient] done in 5551.438342809677 s
    [backprop_gradient] done in 0.7326231002807617 s
    → about 7,577× faster!!

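    A standard way to confirm the two methods agree (so the speed-up is not buying a wrong answer) is a gradient check. A sketch, reusing the network and MNIST data defined in the program listing on the next slide:

        # Gradient check: both methods should agree to within numerical error.
        x_batch, t_batch = x_train[:3], t_train[:3]
        grad_numerical = network.numerical_gradient(x_batch, t_batch)
        grad_backprop = network.gradient(x_batch, t_batch)
        for key in grad_numerical:
            diff = np.average(np.abs(grad_backprop[key] - grad_numerical[key]))
            print(f"{key}: {diff}")   # expect tiny values, around 1e-10 .. 1e-7
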
28. Program listing

        import time
        from contextlib import contextmanager
        import numpy as np
        from tqdm import tqdm_notebook as tqdm
        # load_mnist comes from the book's repository (dataset/mnist.py):
        from dataset.mnist import load_mnist

        @contextmanager
        def timer(name):
            t0 = time.time()
            yield
            print(f"[{name}] done in {time.time() - t0} s")

        (x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)
        network = TowLayerNet(input_size=784, hidden_size=50, output_size=10)

        train_size = x_train.shape[0]
        iter_num = 100
        batch_size = 100
        lr = 0.01
        train_loss = []
        train_acc = []
        test_acc = []
        iter_per_epoch = max(train_size / batch_size, 1)

        with timer("numerical_gradient"):
            for i in tqdm(range(iter_num)):
                batch_mask = np.random.choice(train_size, batch_size)
                x_batch = x_train[batch_mask]
                t_batch = t_train[batch_mask]
                grad = network.numerical_gradient(x_batch, t_batch)
                for key in network.params.keys():
                    network.params[key] -= lr * grad[key]
                loss = network.loss_value
                train_loss.append(loss)
                if i % iter_per_epoch == 0:
                    train_acc.append(network.accuracy(x_train, t_train))
                    test_acc.append(network.accuracy(x_test, t_test))
                    print(f"{train_acc[-1]=}, {test_acc[-1]=}")

        train_loss = []
        train_acc = []
        test_acc = []

        with timer("backprop_gradient"):
            for i in tqdm(range(iter_num)):
                batch_mask = np.random.choice(train_size, batch_size)
                x_batch = x_train[batch_mask]
                t_batch = t_train[batch_mask]
                grad = network.gradient(x_batch, t_batch)
                for key in network.params.keys():
                    network.params[key] -= lr * grad[key]
                loss = network.loss_value
                train_loss.append(loss)
                if i % iter_per_epoch == 0:
                    train_acc.append(network.accuracy(x_train, t_train))
                    test_acc.append(network.accuracy(x_test, t_test))
                    print(f"{train_acc[-1]=}, {test_acc[-1]=}")

    Quoted from https://github.com/oreilly-japan/deep-learning-from-scratch/tree/master

29. Program listing (continued)

        import numpy as np
        from collections import OrderedDict
        # Affine, Relu, SoftmaxWithLoss come from the book's repository (common/layers.py):
        from common.layers import Affine, Relu, SoftmaxWithLoss

        def numerical_gradient(f, x):
            h = 1e-4  # 0.0001
            grad = np.zeros_like(x)
            it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
            while not it.finished:
                idx = it.multi_index
                tmp_val = x[idx]
                x[idx] = tmp_val + h
                fxh1 = f(x)  # f(x+h)
                x[idx] = tmp_val - h
                fxh2 = f(x)  # f(x-h)
                grad[idx] = (fxh1 - fxh2) / (2*h)
                x[idx] = tmp_val  # restore the value
                it.iternext()
            return grad

        def softmax(x):
            x = x - np.max(x, axis=-1, keepdims=True)  # guard against overflow
            return np.exp(x) / np.sum(np.exp(x), axis=-1, keepdims=True)

        def cross_entropy_error(y, t):
            if y.ndim == 1:
                t = t.reshape(1, t.size)
                y = y.reshape(1, y.size)
            # If the labels are one-hot vectors, convert them to class indices.
            if t.size == y.size:
                t = t.argmax(axis=1)
            batch_size = y.shape[0]
            return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size

        class TowLayerNet():
            def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
                self.params = {}
                self.params["W1"] = weight_init_std * np.random.randn(input_size, hidden_size)
                self.params["b1"] = np.zeros(hidden_size)
                self.params["W2"] = weight_init_std * np.random.randn(hidden_size, output_size)
                self.params["b2"] = np.zeros(output_size)
                self.layers = OrderedDict()
                self.layers["Affine1"] = Affine(self.params["W1"], self.params["b1"])
                self.layers["Relu1"] = Relu()
                self.layers["Affine2"] = Affine(self.params["W2"], self.params["b2"])
                self.lastlayer = SoftmaxWithLoss()
                self.loss_value = 0

            def predict(self, x):
                for layer in self.layers.values():
                    x = layer.forward(x)
                return x

            def loss(self, x, t):
                y = self.predict(x)
                self.loss_value = self.lastlayer.forward(y, t)
                return self.loss_value

            def accuracy(self, x, t):
                y = self.predict(x)
                y = np.argmax(y, axis=1)
                if t.ndim != 1:
                    t = np.argmax(t, axis=1)
                accuracy = np.sum(t == y) / float(x.shape[0])
                return accuracy

            def numerical_gradient(self, x, t):
                loss_W = lambda W: self.loss(x, t)
                grads = {}
                for param_name in self.params.keys():
                    grads[param_name] = numerical_gradient(loss_W, self.params[param_name])
                return grads

            def gradient(self, x, t):
                # forward
                self.loss(x, t)
                # backward: seed the chain rule with dE/dE = 1
                dout = 1
                dout = self.lastlayer.backward(dout)
                layers = list(self.layers.values())
                layers.reverse()
                for layer in layers:
                    dout = layer.backward(dout)
                grads = {}
                grads["W1"] = self.layers["Affine1"].dW
                grads["b1"] = self.layers["Affine1"].db
                grads["W2"] = self.layers["Affine2"].dW
                grads["b2"] = self.layers["Affine2"].db
                return grads

    Quoted from https://github.com/oreilly-japan/deep-learning-from-scratch/tree/master