
Mean Field Theory of Deep Learning (2019/07/20)

matsuno
July 20, 2019


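  1. Contents
    1 Introduction
    2 Precursor theories
      Mean field theory in physics
      Mean field theory of neural network models
    3 Mean field theory of deep learning
      Mean field theory of forward propagation
      Mean field theory of backpropagation
      Subsequent developments
    4 Summary
    matsuno, Mean Field Theory of Deep Learning, July 20, 2019, 2 / 57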
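  3. Plan of this talk
    1 Briefly introduce the precursor theories
      Mean field approximation in physics [4]
      Mean field theory of neural network models [1, 2]
    2 Then introduce the recent mean field theory of deep learning
      Mean field theory of forward propagation [5, 6]
      Mean field theory of backpropagation [6]
      A brief look at subsequent developments [7, 8, 9]
    (Unless otherwise noted, figures are taken from the respective references.)
    The Japanese-language material [10] was an invaluable reference.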
  4. References I
    [1] H. Sompolinsky, A. Crisanti and H. J. Sommers, "Chaos in random neural networks," Physical Review Letters 61.3 (1988): 259.
    [2] 甘利俊一, 内田肇, 「神経回路網の基礎 (脳化学 2, 数学者のための分子生物学入門 -新しい数学を造ろう-)」, 物性研究 (2006), 87(3): 451-456, http://hdl.handle.net/2433/110690. [English: "Foundations of neural networks (Brain Chemistry 2, Introduction to Molecular Biology for Mathematicians: Let's Create New Mathematics)," Bussei Kenkyu.]
    [3] A. Crisanti and H. Sompolinsky, "Dynamics of spin systems with randomly asymmetric bonds: Langevin dynamics and a spherical model," Physical Review A 36.10 (1987): 4922.
    [4] 田崎晴明, 「統計力学 II」, 培風館 新物理学シリーズ 38. [English: Statistical Mechanics II, Baifukan, New Physics Series 38.]
    [5] B. Poole, S. Lahiri, M. Raghu, J. Sohl-Dickstein and S. Ganguli, "Exponential expressivity in deep neural networks through transient chaos," NIPS 2016.
  5. References II
    [6] S. S. Schoenholz, J. Gilmer, S. Ganguli and J. Sohl-Dickstein, "Deep Information Propagation," ICLR 2017.
    [7] G. Yang and S. S. Schoenholz, "Mean Field Residual Networks: On the Edge of Chaos," NIPS 2017.
    [8] L. Xiao, Y. Bahri, J. Sohl-Dickstein, S. S. Schoenholz and J. Pennington, "Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks," ICML 2018.
    [9] G. Yang, J. Pennington, V. Rao, J. Sohl-Dickstein and S. S. Schoenholz, "A Mean Field Theory of Batch Normalization," ICLR 2019.
    [10] 唐木田亮, 「深層ニューラルネットワークの数理: 平均場理論の視点」, 【第25回 AI セミナー】「人工知能の数理」, https://drive.google.com/open?id=1Fhlarme8qFbhcGFLs3J8WQ3kYaJU3nbZ. [English: "Mathematics of deep neural networks: a mean field theory perspective," 25th AI Seminar "Mathematics of Artificial Intelligence."]
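  8. A typical example: the Ising model
    Background: for a system held at constant temperature, the probability that it is in state $i$ with energy $E_i$ is
    $$p_i = \frac{e^{-\beta E_i}}{Z}$$
    ($\beta$: a positive constant, the inverse temperature; $Z$: normalization constant, the partition function).
    Problem setup: as a model of a material, consider spins $s_i = \pm 1$ that take random values and sit on a two-dimensional lattice, with energy
    $$E = -J \sum_{\langle i,j\rangle} s_i s_j - H \sum_i s_i,$$
    where $\langle i,j\rangle$ runs over pairs of neighboring lattice sites.
    With $N$ spins there are $2^N$ states. We would like, for example, the spin expectation $\langle s_i\rangle$, but computing it is hard because it involves the interactions of a huge number of spins.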
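To make the $2^N$ cost concrete, here is a brute-force sketch that computes $\langle s_0\rangle$ exactly by summing over every configuration. It assumes a small $L \times L$ lattice with periodic boundary conditions; the slide does not specify the boundaries. The function name and parameters are illustrative. Already at $L = 5$ there are $2^{25} \approx 3.4 \times 10^7$ states, which is why an approximation is needed.

```python
import itertools
import math


def exact_spin_expectation(L, J, H, beta):
    """Exact <s_0> on an L x L periodic Ising lattice (L >= 3), obtained by
    enumerating all 2**(L*L) spin configurations."""
    n = L * L

    def energy(spins):
        e = 0.0
        for r in range(L):
            for c in range(L):
                s = spins[r * L + c]
                right = spins[r * L + (c + 1) % L]
                down = spins[((r + 1) % L) * L + c]
                # each nearest-neighbour pair <i,j> is counted exactly once
                e -= J * s * (right + down) + H * s
        return e

    Z = 0.0
    weighted_s0 = 0.0
    for spins in itertools.product((-1, 1), repeat=n):
        w = math.exp(-beta * energy(spins))   # Boltzmann weight
        Z += w
        weighted_s0 += spins[0] * w
    return weighted_s0 / Z


print(exact_spin_expectation(3, 1.0, 0.5, 0.4))   # 2**9 = 512 states
```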
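  9. Mean field approximation for the Ising model
    Focus on the part of the energy that involves one lattice site $i = 0$ and its four nearest neighbors $i = 1, \dots, 4$:
    $$E_0 = -J \sum_{i=1}^{4} s_0 s_i - H s_0 = -\Bigl(J \sum_{i=1}^{4} s_i + H\Bigr) s_0$$
    The mean field approximation ignores the fluctuations of the neighboring spins and replaces them by their expectation $\psi = \langle s_i\rangle$:
    $$E_0 \simeq E_{s_0} = -(4J\psi + H)\, s_0$$
    The expectation of this single spin is then easy to compute from $E_{s_0}$. Writing $a = \beta(4J\psi + H)$, we have $p(s_0 = \pm 1) = e^{\pm a}/(e^{a} + e^{-a})$, so
    $$\langle s_0\rangle = p(s_0 = +1) - p(s_0 = -1) = \tanh\bigl(\beta(4J\psi + H)\bigr)$$
    Nothing distinguishes the chosen site from its neighbors, so $\langle s_0\rangle = \psi$, and therefore
    $$\psi = \tanh\bigl(\beta(4J\psi + H)\bigr)$$
    The spin expectation of the many-body system is now approximated by the solution of a one-variable equation.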
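Here is a minimal sketch that solves the self-consistency equation numerically by fixed-point iteration, $\psi \leftarrow \tanh(\beta(4J\psi + H))$. The starting value $\psi_0 = 1$ is an assumption; it selects the positive solution when several exist. Convergence slows down near $4\beta J = 1$.

```python
import math


def mean_field_psi(beta, J, H, psi0=1.0, tol=1e-12, max_iter=100_000):
    """Solve psi = tanh(beta * (4 J psi + H)) by fixed-point iteration.
    The factor 4 is the number of nearest neighbours on the square lattice."""
    psi = psi0
    for _ in range(max_iter):
        new = math.tanh(beta * (4.0 * J * psi + H))
        if abs(new - psi) < tol:
            return new
        psi = new
    raise RuntimeError("no convergence; try a point away from 4*beta*J = 1")


print(mean_field_psi(0.5, 1.0, 0.0))   # 4*beta*J = 2: nonzero magnetization
print(mean_field_psi(0.1, 1.0, 0.0))   # 4*beta*J = 0.4: psi -> 0
```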
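  10. Mean field approximation for the Ising model
    Depending on the parameters, the solution changes discontinuously ⇒ phase transition
    [Figure (source: https://web.stanford.edu/~peastman/statmech/phasetransitions.html). Solid line: graph of $y = \tanh(\beta(4Jx + H))$; dotted line: graph of $y = x$. The intersections of the two graphs are the solutions.]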
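Counting intersections makes the transition visible. For $H = 0$, the equation $x = \tanh(4\beta Jx)$ has only the solution $x = 0$ when $4\beta J \le 1$. It has three solutions when $4\beta J > 1$, because the slope of $\tanh(4\beta Jx)$ at the origin then exceeds 1. This sketch counts sign changes of $\tanh(\beta(4Jx + H)) - x$ on a grid. The grid range and resolution are assumptions, and solutions closer together than one grid step would be missed.

```python
import numpy as np


def count_mean_field_solutions(beta, J, H, n_grid=3000):
    """Number of intersections of y = tanh(beta (4 J x + H)) with y = x.
    Roots are detected as sign changes on an even-sized grid over [-1.5, 1.5];
    the even size keeps x = 0 itself off the grid. All roots lie in (-1, 1)."""
    x = np.linspace(-1.5, 1.5, n_grid)
    f = np.tanh(beta * (4.0 * J * x + H)) - x
    return int(np.count_nonzero(np.sign(f[:-1]) != np.sign(f[1:])))


for beta in (0.1, 0.2, 0.3, 0.5):
    print(f"4*beta*J = {4 * beta:.1f}: {count_mean_field_solutions(beta, 1.0, 0.0)} solution(s)")
```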
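  12. Neuron model
    At the time of [1] (1988), neurons seem to have been studied as biological models: the paper uses phrases such as "membrane potential of the nerve cell".
    Membrane potential: the potential difference between the inside and the outside of a cell.
    Consider a layer of $N$ neurons with input $x \in \mathbb{R}^N$ and output $z \in \mathbb{R}^N$.
    With weights $W \in \mathbb{R}^{N\times N}$, firing thresholds $h \in \mathbb{R}^N$ and activation function $\phi$, the output $z$ is
    $$z = \phi(u), \qquad u = Wx - h$$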
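The layer equation translates directly into code. In this sketch the activation defaults to a hard threshold, np.heaviside, so a neuron "fires" (outputs 1) exactly when $(Wx)_i$ exceeds its threshold $h_i$. This choice of $\phi$ is an assumption for illustration; any elementwise function can be passed instead.

```python
import numpy as np


def neuron_layer(x, W, h, phi=lambda u: np.heaviside(u, 0.0)):
    """z = phi(u) with membrane potential u = W x - h.
    With the default step function, unit i fires (z_i = 1) iff (W x)_i > h_i."""
    u = W @ x - h
    return phi(u)


print(neuron_layer(np.array([1.0, 2.0]), np.eye(2), np.array([1.5, 1.5])))
```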
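  13. Continuous-time neuron model
    Since this is a biological model, the input and output are made functions of time.
    Assume that the neuron is driven by $Wx - h$ and otherwise decays exponentially:
    $$z(t) = \phi(u(t)), \qquad \partial_t u(t) = -u(t) + Wx(t) - h$$
    For simplicity, we set $h = 0$ for the rest of this section.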
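With a constant input $x$ and $h = 0$, the solution is $u(t) = Wx + (u(0) - Wx)e^{-t}$, so the membrane potential relaxes exponentially to $Wx$. A minimal forward-Euler sketch reproduces this; the step size and horizon are assumptions. Each Euler step shrinks the gap to $Wx$ by exactly the factor $1 - dt$.

```python
import numpy as np


def integrate_membrane(W, x, u0, dt=0.01, t_max=10.0):
    """Forward-Euler integration of du/dt = -u + W x (threshold h = 0)
    for a constant input x. Returns u at every time step."""
    u = np.array(u0, dtype=float)
    traj = [u.copy()]
    for _ in range(int(round(t_max / dt))):
        u = u + dt * (-u + W @ x)
        traj.append(u.copy())
    return np.array(traj)


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))
x = rng.normal(size=3)
traj = integrate_membrane(W, x, np.zeros(3))
gap = np.linalg.norm(traj - W @ x, axis=1)     # distance to the fixed point W x
print(gap[0], gap[100], gap[-1])
```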
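  14. Random weights
    Consider the case in which each entry of $W$ is drawn from a normal distribution with mean 0 and variance $\sigma^2/N$.
    This reveals laws shared by networks whose weights look different at first sight.
    These networks can be analyzed with techniques from spin-glass models, which extend the Ising model by drawing each pairwise spin interaction independently from a probability distribution.
    The mean field approximation lets us focus on a single component $u_i$.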
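Why can the mean field approximation focus on one component? Fix an activity pattern $x$. Then each field $(Wx)_i = \sum_j W_{ij}x_j$ is a sum of $N$ independent terms, so it is Gaussian with variance $\sigma^2 \cdot \frac{1}{N}\sum_j x_j^2$, identically for every $i$. This numerical check draws one random $W$ and compares the empirical variance and kurtosis of the fields with that prediction. The sizes, the seed and the tanh-shaped activity pattern are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 2000, 1.5

# One realization of the random weights: W_ij ~ N(0, sigma^2 / N)
W = rng.normal(0.0, sigma / np.sqrt(N), size=(N, N))
# An arbitrary fixed activity pattern x_j (here tanh of standard normals)
x = np.tanh(rng.normal(0.0, 1.0, size=N))

field = W @ x                                   # (W x)_i for all i at once
predicted_var = sigma**2 * np.mean(x**2)        # sigma^2 * (1/N) sum_j x_j^2
empirical_var = field.var()
kurtosis = np.mean(field**4) / np.mean(field**2) ** 2   # equals 3 for a Gaussian

print(f"variance: empirical {empirical_var:.4f}, predicted {predicted_var:.4f}")
print(f"kurtosis: {kurtosis:.3f} (Gaussian: 3)")
```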
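  15. Random weights
    Reference [1] studies the autocorrelation of $u_i$, $\Delta(\tau) = \langle u_i(t)\, u_i(t+\tau)\rangle$.
    $\Delta$ satisfies the following equation:
    $$\partial_\tau^2 \Delta = -\frac{\partial V}{\partial \Delta},$$
    $$V(\Delta) = -\frac{1}{2}\Delta^2 + \sigma^2 \int_{-\infty}^{\infty} Dz \left( \int_{-\infty}^{\infty} Dx\, \Phi\Bigl( (\Delta(0) - |\Delta|)^{1/2} x + |\Delta|^{1/2} z \Bigr) \right)^2,$$
    $$Dz = \frac{dz}{\sqrt{2\pi}} e^{-z^2/2}, \qquad \Phi(x) = \int_0^x dy\, \phi(y)$$
    By its definition, $\Delta$ has the properties
    $$\Delta(-\tau) = \Delta(\tau), \qquad \partial_\tau \Delta(\tau)\big|_{\tau=0} = 0$$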
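To get a concrete feel for $V(\Delta)$, this sketch evaluates it numerically with probabilists' Gauss-Hermite quadrature. It assumes $\phi = \tanh$, so that $\Phi(x) = \log\cosh x$, and writes delta0 for $\Delta(0)$. It includes the factor $\sigma^2$ from the weight variance. The quadrature order of 60 is an arbitrary choice.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# hermegauss uses the weight exp(-z^2/2); dividing by sqrt(2 pi) turns
# sum(WEIGHTS * f(NODES)) into an approximation of int Dz f(z)
NODES, WEIGHTS = hermegauss(60)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)


def Phi(x):
    """Phi(x) = int_0^x tanh(y) dy = log cosh x, in an overflow-safe form."""
    ax = np.abs(x)
    return ax + np.log1p(np.exp(-2.0 * ax)) - np.log(2.0)


def potential(delta, delta0, sigma):
    """V(Delta) for phi = tanh; requires |Delta| <= delta0 = Delta(0)."""
    a = np.sqrt(delta0 - abs(delta))
    b = np.sqrt(abs(delta))
    # inner[k] = int Dx Phi(a x + b z_k): x runs along axis 1, z_k along axis 0
    inner = np.sum(WEIGHTS[None, :] * Phi(a * NODES[None, :] + b * NODES[:, None]), axis=1)
    return -0.5 * delta**2 + sigma**2 * np.sum(WEIGHTS * inner**2)


for d in (0.0, 0.5, 1.0):
    print(d, potential(d, 1.0, 1.5))
```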
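  16. Random weights
    The potential $V$ changes with $\sigma$, and so does the evolution of $\Delta$ as a function of $\tau$:
    1 For $\sigma < 1$, $\Delta = 0$ is the only solution that does not diverge (panel a).
    2 For $\sigma > 1$, several kinds of behavior appear, depending on the parameters and the initial value (panels b, c).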
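A direct simulation illustrates the two regimes. This sketch assumes the recurrent form used in [1], in which each unit's input is the network's own output, $x(t) = \phi(u(t))$, so that $\partial_t u = -u + W\tanh(u)$. It reports only the equal-time value $\Delta(0) = \langle u_i^2\rangle$, averaged over the second half of the run. The network size, step size, duration and seed are all assumptions.

```python
import numpy as np


def mean_square_activity(sigma, N=200, dt=0.1, t_max=200.0, seed=0):
    """Euler integration of du/dt = -u + W tanh(u) with W_ij ~ N(0, sigma^2 / N).
    Returns (1/N) sum_i u_i^2 averaged over the second half of the run,
    an estimate of Delta(0) = <u_i(t)^2>."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, sigma / np.sqrt(N), size=(N, N))
    u = rng.normal(0.0, 1.0, size=N)
    n_steps = int(round(t_max / dt))
    samples = []
    for step in range(n_steps):
        u = u + dt * (-u + W @ np.tanh(u))
        if step >= n_steps // 2:
            samples.append(np.mean(u**2))
    return float(np.mean(samples))


for sigma in (0.5, 2.0):
    print(f"sigma = {sigma}: Delta(0) ~ {mean_square_activity(sigma):.3g}")
```

For $\sigma < 1$ the activity decays to zero; for $\sigma > 1$ it stays of order one.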
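  19. Notation
    A fully connected neural network with $D + 1$ layers: the input is layer 0 and the output is layer $D$.
    Write $N_\ell$ for the number of units in layer $\ell$.
    Write $W^\ell$, $b^\ell$ ($\ell = 1, \dots, D$) for the weights and biases that propagate signals into layer $\ell$:
    $$W^\ell \in \mathbb{R}^{N_\ell \times N_{\ell-1}}, \qquad b^\ell \in \mathbb{R}^{N_\ell}$$
    With $x^\ell \in \mathbb{R}^{N_\ell}$ denoting the output of layer $\ell$,
    $$x^\ell = \phi(h^\ell), \qquad h^\ell = W^\ell x^{\ell-1} + b^\ell$$
    $x^0$: input; $\phi$: activation function (applied elementwise).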
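The notation maps directly onto a forward pass. In this minimal sketch, tanh is an assumed default activation, and Python lists hold $W^1, \dots, W^D$ and $b^1, \dots, b^D$.

```python
import numpy as np


def forward(x0, weights, biases, phi=np.tanh):
    """Return pre-activations [h^1, ..., h^D] and outputs [x^0, ..., x^D]
    for x^l = phi(h^l), h^l = W^l x^{l-1} + b^l."""
    xs, hs = [np.asarray(x0, dtype=float)], []
    for W, b in zip(weights, biases):
        h = W @ xs[-1] + b
        hs.append(h)
        xs.append(phi(h))
    return hs, xs


rng = np.random.default_rng(0)
widths = [3, 5, 2]                                   # N_0, N_1, N_2
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(widths[:-1], widths[1:])]
biases = [rng.normal(size=n) for n in widths[1:]]
hs, xs = forward(rng.normal(size=3), weights, biases)
print([h.shape for h in hs], [x.shape for x in xs])
```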
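  20. Problem setup
    Each entry of the weights and biases is drawn at random from the following distributions:
    Weights: normal distribution with mean 0 and variance $\sigma_w^2 / N_{\ell-1}$
    Biases: normal distribution with mean 0 and variance $\sigma_b^2$
    The variance is divided by $N_{\ell-1}$ so that the scale of the next layer does not grow with the number of terms in the sum
    $$h_i^\ell = \sum_{j=1}^{N_{\ell-1}} W_{ij}^\ell x_j^{\ell-1} + b_i^\ell$$
    How does an input $x^0$ propagate?
    How does the correlation between two inputs $x^{0,1}$ and $x^{0,2}$ change as they propagate?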
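Here is a sketch of the random initialization, together with a check of why the variance is divided by $N_{\ell-1}$. For an input with $\frac{1}{N_0}\sum_j (x_j^0)^2 = 1$, the variance of $h_i^1$ is $\sigma_w^2 + \sigma_b^2$ whatever the input width. Without the division it would grow like $\sigma_w^2 N_0$. The widths, seed and $(\sigma_w, \sigma_b)$ are illustrative assumptions.

```python
import numpy as np


def init_params(widths, sigma_w, sigma_b, rng):
    """Sample W^l_ij ~ N(0, sigma_w^2 / N_{l-1}) and b^l_i ~ N(0, sigma_b^2)
    for l = 1..D, where widths = [N_0, N_1, ..., N_D]."""
    weights, biases = [], []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        weights.append(rng.normal(0.0, sigma_w / np.sqrt(n_in), size=(n_out, n_in)))
        biases.append(rng.normal(0.0, sigma_b, size=n_out))
    return weights, biases


rng = np.random.default_rng(0)
sigma_w, sigma_b = 1.5, 0.3
variances = {}
for n_in in (100, 4000):
    (W,), (b,) = init_params([n_in, 1000], sigma_w, sigma_b, rng)
    h = W @ np.ones(n_in) + b              # input with (1/N_0) sum_j x_j^2 = 1
    variances[n_in] = h.var()
print(variances, "predicted:", sigma_w**2 + sigma_b**2)
```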
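  21. Propagation of a single input: calculation (1/2)
    Measure how strongly the units in each layer fire by
    $$q^\ell = \frac{1}{N_\ell} \sum_{i=1}^{N_\ell} \bigl(h_i^\ell\bigr)^2$$
    As $N_\ell \to \infty$, the empirical distribution of $h_i^\ell$ ($i = 1, \dots, N_\ell$) becomes a normal distribution with variance $q^\ell$.
    Because $W^\ell$ and $b^\ell$ consist of many samples from normal distributions, the average of $(h_i^\ell)^2$ over units equals its expectation over the weights and biases for a single unit:
    $$q^\ell = \bigl\langle (h_i^\ell)^2 \bigr\rangle$$
    $\langle\cdot\rangle$ denotes the expectation over $W^\ell$ and $b^\ell$.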
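Self-averaging can be checked directly. When the width is large, two independent draws of the weights give almost the same $q^\ell$ in every layer. The width, depth, input and $(\sigma_w, \sigma_b)$ below are assumptions for illustration.

```python
import numpy as np


def empirical_q(x0, depth, width, sigma_w, sigma_b, seed, phi=np.tanh):
    """q^l = (1/N_l) sum_i (h^l_i)^2 for l = 1..depth, for one random draw of W, b."""
    rng = np.random.default_rng(seed)
    x, qs = np.asarray(x0, dtype=float), []
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(x.size), size=(width, x.size))
        b = rng.normal(0.0, sigma_b, size=width)
        h = W @ x + b
        qs.append(np.mean(h**2))
        x = phi(h)
    return np.array(qs)


x0 = np.random.default_rng(123).normal(size=3000)
q_a = empirical_q(x0, depth=5, width=3000, sigma_w=1.5, sigma_b=0.3, seed=1)
q_b = empirical_q(x0, depth=5, width=3000, sigma_w=1.5, sigma_b=0.3, seed=2)
print(np.round(q_a, 3))
print(np.round(q_b, 3))
```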
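  22. Propagation of a single input: calculation (2/2)
    Since $W^\ell$ and $b^\ell$ are samples from zero-mean normal distributions, the expectations of all cross terms vanish:
    $$q^\ell = \bigl\langle (h_i^\ell)^2 \bigr\rangle = \Bigl\langle \Bigl( \sum_{j=1}^{N_{\ell-1}} W_{ij}^\ell\, \phi\bigl(h_j^{\ell-1}\bigr) + b_i^\ell \Bigr)^2 \Bigr\rangle = \frac{\sigma_w^2}{N_{\ell-1}} \sum_{j=1}^{N_{\ell-1}} \Bigl(\phi\bigl(h_j^{\ell-1}\bigr)\Bigr)^2 + \sigma_b^2$$
    $\frac{1}{N_{\ell-1}} \sum (\cdots)$ is an expectation under the empirical distribution. As $N_{\ell-1} \to \infty$ it can be replaced by an expectation under the normal distribution:
    $$q^\ell = \sigma_w^2 \int_{-\infty}^{\infty} dh\, \bigl(\phi(h)\bigr)^2\, \mathcal{N}\bigl(h \mid 0, q^{\ell-1}\bigr) + \sigma_b^2 = \sigma_w^2 \int_{-\infty}^{\infty} Dz\, \Bigl(\phi\bigl(\sqrt{q^{\ell-1}}\, z\bigr)\Bigr)^2 + \sigma_b^2, \qquad Dz = \frac{dz}{\sqrt{2\pi}} e^{-z^2/2} \qquad (1)$$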
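Eq. (1) is a one-dimensional map that can be iterated numerically. This sketch evaluates $\int Dz$ with probabilists' Gauss-Hermite quadrature. NumPy's hermegauss has weight function $e^{-z^2/2}$, so its weights are divided by $\sqrt{2\pi}$. The quadrature order and iteration count are assumptions. A linear $\phi(h) = h$ with $\sigma_w < 1$ has the fixed point $q^* = \sigma_b^2/(1 - \sigma_w^2)$, which gives a simple check.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)   # sum(WEIGHTS * f(NODES)) ~ int Dz f(z)


def q_map(q_prev, sigma_w, sigma_b, phi=np.tanh):
    """Eq. (1): q^l = sigma_w^2 int Dz phi(sqrt(q^{l-1}) z)^2 + sigma_b^2."""
    return sigma_w**2 * np.sum(WEIGHTS * phi(np.sqrt(q_prev) * NODES) ** 2) + sigma_b**2


def q_fixed_point(sigma_w, sigma_b, q0=1.0, n_iter=1000, phi=np.tanh):
    """Iterate Eq. (1) from q0; after many layers q^l is (approximately) q*."""
    q = q0
    for _ in range(n_iter):
        q = q_map(q, sigma_w, sigma_b, phi)
    return q


print(q_fixed_point(1.5, 0.3))
```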
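  23. Propagation of a single input: experiment (1/2)
    Plots for $\phi(\cdot) = \tanh(\cdot)$, $\sigma_b = 0.3$, $N_\ell = 1000$ (thick lines: theory)
    A: $q^\ell$ as a function of $q^{\ell-1}$
    B: evolution of $q^\ell$ for several inputs
    There is a fixed point $q^*$ with $q^{\ell-1} = q^\ell$ (marked in panel A).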
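Here is a sketch of the experiment behind panel B. It propagates a small and a large input through one random tanh network and compares $q^\ell$ with the theory. The first layer does not pass its input through $\phi$, so the theory starts from $q^1 = \sigma_w^2 \frac{1}{N_0}\sum_j (x_j^0)^2 + \sigma_b^2$ and then applies Eq. (1). The value $\sigma_b = 0.3$ follows the slide. The width, depth, $\sigma_w$ and seeds are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)


def q_theory(x0, depth, sigma_w, sigma_b):
    """q^1 from the raw input, then Eq. (1) with phi = tanh for l = 2..depth."""
    q = sigma_w**2 * np.mean(np.square(x0)) + sigma_b**2
    qs = [q]
    for _ in range(depth - 1):
        q = sigma_w**2 * np.sum(WEIGHTS * np.tanh(np.sqrt(q) * NODES) ** 2) + sigma_b**2
        qs.append(q)
    return np.array(qs)


def q_simulated(x0, depth, sigma_w, sigma_b, width=2000, seed=0):
    """Empirical q^l = (1/N) sum_i (h^l_i)^2 in one random tanh network."""
    rng = np.random.default_rng(seed)
    x, qs = np.asarray(x0, dtype=float), []
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(x.size), size=(width, x.size))
        h = W @ x + rng.normal(0.0, sigma_b, size=width)
        qs.append(np.mean(h**2))
        x = np.tanh(h)
    return np.array(qs)


for scale in (0.1, 3.0):                    # a small and a large input
    x0 = scale * np.random.default_rng(1).normal(size=2000)
    print(scale, np.round(q_theory(x0, 10, 2.5, 0.3), 3),
          np.round(q_simulated(x0, 10, 2.5, 0.3), 3))
```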
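  24. Propagation of a single input: experiment (2/2)
    $\sigma_b = 0$, $\sigma_w < 1$: the only fixed point is $q^* = 0$. Since $q^\ell < q^{\ell-1}$ always holds, the output converges to 0.
    $\sigma_b = 0$, $\sigma_w > 1$: there is an unstable fixed point $q^* = 0$ and a stable one. Large and small inputs alike approach the stable $q^*$ as they propagate.
    $\sigma_b \neq 0$: there is only a single, nonzero, stable fixed point. Even for $\sigma_w < 1$, the bias keeps the output from collapsing to 0.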
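The three cases can be reproduced by iterating Eq. (1) from a small and a large initial $q$. This sketch uses $\phi = \tanh$. The specific $(\sigma_w, \sigma_b)$ values and the number of layers are assumptions. When $\sigma_b = 0$, $q = 0$ is always a fixed point, so the small start is $10^{-4}$ rather than 0.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)


def iterate_q(q0, sigma_w, sigma_b, n_layers=2000):
    """Apply Eq. (1) with phi = tanh n_layers times, starting from q0."""
    q = q0
    for _ in range(n_layers):
        q = sigma_w**2 * np.sum(WEIGHTS * np.tanh(np.sqrt(q) * NODES) ** 2) + sigma_b**2
    return q


for sigma_w, sigma_b in [(0.8, 0.0), (1.5, 0.0), (0.8, 0.3)]:
    small, large = iterate_q(1e-4, sigma_w, sigma_b), iterate_q(10.0, sigma_w, sigma_b)
    print(f"sigma_w={sigma_w}, sigma_b={sigma_b}: from small {small:.4g}, from large {large:.4g}")
```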
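  25. Propagation of two inputs: calculation (1/4)
    For two inputs $x^{0,1}$ and $x^{0,2}$, define the correlation in each layer by
    $$q_{ab}^\ell = \frac{1}{N_\ell} \sum_{i=1}^{N_\ell} h_i^\ell\bigl(x^{0,a}\bigr)\, h_i^\ell\bigl(x^{0,b}\bigr), \qquad a, b \in \{1, 2\}$$
    For $a = b$ this is just $q^\ell$, so we study $q_{12}^\ell$.
    As $N_\ell \to \infty$, the joint empirical distribution of $h_i^\ell(x^{0,1})$ and $h_i^\ell(x^{0,2})$ becomes a two-dimensional normal distribution with covariance $Q^\ell$, where $Q^\ell$ denotes the matrix whose $(a, b)$ entry is $q_{ab}^\ell$.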
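The quantities $q_{11}^\ell, q_{22}^\ell, q_{12}^\ell$ and the correlation coefficient $c^\ell = q_{12}^\ell/\sqrt{q_{11}^\ell q_{22}^\ell}$ can be measured by sending both inputs through the same random network. This sketch uses $\phi = \tanh$. All sizes and parameter values are assumptions.

```python
import numpy as np


def empirical_correlations(x01, x02, depth, width, sigma_w, sigma_b, seed=0):
    """Propagate two inputs through the *same* random tanh network and return
    (q11, q22, q12, c) for each layer l = 1..depth."""
    rng = np.random.default_rng(seed)
    xa, xb = np.asarray(x01, dtype=float), np.asarray(x02, dtype=float)
    out = []
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(xa.size), size=(width, xa.size))
        b = rng.normal(0.0, sigma_b, size=width)
        ha, hb = W @ xa + b, W @ xb + b
        q11, q22, q12 = np.mean(ha * ha), np.mean(hb * hb), np.mean(ha * hb)
        out.append((q11, q22, q12, q12 / np.sqrt(q11 * q22)))
        xa, xb = np.tanh(ha), np.tanh(hb)
    return out


rng = np.random.default_rng(5)
u, v = rng.normal(size=1000), rng.normal(size=1000)     # two unrelated inputs
ordered = empirical_correlations(u, v, depth=20, width=1000, sigma_w=0.5, sigma_b=0.3)
chaotic = empirical_correlations(u, v, depth=20, width=1000, sigma_w=4.0, sigma_b=0.3)
print("final c, small sigma_w:", ordered[-1][3], " large sigma_w:", chaotic[-1][3])
```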
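  26. Propagation of two inputs: calculation (2/4)
    Computing as in the single-input case,
    $$q_{12}^\ell = \bigl\langle h_i^\ell(x^{0,1})\, h_i^\ell(x^{0,2}) \bigr\rangle = \Bigl\langle \Bigl( \sum_{j=1}^{N_{\ell-1}} W_{ij}^\ell\, \phi\bigl(h_j^{\ell-1}(x^{0,1})\bigr) + b_i^\ell \Bigr) \Bigl( \sum_{k=1}^{N_{\ell-1}} W_{ik}^\ell\, \phi\bigl(h_k^{\ell-1}(x^{0,2})\bigr) + b_i^\ell \Bigr) \Bigr\rangle$$
    $$= \frac{\sigma_w^2}{N_{\ell-1}} \sum_{j=1}^{N_{\ell-1}} \phi\bigl(h_j^{\ell-1}(x^{0,1})\bigr)\, \phi\bigl(h_j^{\ell-1}(x^{0,2})\bigr) + \sigma_b^2$$
    $$\xrightarrow{N_{\ell-1} \to \infty} \sigma_w^2 \int dh_1\, dh_2\, \phi(h_1)\, \phi(h_2)\, \mathcal{N}\bigl(\vec{h} \mid \vec{0}, Q^{\ell-1}\bigr) + \sigma_b^2 = \sigma_w^2 \int dh_1\, dh_2\, \phi(h_1)\, \phi(h_2)\, \frac{1}{2\pi\sqrt{|Q^{\ell-1}|}} \exp\Bigl( -\frac{1}{2} \vec{h}^{\mathsf T} \bigl(Q^{\ell-1}\bigr)^{-1} \vec{h} \Bigr) + \sigma_b^2$$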
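  27. Propagation of two inputs: calculation (3/4)
    We want to rewrite this as an integral over two independent Gaussian variables.
    Using the correlation coefficient $c^\ell = q_{12}^\ell / \sqrt{q_{11}^\ell q_{22}^\ell}$, the exponent becomes
    $$-\frac{1}{2} \vec{h}^{\mathsf T} \bigl(Q^{\ell-1}\bigr)^{-1} \vec{h} = -\frac{1}{2\bigl(1 - (c^{\ell-1})^2\bigr)} \left( \frac{h_1^2}{q_{11}^{\ell-1}} - \frac{2c^{\ell-1}}{\sqrt{q_{11}^{\ell-1} q_{22}^{\ell-1}}}\, h_1 h_2 + \frac{h_2^2}{q_{22}^{\ell-1}} \right)$$
    The change of variables
    $$h_1 = \sqrt{q_{11}^{\ell-1}}\, z_1, \qquad h_2 = \sqrt{q_{22}^{\ell-1}} \Bigl( c^{\ell-1} z_1 + \sqrt{1 - (c^{\ell-1})^2}\, z_2 \Bigr)$$
    gives
    $$-\frac{1}{2} \vec{h}^{\mathsf T} \bigl(Q^{\ell-1}\bigr)^{-1} \vec{h} = -\frac{1}{2}\bigl(z_1^2 + z_2^2\bigr)$$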
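The change of variables is exactly the Cholesky factorization $Q^{\ell-1} = LL^{\mathsf T}$ with $h = Lz$. This sketch checks that numerically for example values of $q_{11}, q_{22}, c$, which are assumptions. It also checks that $\det L = \sqrt{|Q|}$, which is the Jacobian used on the next slide.

```python
import numpy as np

q11, q22, c = 2.0, 0.5, 0.3          # example values
Q = np.array([[q11, c * np.sqrt(q11 * q22)],
              [c * np.sqrt(q11 * q22), q22]])

# h = L z implements h1 = sqrt(q11) z1, h2 = sqrt(q22) (c z1 + sqrt(1 - c^2) z2)
L = np.array([[np.sqrt(q11), 0.0],
              [np.sqrt(q22) * c, np.sqrt(q22) * np.sqrt(1.0 - c**2)]])

print(np.allclose(L @ L.T, Q))                  # covariance of h = L z is L L^T
print(np.allclose(L, np.linalg.cholesky(Q)))    # L is the Cholesky factor of Q
print(np.isclose(np.linalg.det(L), np.sqrt(np.linalg.det(Q))))   # Jacobian

# Monte Carlo: independent standard normals z become h with covariance Q
z = np.random.default_rng(0).standard_normal((2, 200_000))
h = L @ z
print(np.round(np.cov(h), 3))
```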
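  28. Propagation of two inputs: calculation (4/4)
    The Jacobian of this change of variables is
    $$\frac{\partial(h_1, h_2)}{\partial(z_1, z_2)} = \sqrt{q_{11}^{\ell-1} q_{22}^{\ell-1} \bigl[1 - (c^{\ell-1})^2\bigr]} = \sqrt{|Q^{\ell-1}|}$$
    In summary,
    $$q_{12}^\ell = \sigma_w^2 \int Dz_1\, Dz_2\, \phi(h_1)\, \phi(h_2) + \sigma_b^2, \qquad h_1 = \sqrt{q_{11}^{\ell-1}}\, z_1, \quad h_2 = \sqrt{q_{22}^{\ell-1}} \Bigl( c^{\ell-1} z_1 + \sqrt{1 - (c^{\ell-1})^2}\, z_2 \Bigr)$$
    Since $q_{11}^\ell$ and $q_{22}^\ell$ converge to $q^*$ quickly enough, we discuss the correlation coefficient at the fixed point using
    $$c^\ell = \frac{1}{q^*} \Bigl( \sigma_w^2 \int Dz_1\, Dz_2\, \phi(h_1)\, \phi(h_2) + \sigma_b^2 \Bigr), \qquad h_1 = \sqrt{q^*}\, z_1, \quad h_2 = \sqrt{q^*} \Bigl( c^{\ell-1} z_1 + \sqrt{1 - (c^{\ell-1})^2}\, z_2 \Bigr) \qquad (2)$$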
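Eq. (2) can be iterated in the same way as Eq. (1), using a two-dimensional tensor-product Gauss-Hermite rule for $\int Dz_1 Dz_2$. This sketch uses $\phi = \tanh$ and $q_{11} = q_{22} = q^*$. The quadrature order, iteration counts and parameter values are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(60)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)          # sum(WEIGHTS * f(NODES)) ~ int Dz f(z)
Z1, Z2 = np.meshgrid(NODES, NODES, indexing="ij")
W2 = np.outer(WEIGHTS, WEIGHTS)                   # product rule for int Dz1 Dz2


def q_star(sigma_w, sigma_b, n_iter=1000):
    """Fixed point of Eq. (1) for phi = tanh, found by iterating the map."""
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w**2 * np.sum(WEIGHTS * np.tanh(np.sqrt(q) * NODES) ** 2) + sigma_b**2
    return q


def c_map(c_prev, q, sigma_w, sigma_b):
    """Eq. (2) for phi = tanh, with q11 = q22 = q (normally q*)."""
    c_prev = min(max(c_prev, -1.0), 1.0)          # guard against rounding just above 1
    h1 = np.sqrt(q) * Z1
    h2 = np.sqrt(q) * (c_prev * Z1 + np.sqrt(1.0 - c_prev**2) * Z2)
    return (sigma_w**2 * np.sum(W2 * np.tanh(h1) * np.tanh(h2)) + sigma_b**2) / q


def iterate_c(c0, sigma_w, sigma_b, n_layers=500):
    """Apply Eq. (2) n_layers times starting from the correlation c0."""
    q = q_star(sigma_w, sigma_b)
    c = c0
    for _ in range(n_layers):
        c = c_map(c, q, sigma_w, sigma_b)
    return c


print(iterate_c(0.2, 0.5, 0.3))    # small sigma_w
print(iterate_c(0.2, 2.0, 0.3))    # large sigma_w
```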
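  29. Behavior of the correlation coefficient
    Since $c^\ell(c^{\ell-1} = 1) = 1$, $c^* = 1$ is a fixed point.
    Its stability is determined by $\chi_1 = \left.\dfrac{\partial c^\ell}{\partial c^{\ell-1}}\right|_{c^{\ell-1} = 1}$:
    If $\chi_1 < 1$, then $c^{\ell-1} < c^\ell$ near $c^* = 1$, and the correlation converges to $c^* = 1$.
    If $\chi_1 > 1$, then $c^{\ell-1} > c^\ell$ near $c^* = 1$, and the correlation moves away from $c^* = 1$.
    $$\chi_1 = \left.\frac{\partial c^\ell}{\partial c^{\ell-1}}\right|_{c^{\ell-1}=1} = \left.\frac{\sigma_w^2}{\sqrt{q^*}} \int Dz_1\, Dz_2\, \phi(h_1)\, \phi'(h_2) \left( z_1 - \frac{c^{\ell-1}}{\sqrt{1 - (c^{\ell-1})^2}}\, z_2 \right) \right|_{c^{\ell-1}=1}$$
    Using $\int Dz\, F(z)\, z = \int Dz\, F'(z)$ (Gaussian integration by parts),
    $$\chi_1 = \left.\sigma_w^2 \int Dz_1\, Dz_2\, \phi'(h_1)\, \phi'(h_2)\right|_{c^{\ell-1}=1} = \sigma_w^2 \int Dz\, \Bigl(\phi'\bigl(\sqrt{q^*}\, z\bigr)\Bigr)^2$$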
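As a check of the integration by parts, this sketch compares $\chi_1 = \sigma_w^2\int Dz\,(\phi'(\sqrt{q^*}z))^2$ with a one-sided finite difference of Eq. (2) at $c^{\ell-1} = 1$. It assumes $\phi = \tanh$, for which $\phi' = 1 - \tanh^2$. The parameter values and the step size are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(60)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)
Z1, Z2 = np.meshgrid(NODES, NODES, indexing="ij")
W2 = np.outer(WEIGHTS, WEIGHTS)


def q_star(sigma_w, sigma_b, n_iter=1000):
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w**2 * np.sum(WEIGHTS * np.tanh(np.sqrt(q) * NODES) ** 2) + sigma_b**2
    return q


def c_map(c_prev, q, sigma_w, sigma_b):
    """Eq. (2) for phi = tanh."""
    h1 = np.sqrt(q) * Z1
    h2 = np.sqrt(q) * (c_prev * Z1 + np.sqrt(1.0 - c_prev**2) * Z2)
    return (sigma_w**2 * np.sum(W2 * np.tanh(h1) * np.tanh(h2)) + sigma_b**2) / q


def chi1(sigma_w, sigma_b):
    """chi_1 = sigma_w^2 int Dz phi'(sqrt(q*) z)^2 with phi' = 1 - tanh^2."""
    q = q_star(sigma_w, sigma_b)
    return sigma_w**2 * np.sum(WEIGHTS * (1.0 - np.tanh(np.sqrt(q) * NODES) ** 2) ** 2)


sigma_w, sigma_b, eps = 1.5, 0.3, 1e-4
q = q_star(sigma_w, sigma_b)
slope_fd = (c_map(1.0, q, sigma_w, sigma_b) - c_map(1.0 - eps, q, sigma_w, sigma_b)) / eps
print(chi1(sigma_w, sigma_b), slope_fd)
```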
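  30. Behavior of the correlation coefficient
    1 $\chi_1 < 1$: ordered phase. $c^* = 1$ is a stable fixed point, so even different inputs become more correlated as they propagate. $\sigma_w$ is small.
    2 $\chi_1 > 1$: chaotic phase. $c^* = 1$ is an unstable fixed point, and a stable fixed point $c^* < 1$ exists, so differences between inputs are amplified. $\sigma_w$ is large.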
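The boundary between the two phases lies where $\chi_1 = 1$. For fixed $\sigma_b$ it can be located by bisection on $\sigma_w$. The sketch assumes $\phi = \tanh$ and that the bracket $[0.5, 4]$ contains the transition; the code checks this before bisecting.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)


def chi1(sigma_w, sigma_b, n_iter=500):
    """chi_1 = sigma_w^2 int Dz tanh'(sqrt(q*) z)^2, with q* from iterating Eq. (1)."""
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w**2 * np.sum(WEIGHTS * np.tanh(np.sqrt(q) * NODES) ** 2) + sigma_b**2
    return sigma_w**2 * np.sum(WEIGHTS * (1.0 - np.tanh(np.sqrt(q) * NODES) ** 2) ** 2)


def critical_sigma_w(sigma_b, lo=0.5, hi=4.0, tol=1e-7):
    """Bisection for chi_1(sigma_w) = 1 at fixed sigma_b."""
    if not chi1(lo, sigma_b) < 1.0 < chi1(hi, sigma_b):
        raise ValueError("the bracket [lo, hi] does not contain the transition")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi1(mid, sigma_b) < 1.0:
            lo = mid          # still in the ordered phase
        else:
            hi = mid          # already in the chaotic phase
    return 0.5 * (lo + hi)


for sigma_b in (0.1, 0.3, 0.5):
    print(f"sigma_b = {sigma_b}: transition near sigma_w = {critical_sigma_w(sigma_b):.4f}")
```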
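  31. Characteristic depth scales
    Experiments show that the convergence to $q^*$ and $c^*$ is exponential:
    $$q^\ell - q^* \sim e^{-\ell/\xi_q}, \qquad c^\ell - c^* \sim e^{-\ell/\xi_c}$$
    $\xi_q$ and $\xi_c$ express how many layers the magnitude of a single input and the correlation between two inputs, respectively, can propagate before converging.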
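  32. Computing $\xi_q$
    Substitute $q^\ell = q^* + \epsilon_q^\ell$ into Eq. (1) and Taylor-expand:
    $$\epsilon_q^\ell = \epsilon_q^{\ell-1} \frac{\sigma_w^2}{\sqrt{q^*}} \int Dz\, \phi\bigl(\sqrt{q^*} z\bigr)\, \phi'\bigl(\sqrt{q^*} z\bigr)\, z + \cdots = \epsilon_q^{\ell-1} \frac{\sigma_w^2}{\sqrt{q^*}} \int Dz\, \frac{\partial}{\partial z}\Bigl[\phi\bigl(\sqrt{q^*} z\bigr)\, \phi'\bigl(\sqrt{q^*} z\bigr)\Bigr] + \cdots = \epsilon_q^{\ell-1} \Bigl[ \chi_1 + \sigma_w^2 \int Dz\, \phi\bigl(\sqrt{q^*} z\bigr)\, \phi''\bigl(\sqrt{q^*} z\bigr) \Bigr] + \cdots$$
    Comparing with $\epsilon_q^\ell \sim e^{-\ell/\xi_q}$,
    $$\xi_q^{-1} = -\log\Bigl[ \chi_1 + \sigma_w^2 \int Dz\, \phi\bigl(\sqrt{q^*} z\bigr)\, \phi''\bigl(\sqrt{q^*} z\bigr) \Bigr]$$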
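This sketch compares the formula for $\xi_q$ with the decay actually produced by iterating Eq. (1) from $q^* + 10^{-3}$. That decay is the exponential convergence described on the previous slide. It assumes $\phi = \tanh$, so $\phi'' = -2\tanh(1 - \tanh^2)$. The perturbation size, parameter values and quadrature order are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)


def q_map(q, sigma_w, sigma_b):
    """Eq. (1) for phi = tanh."""
    return sigma_w**2 * np.sum(WEIGHTS * np.tanh(np.sqrt(q) * NODES) ** 2) + sigma_b**2


def xi_q_formula(sigma_w, sigma_b, n_iter=2000):
    """Return (xi_q, q*) with xi_q^{-1} = -log[chi_1 + sigma_w^2 int Dz phi phi'']."""
    q = 1.0
    for _ in range(n_iter):
        q = q_map(q, sigma_w, sigma_b)
    t = np.tanh(np.sqrt(q) * NODES)
    chi1 = sigma_w**2 * np.sum(WEIGHTS * (1.0 - t**2) ** 2)
    phi_phi2 = np.sum(WEIGHTS * t * (-2.0 * t * (1.0 - t**2)))   # int Dz phi phi''
    return -1.0 / np.log(chi1 + sigma_w**2 * phi_phi2), q


def xi_q_from_iteration(sigma_w, sigma_b, q_star, perturbation=1e-3, layers=3):
    """Decay length of eps^l = q^l - q*, measured by iterating Eq. (1)."""
    q, eps = q_star + perturbation, [perturbation]
    for _ in range(layers):
        q = q_map(q, sigma_w, sigma_b)
        eps.append(q - q_star)
    return -1.0 / np.log(eps[-1] / eps[-2])


xi, q_star = xi_q_formula(1.5, 0.3)
print(xi, xi_q_from_iteration(1.5, 0.3, q_star))
```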
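  33. Computing $\xi_c$
    Similarly, substitute $c^\ell = c^* + \epsilon_c^\ell$ into Eq. (2) and Taylor-expand:
    $$\epsilon_c^\ell = \epsilon_c^{\ell-1} \frac{\sigma_w^2}{\sqrt{q^*}} \int Dz_1\, Dz_2\, \phi(h_1)\, \phi'(h_2) \left( z_1 - \frac{c^*}{\sqrt{1 - (c^*)^2}}\, z_2 \right) + \cdots = \epsilon_c^{\ell-1} \frac{\sigma_w^2}{\sqrt{q^*}} \int Dz_1\, Dz_2 \left( \frac{\partial}{\partial z_1} - \frac{c^*}{\sqrt{1 - (c^*)^2}} \frac{\partial}{\partial z_2} \right) \bigl[\phi(h_1)\, \phi'(h_2)\bigr] + \cdots = \epsilon_c^{\ell-1}\, \sigma_w^2 \int Dz_1\, Dz_2\, \phi'(h_1)\, \phi'(h_2) + \cdots$$
    Comparing with $\epsilon_c^\ell \sim e^{-\ell/\xi_c}$,
    $$\xi_c^{-1} = -\log\Bigl[ \sigma_w^2 \int Dz_1\, Dz_2\, \phi'(h_1)\, \phi'(h_2) \Bigr]$$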
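In the chaotic phase $c^* < 1$, and the factor $\sigma_w^2\int Dz_1 Dz_2\,\phi'(h_1)\phi'(h_2)$ should equal the derivative of Eq. (2) at $c^*$. This sketch finds $c^*$ by iteration and compares the two. It assumes $\phi = \tanh$ and $(\sigma_w, \sigma_b) = (2.0, 0.3)$, a point that, by the $\chi_1 = 1$ criterion, should lie on the chaotic side for this $\sigma_b$. The quadrature order and step size are also assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)          # sum(WEIGHTS * f(NODES)) ~ int Dz f(z)
Z1, Z2 = np.meshgrid(NODES, NODES, indexing="ij")
W2 = np.outer(WEIGHTS, WEIGHTS)


def q_star(sigma_w, sigma_b, n_iter=1000):
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w**2 * np.sum(WEIGHTS * np.tanh(np.sqrt(q) * NODES) ** 2) + sigma_b**2
    return q


def correlated_fields(c, q):
    """h1, h2 on the quadrature grid for correlation c and variance q."""
    return np.sqrt(q) * Z1, np.sqrt(q) * (c * Z1 + np.sqrt(1.0 - c**2) * Z2)


def c_map(c, q, sigma_w, sigma_b):
    """Eq. (2) for phi = tanh."""
    h1, h2 = correlated_fields(c, q)
    return (sigma_w**2 * np.sum(W2 * np.tanh(h1) * np.tanh(h2)) + sigma_b**2) / q


def c_map_slope(c, q, sigma_w):
    """sigma_w^2 int Dz1 Dz2 phi'(h1) phi'(h2), the linearization factor of Eq. (2)."""
    h1, h2 = correlated_fields(c, q)
    return sigma_w**2 * np.sum(W2 * (1.0 - np.tanh(h1) ** 2) * (1.0 - np.tanh(h2) ** 2))


sigma_w, sigma_b = 2.0, 0.3
q = q_star(sigma_w, sigma_b)
c = 0.5
for _ in range(1000):
    c = c_map(c, q, sigma_w, sigma_b)            # converge to the stable fixed point c*

slope = c_map_slope(c, q, sigma_w)
step = 1e-5
slope_fd = (c_map(c + step, q, sigma_w, sigma_b) - c_map(c - step, q, sigma_w, sigma_b)) / (2 * step)
xi_c = -1.0 / np.log(slope)
print(f"c* = {c:.4f}, slope = {slope:.6f} (finite difference {slope_fd:.6f}), xi_c = {xi_c:.2f}")
```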
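  34. Behavior of $\xi_q$ and $\xi_c$
    Comparison of experiment (solid lines) and theory (dashed lines), varying $\sigma_b^2$ from 0.01 (black) to 0.3 (green).
    Experiment and theory agree well.
    In the ordered phase $c^* = 1$, so $\xi_c^{-1} = -\log \chi_1$. At the transition point $\chi_1 = 1$, so $\xi_c$ diverges.
    The divergence is clearly visible in the experiments as well: correlations between data points propagate without converging.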
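In the ordered phase $\xi_c = -1/\log\chi_1$, so $\xi_c$ grows without bound as $\chi_1 \to 1$. This sketch locates the transition by bisection and evaluates $\xi_c$ at decreasing distances below it. It assumes $\phi = \tanh$, $\sigma_b = 0.3$ and the bracket $[0.5, 4]$ for $\sigma_w$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)


def chi1(sigma_w, sigma_b, n_iter=1000):
    """chi_1 = sigma_w^2 int Dz tanh'(sqrt(q*) z)^2, with q* from iterating Eq. (1)."""
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w**2 * np.sum(WEIGHTS * np.tanh(np.sqrt(q) * NODES) ** 2) + sigma_b**2
    return sigma_w**2 * np.sum(WEIGHTS * (1.0 - np.tanh(np.sqrt(q) * NODES) ** 2) ** 2)


sigma_b = 0.3
lo, hi = 0.5, 4.0                       # assumed bracket: chi1(lo) < 1 < chi1(hi)
while hi - lo > 1e-9:                   # bisection for the transition chi1 = 1
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if chi1(mid, sigma_b) < 1.0 else (lo, mid)
sigma_c = lo                            # just on the ordered side

distances = [0.5, 0.1, 0.01, 0.001]
xi_c = [-1.0 / np.log(chi1(sigma_c - d, sigma_b)) for d in distances]
for d, xi in zip(distances, xi_c):
    print(f"sigma_w = sigma_c - {d}: xi_c = {xi:.1f}")
```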
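  36. Mean field theory of backpropagation
    We now apply the same argument to backpropagation.
    Consider how large, on average, the derivative of the loss function $E$, $\delta_i^\ell = \partial E / \partial h_i^\ell$, is:
    $$\tilde q^\ell = \bigl\langle (\delta_i^\ell)^2 \bigr\rangle$$
    This corresponds to $q^\ell$ in the forward case. We want the parameter region in which $\tilde q^\ell$ stays nonzero and finite.
    By the ordinary chain rule, $\delta_i^\ell$ satisfies
    $$\delta_i^\ell = \frac{\partial E}{\partial h_i^\ell} = \sum_{j=1}^{N_{\ell+1}} \frac{\partial E}{\partial h_j^{\ell+1}} \frac{\partial h_j^{\ell+1}}{\partial h_i^\ell} = \phi'\bigl(h_i^\ell\bigr) \sum_{j=1}^{N_{\ell+1}} \delta_j^{\ell+1} W_{ji}^{\ell+1}$$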
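The recursion can be checked against finite differences. This sketch uses a small tanh network and the loss $E = \frac{1}{2}\|x^D\|^2$. That loss is an assumption; the slide leaves $E$ general. The sketch computes every $\delta^\ell$ with the recursion and compares them with central differences of $E$ with respect to $h^\ell$. The index k in the code is 0-based, so hs[k] is $h^{k+1}$.

```python
import numpy as np


def dtanh(h):
    """phi'(h) for phi = tanh."""
    return 1.0 - np.tanh(h) ** 2


def pre_activations(x0, weights, biases):
    """hs[k] = h^{k+1}: h^1 = W^1 x^0 + b^1, then h^l = W^l tanh(h^{l-1}) + b^l."""
    hs = [weights[0] @ x0 + biases[0]]
    for W, b in zip(weights[1:], biases[1:]):
        hs.append(W @ np.tanh(hs[-1]) + b)
    return hs


def loss_from(h, k, weights, biases):
    """E = ||x^D||^2 / 2 as a function of hs[k], with all weights held fixed."""
    for W, b in zip(weights[k + 1:], biases[k + 1:]):
        h = W @ np.tanh(h) + b
    return 0.5 * np.sum(np.tanh(h) ** 2)


def backprop_deltas(hs, weights):
    """delta^D = phi'(h^D) x^D for this loss, then
    delta^l_i = phi'(h^l_i) sum_j delta^{l+1}_j W^{l+1}_{ji}."""
    deltas = [dtanh(hs[-1]) * np.tanh(hs[-1])]
    for k in range(len(hs) - 2, -1, -1):
        deltas.append(dtanh(hs[k]) * (weights[k + 1].T @ deltas[-1]))
    return deltas[::-1]


rng = np.random.default_rng(0)
widths = [4, 6, 6, 3]
weights = [rng.normal(0.0, 1.0 / np.sqrt(n), size=(m, n)) for n, m in zip(widths[:-1], widths[1:])]
biases = [rng.normal(0.0, 0.3, size=m) for m in widths[1:]]
hs = pre_activations(rng.normal(size=widths[0]), weights, biases)
deltas = backprop_deltas(hs, weights)
print([d.shape for d in deltas])
```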
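  37. Mean field theory of backpropagation: calculation
    First substitute the expression for $\delta_i^\ell$:
    $$\tilde q^\ell = \bigl\langle (\delta_i^\ell)^2 \bigr\rangle = \Bigl\langle \Bigl( \phi'\bigl(h_i^\ell\bigr) \sum_{j=1}^{N_{\ell+1}} \delta_j^{\ell+1} W_{ji}^{\ell+1} \Bigr)^2 \Bigr\rangle$$
    Assume that the weights used in the backward pass are drawn from the normal distribution independently of those used in the forward pass. Then the terms factorize:
    $$\tilde q^\ell = \Bigl\langle \bigl(\phi'(h_i^\ell)\bigr)^2 \Bigr\rangle \sum_{j=1}^{N_{\ell+1}} \bigl\langle (\delta_j^{\ell+1})^2 \bigr\rangle \bigl\langle (W_{ji}^{\ell+1})^2 \bigr\rangle = \int dh\, \bigl(\phi'(h)\bigr)^2\, \mathcal{N}\bigl(h \mid 0, q^\ell\bigr) \times \tilde q^{\ell+1} \sum_{j=1}^{N_{\ell+1}} \frac{\sigma_w^2}{N_\ell} \xrightarrow{q^\ell \to q^*} \tilde q^{\ell+1}\, \frac{N_{\ell+1}}{N_\ell}\, \chi_1$$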
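Here is a simulation of this calculation. It forward-propagates a random input, backpropagates an arbitrary top-layer $\delta^D$, and compares the per-layer ratio $\tilde q^\ell/\tilde q^{\ell+1}$ with $\chi_1$. The widths are equal, so $N_{\ell+1}/N_\ell = 1$. With independent=True the backward pass uses freshly drawn weights, which is exactly the slide's independence assumption. With independent=False it reuses the forward weights, as real backpropagation does, and is not guaranteed to match as closely. Sizes, seeds and parameters are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)


def chi1(sigma_w, sigma_b, n_iter=1000):
    """chi_1 = sigma_w^2 int Dz tanh'(sqrt(q*) z)^2, with q* from iterating Eq. (1)."""
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w**2 * np.sum(WEIGHTS * np.tanh(np.sqrt(q) * NODES) ** 2) + sigma_b**2
    return sigma_w**2 * np.sum(WEIGHTS * (1.0 - np.tanh(np.sqrt(q) * NODES) ** 2) ** 2)


def empirical_layer_ratio(sigma_w, sigma_b, width=500, depth=30, skip=10,
                          independent=True, seed=0):
    """Geometric mean of q~^l / q~^{l+1} over layers skip..depth-1, where
    q~^l = (1/N) sum_i (delta^l_i)^2 and delta^l = phi'(h^l) * B^T delta^{l+1}."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=width)
    Ws, hs = [], []
    for _ in range(depth):                       # forward pass, keeping h^l
        W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        h = W @ x + rng.normal(0.0, sigma_b, size=width)
        Ws.append(W)
        hs.append(h)
        x = np.tanh(h)
    delta = rng.normal(size=width)               # arbitrary delta^D at the top
    q_tilde = [np.mean(delta**2)]
    for k in range(depth - 2, -1, -1):           # backward pass
        if independent:   # fresh weights: the independence assumption of the slide
            B = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        else:             # real backpropagation reuses the forward weights
            B = Ws[k + 1]
        delta = (1.0 - np.tanh(hs[k]) ** 2) * (B.T @ delta)
        q_tilde.append(np.mean(delta**2))
    q_tilde = q_tilde[::-1]                      # now q_tilde[k] belongs to hs[k]
    n_ratios = depth - 1 - skip
    return (q_tilde[skip] / q_tilde[depth - 1]) ** (1.0 / n_ratios)


for sigma_w in (1.0, 1.5):
    print(sigma_w, chi1(sigma_w, 0.3), empirical_layer_ratio(sigma_w, 0.3),
          empirical_layer_ratio(sigma_w, 0.3, independent=False))
```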
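  38. Characteristic depth scale
    As in the forward case, we can compute a characteristic depth scale $\xi_\nabla$.
    For simplicity take $N_\ell = N_{\ell+1}$. Then $\tilde q^\ell = \tilde q^{\ell+1} \chi_1$, so
    $$\tilde q^\ell \sim \tilde q^D e^{-(D-\ell)/\xi_\nabla}, \qquad \xi_\nabla^{-1} = -\log \chi_1$$
    The behavior of the gradient changes sharply between the two phases found for forward propagation, and again right at the transition point:
    Ordered phase ($\chi_1 < 1$): $\xi_\nabla > 0$, and the gradient vanishes after backpropagating through about $|\xi_\nabla|$ layers.
    Chaotic phase ($\chi_1 > 1$): $\xi_\nabla < 0$, and the gradient explodes after backpropagating through about $|\xi_\nabla|$ layers.
    At the transition point ($\chi_1 = 1$): the gradient stays finite.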
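With $\chi_1$ in hand, $\xi_\nabla$ and the predicted gradient scale follow directly: $\tilde q^\ell/\tilde q^D = e^{-(D-\ell)/\xi_\nabla} = \chi_1^{D-\ell}$. This sketch uses $\phi = \tanh$, and the parameter values are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)


def chi1(sigma_w, sigma_b, n_iter=1000):
    """chi_1 = sigma_w^2 int Dz tanh'(sqrt(q*) z)^2, with q* from iterating Eq. (1)."""
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w**2 * np.sum(WEIGHTS * np.tanh(np.sqrt(q) * NODES) ** 2) + sigma_b**2
    return sigma_w**2 * np.sum(WEIGHTS * (1.0 - np.tanh(np.sqrt(q) * NODES) ** 2) ** 2)


def xi_grad(sigma_w, sigma_b):
    """xi_nabla = -1 / log chi_1: positive if ordered, negative if chaotic."""
    return -1.0 / np.log(chi1(sigma_w, sigma_b))


def gradient_scale(ell, depth, sigma_w, sigma_b):
    """Predicted q~^l / q~^D = exp(-(D - l) / xi_nabla) for equal widths."""
    return np.exp(-(depth - ell) / xi_grad(sigma_w, sigma_b))


for sigma_w in (0.8, 1.0, 2.0, 4.0):
    print(f"sigma_w = {sigma_w}: xi_grad = {xi_grad(sigma_w, 0.3):.2f}, "
          f"q~^0 / q~^50 = {gradient_scale(0, 50, sigma_w, 0.3):.3g}")
```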
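  40. Mean field theory of residual networks
    Reference [7] builds a mean field theory for fully connected residual networks:
    $$x^\ell = V^\ell \phi(h^\ell) + x^{\ell-1} + a^\ell, \qquad h^\ell = W^\ell x^{\ell-1} + b^\ell$$
    Main quantities:
    $$e^\ell = \frac{\bigl\langle x_i^{\ell,1} x_i^{\ell,2} \bigr\rangle}{\sqrt{\bigl\langle (x_i^{\ell,1})^2 \bigr\rangle \bigl\langle (x_i^{\ell,2})^2 \bigr\rangle}}, \qquad \chi^\ell = \Bigl\langle \Bigl( \frac{\partial E}{\partial x_i^\ell} \Bigr)^2 \Bigr\rangle$$
    For $\phi(\cdot) = \tanh(\cdot)$:
    $$e^\ell - e^* \sim \ell^{-\delta^*}, \qquad \chi^m \sim e^{A(\sqrt{\ell} - \sqrt{m})}\, \chi^\ell \quad (A \propto \sigma_w)$$
    Convergence is slower than in ordinary networks (a power law rather than an exponential), so information propagates more easily.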
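Here is a sketch of a random fully connected residual network that tracks $\langle (x_i^\ell)^2\rangle$ and $e^\ell$ for two inputs. The slide does not give the distributions of $V^\ell$ and $a^\ell$. By analogy with $W^\ell$ and $b^\ell$, they are assumed here to be $\mathcal{N}(0, \sigma_v^2/N)$ and $\mathcal{N}(0, \sigma_a^2)$; sigma_v and sigma_a are hypothetical parameter names. Unlike in the plain network, $\langle (x^\ell)^2\rangle$ keeps growing with depth, because each layer adds to the skip connection.

```python
import numpy as np


def resnet_statistics(x01, x02, depth, sigma_w, sigma_v, sigma_b, sigma_a, seed=0, phi=np.tanh):
    """Propagate two inputs through one random residual network
        x^l = V^l phi(h^l) + x^{l-1} + a^l,   h^l = W^l x^{l-1} + b^l
    with W, V ~ N(0, sigma^2 / N) and a, b ~ N(0, sigma^2) (assumed distributions).
    Returns per-layer <(x^{l,1})^2> and the cosine e^l between the two inputs."""
    rng = np.random.default_rng(seed)
    xa, xb = np.asarray(x01, dtype=float), np.asarray(x02, dtype=float)
    N = xa.size
    sq, e = [], []
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(N), size=(N, N))
        V = rng.normal(0.0, sigma_v / np.sqrt(N), size=(N, N))
        b = rng.normal(0.0, sigma_b, size=N)
        a = rng.normal(0.0, sigma_a, size=N)
        xa = V @ phi(W @ xa + b) + xa + a          # same weights for both inputs
        xb = V @ phi(W @ xb + b) + xb + a
        sq.append(np.mean(xa**2))
        e.append(np.mean(xa * xb) / np.sqrt(np.mean(xa**2) * np.mean(xb**2)))
    return np.array(sq), np.array(e)


rng = np.random.default_rng(1)
u, v = rng.normal(size=500), rng.normal(size=500)
sq, e = resnet_statistics(u, v, depth=50, sigma_w=1.0, sigma_v=1.0, sigma_b=0.3, sigma_a=0.3)
print(np.round(sq[::10], 2), np.round(e[::10], 3))
```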
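  41. Mean field theory of residual networks: experiment
    [Figure: test accuracy after training on MNIST. Dashed lines: curves along which $\log(\chi^0/\chi^L) \sim \sigma_w\sqrt{L}$ is constant.]
    For tanh, $\chi^m$ diverges more easily than $e^\ell - e^*$ does, so it is important to choose hyperparameters that keep $\chi^m$ from diverging.
    If the ideal is a gradient whose magnitude does not change, is $\log(\chi^0/\chi^L) = 0$ ideal?
    → Presumably not: with small $\sigma_w$, the differences between propagated inputs do not grow, so the expressive power does not increase and training does not go well.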
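  42. Mean field theory of convolutional neural networks (CNNs)
    Reference [8] builds a mean field theory for CNNs:
    1D CNNs with periodic boundary conditions. The channel dimension is taken to be sufficiently large, while the spatial dimension may stay finite.
    It proposes a new weight initialization method for 2D CNNs and successfully trains a 10,000-layer CNN.
    Simply making the network deeper eventually stops improving accuracy, so structures such as residual connections and batch normalization may be important for reasons beyond training efficiency.
    [Figure: left, MNIST; right, CIFAR-10]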
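The slide does not name the initialization. In [8] it is called Delta-Orthogonal initialization. The kernel is zero everywhere except at its spatial center, where it holds an orthogonal matrix, so at initialization the convolution acts as an orthogonal map on the channels at every position. The sketch below assumes a (k, k, c_in, c_out) kernel layout, odd k and c_out >= c_in. The QR sign correction is a standard trick that makes the random orthogonal matrix uniformly (Haar) distributed. This is an illustration, not the reference implementation of [8].

```python
import numpy as np


def delta_orthogonal(k, c_in, c_out, sigma_w=1.0, rng=None):
    """Kernel of shape (k, k, c_in, c_out): zero except at the spatial centre,
    which holds sigma_w times a random matrix with orthonormal rows (c_out >= c_in)."""
    if c_out < c_in:
        raise ValueError("this sketch requires c_out >= c_in")
    rng = np.random.default_rng() if rng is None else rng
    A = rng.normal(size=(c_out, c_in))
    Qm, R = np.linalg.qr(A)                 # Qm: (c_out, c_in), orthonormal columns
    Qm = Qm * np.sign(np.diag(R))           # sign fix for a Haar-distributed Qm
    kernel = np.zeros((k, k, c_in, c_out))
    kernel[k // 2, k // 2] = sigma_w * Qm.T
    return kernel


kernel = delta_orthogonal(3, 4, 8, sigma_w=1.5, rng=np.random.default_rng(0))
centre = kernel[1, 1]
print(np.round(centre @ centre.T, 6))       # sigma_w^2 times the identity
```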