Slide 1

Slide 1 text

Mean Field Theory of Deep Learning
matsuno
July 20, 2019

Slide 2

Slide 2 text

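Outline
1. Introduction
2. Theories at the origin
  Mean field theory in physics
  Mean field theory of neural network models
3. Mean field theory of deep learning
  Mean field theory of forward propagation
  Mean field theory of backpropagation
  Subsequent developments
4. Summary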

Slide 4

Slide 4 text

Progress in deep learning
Computer vision
Machine translation
Game playing
and many more...

Slide 5

Slide 5 text

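Why has it been so successful?
High expressive power
  Large number of parameters
  Depth of the layers
Advances in optimization methods
  Backpropagation
  Stochastic gradient descent (SGD), Adam, ...
Advances in generalization techniques
  Dropout, batch normalization, ...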

Slide 6

Slide 6 text

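Is choosing and developing a network a craft?
Network architecture, activation function, optimization method, ... [footnote 1]
This is because the theory of neural networks has not kept up with the progress of the technology
We would like to know the general properties of neural networks
  For the same number of units, is a deeper network better?
  Why do empirically introduced techniques work well?
  Is there a theory, or a measure of expressive power, that does not depend on the activation function or the network?
[footnote 1] Recently there are also methods known as Neural Architecture Search (NAS)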

Slide 7

Slide 7 text

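Mean field theory in deep learning
To develop a general theory, assume that every weight is a sample drawn from a normal distribution
  = the state right after the weights are initialized, before training
Study statistically how an input propagates through the layers, depending on parameters such as the variance (= compute expectation values)
  Does propagation make different inputs converge to the same value?
  Do the gradients explode or vanish?
Among other things, this makes it possible to find weight-initialization parameters under which training works well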

Slide 8

Slide 8 text

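Historical background
The theory originates in research from the time when neural networks were still called neural circuit network models [1, 2]
Tracing it back further leads to the physics of many-body systems known as spin glasses [3]
...although in essence the connection is rather weak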

Slide 9

Slide 9 text

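Flow of this talk
1. A brief introduction to the theories at the origin
  Mean field approximation in physics [4]
  Mean field theory of neural network models [1, 2]
2. Then, an introduction to the recent mean field theory of deep learning
  Mean field theory of forward propagation [5, 6]
  Mean field theory of backpropagation [6]
  A brief look at subsequent developments [7, 8, 9]
(Unless otherwise noted, figures are taken from the respective references)
The Japanese-language material [10] was a great help in preparing this talk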

Slide 10

Slide 10 text

References I
[1] H. Sompolinsky, A. Crisanti and H. J. Sommers, “Chaos in random neural networks,” Physical Review Letters 61.3 (1988): 259.
[2] S. Amari and H. Uchida, “Fundamentals of neural networks (Brain chemistry 2, Introduction to molecular biology for mathematicians: let's build new mathematics)” (in Japanese), Bussei Kenkyu (2006), 87(3): 451-456, http://hdl.handle.net/2433/110690.
[3] A. Crisanti and H. Sompolinsky, “Dynamics of spin systems with randomly asymmetric bonds: Langevin dynamics and a spherical model,” Physical Review A 36.10 (1987): 4922.
[4] H. Tasaki, “Statistical Mechanics II” (in Japanese), Baifukan, New Physics Series 38.
[5] B. Poole, S. Lahiri, M. Raghu, J. Sohl-Dickstein and S. Ganguli, “Exponential expressivity in deep neural networks through transient chaos,” NIPS 2016.

Slide 11

Slide 11 text

References II
[6] S. S. Schoenholz, J. Gilmer, S. Ganguli and J. Sohl-Dickstein, “Deep Information Propagation,” ICLR 2017.
[7] G. Yang and S. S. Schoenholz, “Mean Field Residual Networks: On the Edge of Chaos,” NIPS 2017.
[8] L. Xiao, Y. Bahri, J. Sohl-Dickstein, S. S. Schoenholz and J. Pennington, “Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks,” ICML 2018.
[9] G. Yang, J. Pennington, V. Rao, J. Sohl-Dickstein and S. S. Schoenholz, “A Mean Field Theory of Batch Normalization,” ICLR 2019.
[10] R. Karakida, “Mathematics of deep neural networks: a mean field theory perspective” (in Japanese), 25th AI Seminar “Mathematics of Artificial Intelligence,” https://drive.google.com/open?id=1Fhlarme8qFbhcGFLs3J8WQ3kYaJU3nbZ.

Slide 14

Slide 14 text

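Mean field theory in physics
When many particles come together, interesting macroscopic properties emerge that cannot be understood from a single particle alone
A typical phenomenon: phase transitions
  Water, ice and steam are all made of the same water molecules, yet the state changes at specific temperatures and pressures
Mean field theory in physics is one of the methods used to model and analyze matter in order to explain its properties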

Slide 15

Slide 15 text

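A typical example: the Ising model
Background
  The probability that a system (held at constant temperature) is in state $i$ with energy $E_i$ is $p_i = \exp(-\beta E_i)/Z$ ($\beta$: a positive constant, $Z$: normalization constant)
Problem setup
  As a model of matter, consider spins $s_i = \pm 1$ that take their values stochastically, arranged on a two-dimensional lattice, with the energy
  $$E = -J \sum_{\langle i,j \rangle} s_i s_j - H \sum_i s_i$$
  $\langle i,j \rangle$: pairs of neighboring lattice sites
  With $N$ spins, the number of states is $2^N$
  We would like to compute, for example, the expected spin $\langle s_i \rangle$, but this is hard because it involves the interactions of a huge number of spins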

Slide 16

Slide 16 text

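Mean field approximation for the Ising model
Focus on the part of the energy that involves one lattice site, $i = 0$:
$$E_0 = -J \sum_{i=1}^{4} s_0 s_i - H s_0 = -\left( J \sum_{i=1}^{4} s_i + H \right) s_0$$
The mean field approximation ignores the fluctuations of the neighboring spins and replaces them by their expectation $\psi = \langle s_i \rangle$:
$$E_0 \sim E_{s_0} = -(4J\psi + H)\, s_0$$
The expectation of the single spin is then easy to compute from $E_{s_0}$:
$$\langle s_0 \rangle = p(s_0 = +1) - p(s_0 = -1) = \tanh(\beta(4J\psi + H))$$
Nothing distinguishes the chosen site from its neighbors, so $\langle s_0 \rangle = \psi$:
$$\psi = \tanh(\beta(4J\psi + H))$$
The expected spin of the many-body system has been approximated by the solution of an equation in a single variable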
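The self-consistency equation above can be solved numerically. The following is a minimal sketch, assuming plain fixed-point iteration is adequate away from the critical point; the function name `mean_field_magnetization`, the starting value `psi0` and the tolerance are illustrative choices, not part of the slides.

```python
import math

def mean_field_magnetization(beta, J=1.0, H=0.0, psi0=0.5, tol=1e-12, max_iter=10_000):
    """Iterate psi <- tanh(beta * (4 J psi + H)) until the update is below tol."""
    psi = psi0
    for _ in range(max_iter):
        new_psi = math.tanh(beta * (4.0 * J * psi + H))
        if abs(new_psi - psi) < tol:
            return new_psi
        psi = new_psi
    return psi  # not converged within max_iter; return the last iterate

# With H = 0 and J = 1 the slope of tanh at the origin is 4 * beta,
# so a nonzero solution appears only for beta > 1/4.
for beta in (0.1, 0.2, 0.3, 0.5):
    print(f"beta={beta:.1f}  psi={mean_field_magnetization(beta):.6f}")
```

Starting from a positive `psi0` selects the positive branch when two symmetric nonzero solutions exist. Very close to beta = 1/4 the iteration converges slowly, so the loop may stop at `max_iter`.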

Slide 17

Slide 17 text

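Mean field approximation for the Ising model
The solution changes discontinuously with the parameters ⇒ phase transition
Figure source: https://web.stanford.edu/~peastman/statmech/phasetransitions.html
Solid line: graph of $y = \tanh(\beta(4Jx + H))$
Dotted line: graph of $y = x$
The intersections of the two graphs are the solutions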

Slide 18

Slide 18 text

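Mean field theory in physics
An interacting many-body problem is approximated by a single-particle system
In doing so, the interaction between one particle and all the others is replaced by an averaged effect (the mean field)
Stochastic fluctuations are ignored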

Slide 20

Slide 20 text

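Neuron model
At that time, neurons appear to have been studied as models in biology
  Phrases such as “membrane potential of the nerve cell” appear in the papers
  Membrane potential: the potential difference between the inside and the outside of a cell
Consider a layer ($N$ neurons) with input $x \in \mathbb{R}^N$ and output $z \in \mathbb{R}^N$
With weights $W \in \mathbb{R}^{N \times N}$, firing thresholds $h \in \mathbb{R}^N$ and activation function $\phi$, the output $z$ is
$$z = \phi(u), \quad u = Wx - h$$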

Slide 21

Slide 21 text

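Continuous-time neuron model
Since this is a biological model, the input and output are made functions of time
Assuming that the neuron is driven by $Wx - h$ and decays exponentially,
$$z(t) = \phi(u(t)), \quad \partial_t u(t) = -u(t) + W x(t) - h$$
For simplicity, $h = 0$ from here on in this section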

Slide 22

Slide 22 text

Feedback of information
Because neurons are connected, information is fed back
Here we consider a model in which the output is used directly as the input:
$$x(t) = \phi(u(t)), \quad \partial_t u(t) = -u(t) + W x(t)$$

Slide 23

Slide 23 text

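The case of random weights
Consider the case where each entry of the weight matrix $W$ is drawn from a normal distribution with mean 0 and variance $\sigma^2/N$
  Common laws can then be found in networks whose weights look different at first sight
To analyze this network, one can use techniques from the spin glass model, an extension of the Ising model in which each spin-spin interaction is drawn from an independent probability distribution
  The mean field approximation lets us focus on a single component $u_i$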

Slide 24

Slide 24 text

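The case of random weights
Reference [1] studies the autocorrelation of $u_i$, $\Delta(\tau) = \langle u_i(t)\, u_i(t + \tau) \rangle$
$\Delta$ satisfies the following equations:
$$\partial_\tau^2 \Delta = -\frac{\partial V}{\partial \Delta}$$
$$V(\Delta) = -\frac{1}{2}\Delta^2 + \int_{-\infty}^{\infty} Dz \left( \int_{-\infty}^{\infty} Dx\, \Phi\!\left( (\Delta(0) - |\Delta|)^{1/2} x + |\Delta|^{1/2} z \right) \right)^2$$
$$Dz = \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad \Phi(x) = \int_0^{x} dy\, \phi(y)$$
By its definition, $\Delta$ has the following properties:
$$\Delta(-\tau) = \Delta(\tau), \quad \partial_\tau \Delta(\tau)\big|_{\tau=0} = 0$$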

Slide 25

Slide 25 text

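The case of random weights
The potential $V$ changes with the value of $\sigma$, and with it the time evolution of $\Delta$
1. For $\sigma < 1$, $\Delta = 0$ is the only solution that does not diverge (panel a)
2. For $\sigma > 1$, several kinds of behavior occur depending on the parameters and the initial value (panels b, c)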
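The two regimes can be probed directly by integrating the feedback model $\partial_t u = -u + W\phi(u)$ with random weights. The sketch below is an illustration under assumptions that are not in the slides: $\phi = \tanh$, a forward Euler step `dt`, a small network of `n = 50` units and a fixed random seed. It measures only the root-mean-square activity at the end of the run, not the autocorrelation $\Delta(\tau)$ itself.

```python
import math
import random

def simulate_rms(sigma, n=50, t_end=30.0, dt=0.05, seed=0):
    """Euler-integrate du/dt = -u + W tanh(u) and return the final RMS of u."""
    rng = random.Random(seed)
    std = sigma / math.sqrt(n)  # weight variance sigma^2 / N
    W = [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(int(round(t_end / dt))):
        x = [math.tanh(v) for v in u]  # output fed back as input
        drive = [sum(w * xj for w, xj in zip(row, x)) for row in W]
        u = [v + dt * (-v + d) for v, d in zip(u, drive)]
    return math.sqrt(sum(v * v for v in u) / n)

for sigma in (0.3, 2.0):
    print(f"sigma={sigma}: final RMS activity = {simulate_rms(sigma):.3e}")
```

For the small value of `sigma` the activity decays towards zero, consistent with $\Delta = 0$ being the only non-diverging solution. For `sigma` well above 1 the activity typically stays of order one, though a finite network of this size only roughly reflects the large-$N$ theory.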

Slide 26

Slide 26 text

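Summary so far
For a continuous-time neuron model in which input and output are identified, making the weights random allows results from many-body physics (= mean field theory) to be applied
About the only thing this shares with the recent mean field theory of deep learning is that the weights are made random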

Slide 29

Slide 29 text

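Notation
A fully connected neural network with $D + 1$ layers
  The input is layer 0 and the output is layer $D$
  The number of units in layer $\ell$ is written $N_\ell$
The weights and biases used to propagate into layer $\ell$ are written $W^\ell, b^\ell$ ($\ell = 1, \ldots, D$)
  $W^\ell \in \mathbb{R}^{N_\ell \times N_{\ell-1}}$, $b^\ell \in \mathbb{R}^{N_\ell}$
With the output of each layer written $x^\ell \in \mathbb{R}^{N_\ell}$,
$$x^\ell = \phi(h^\ell), \quad h^\ell = W^\ell x^{\ell-1} + b^\ell$$
  $x^0$: the input
  $\phi$: the activation function (applied component-wise)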

Slide 30

Slide 30 text

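Problem setup
Assume that each entry of the weights and biases is drawn at random from the following distributions
  Weights: normal distribution with mean 0 and variance $\sigma_w^2 / N_{\ell-1}$
  Biases: normal distribution with mean 0 and variance $\sigma_b^2$
The variance is divided by $N_{\ell-1}$ so that the scale of the next layer's output is kept the same
$$h_i^\ell = \sum_{j=1}^{N_{\ell-1}} W_{ij}^\ell x_j^{\ell-1} + b_i^\ell$$
How does an input $x^0$ propagate?
How does the correlation between two inputs $x^{0,1}$ and $x^{0,2}$ change under propagation?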
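This setup is straightforward to simulate. The sketch below draws fresh Gaussian weights and biases for each layer and records the mean squared pre-activation $\frac{1}{N}\sum_i (h_i^\ell)^2$. It assumes, for illustration only, $\phi = \tanh$, equal widths in every layer and a seeded `random.Random` generator; `forward_q` is a hypothetical helper name.

```python
import math
import random

def forward_q(x0, depth, sigma_w, sigma_b, rng, phi=math.tanh):
    """Propagate x0 through `depth` random layers of equal width.

    Returns the list of mean squared pre-activations, one per layer.
    """
    x = list(x0)
    n = len(x)
    std_w = sigma_w / math.sqrt(n)  # weight variance sigma_w^2 / N_{l-1}
    qs = []
    for _ in range(depth):
        h = []
        for _i in range(n):
            pre = rng.gauss(0.0, sigma_b)          # bias b_i
            for xj in x:
                pre += rng.gauss(0.0, std_w) * xj  # W_ij * x_j with fresh W_ij
            h.append(pre)
        qs.append(sum(v * v for v in h) / n)
        x = [phi(v) for v in h]
    return qs

rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(200)]
for layer, q in enumerate(forward_q(x0, 8, sigma_w=2.0, sigma_b=0.3, rng=rng), start=1):
    print(f"layer {layer}: q = {q:.3f}")
```

Sampling each weight on the fly avoids storing the matrices, which is enough here because each weight is used exactly once in a forward pass.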

Slide 31

Slide 31 text

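Propagation of a single input: calculation (1/2)
Define how strongly the units of each layer are firing by
$$q^\ell = \frac{1}{N_\ell} \sum_{i=1}^{N_\ell} \left( h_i^\ell \right)^2$$
The probability distribution (empirical distribution) formed by $h_i^\ell$ ($i = 1, \ldots, N_\ell$) becomes a normal distribution as $N_\ell \to \infty$, with variance $q^\ell$
The average of $(h_i^\ell)^2$ over units, which is determined by a large number of samples $W^\ell, b^\ell$ from normal distributions, equals the expectation over the weights and biases for a single unit:
$$q^\ell = \left\langle \left( h_i^\ell \right)^2 \right\rangle$$
$\langle \cdot \rangle$ denotes the expectation over $W^\ell$ and $b^\ell$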

Slide 32

Slide 32 text

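Propagation of a single input: calculation (2/2)
Since $W^\ell, b^\ell$ are samples from zero-mean normal distributions, the expectations of all cross terms vanish:
$$q^\ell = \left\langle \left( h_i^\ell \right)^2 \right\rangle = \left\langle \left( \sum_{j=1}^{N_{\ell-1}} W_{ij}^\ell\, \phi\!\left( h_j^{\ell-1} \right) + b_i^\ell \right)^2 \right\rangle = \frac{\sigma_w^2}{N_{\ell-1}} \sum_{i=1}^{N_{\ell-1}} \left( \phi\!\left( h_i^{\ell-1} \right) \right)^2 + \sigma_b^2$$
$\frac{1}{N_{\ell-1}} \sum (\cdots)$ is an expectation under the empirical distribution
For $N_{\ell-1} \to \infty$ it can be replaced by an expectation under a normal distribution:
$$q^\ell = \sigma_w^2 \int_{-\infty}^{\infty} dh\, (\phi(h))^2\, \mathcal{N}\!\left( h \mid 0, q^{\ell-1} \right) + \sigma_b^2 = \sigma_w^2 \int_{-\infty}^{\infty} Dz \left( \phi\!\left( \sqrt{q^{\ell-1}}\, z \right) \right)^2 + \sigma_b^2, \quad Dz = \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2} \quad (1)$$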
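Equation (1) is a one-dimensional recursion that is easy to iterate numerically. The following sketch assumes $\phi = \tanh$ and approximates the Gaussian integral with a trapezoidal rule on $[-8, 8]$; the helper names `gauss_expect`, `length_map` and `fixed_point_q` are hypothetical and the grid size is an arbitrary choice.

```python
import math

def gauss_expect(f, n=401, zmax=8.0):
    """Approximate the Gaussian expectation of f(z), z ~ N(0, 1), by the trapezoidal rule."""
    dz = 2.0 * zmax / (n - 1)
    total = 0.0
    for k in range(n):
        z = -zmax + k * dz
        w = 0.5 if k in (0, n - 1) else 1.0
        total += w * f(z) * math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2.0 * math.pi)

def length_map(q_prev, sigma_w, sigma_b, phi=math.tanh):
    """One step of Eq. (1): q^l as a function of q^{l-1}."""
    s = math.sqrt(q_prev)
    return sigma_w**2 * gauss_expect(lambda z: phi(s * z) ** 2) + sigma_b**2

def fixed_point_q(sigma_w, sigma_b, q0=1.0, n_layers=100):
    """Iterate the length map for n_layers layers, starting from q0."""
    q = q0
    for _ in range(n_layers):
        q = length_map(q, sigma_w, sigma_b)
    return q

for sigma_w, sigma_b in ((0.5, 0.0), (2.0, 0.0), (2.0, 0.3)):
    print(f"sigma_w={sigma_w}, sigma_b={sigma_b}: q after 100 layers = {fixed_point_q(sigma_w, sigma_b):.6g}")
```

Iterating the map layer by layer mirrors how $q^\ell$ evolves with depth, so the same loop also yields the transient before the fixed point if the intermediate values are recorded.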

Slide 33

Slide 33 text

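Propagation of a single input: experiment (1/2)
Graphs for $\phi(\cdot) = \tanh(\cdot)$, $\sigma_b = 0.3$, $N_\ell = 1000$ (dark lines are the theoretical values)
  A: $q^\ell$ as a function of $q^{\ell-1}$
  B: evolution of $q^\ell$ for several inputs
There is a fixed point $q^*$ at which $q^{\ell-1} = q^\ell$ (the star in graph A)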

Slide 34

Slide 34 text

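Propagation of a single input: experiment (2/2)
Case $\sigma_b = 0$, $\sigma_w < 1$
  The only fixed point is $q^* = 0$
  $q^{\ell-1} > q^\ell$ always holds, and the output converges to 0
Case $\sigma_b = 0$, $\sigma_w > 1$
  There is an unstable fixed point $q^* = 0$ and a stable fixed point
  Both large and small inputs approach the stable $q^*$ under propagation
Case $\sigma_b \neq 0$
  Only a nonzero stable fixed point exists
  Even for $\sigma_w < 1$, the bias prevents the output from converging to 0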

Slide 35

Slide 35 text

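Propagation of two inputs: calculation (1/4)
For two inputs $x^{0,1}, x^{0,2}$, define the correlation in each layer by
$$q_{ab}^\ell = \frac{1}{N_\ell} \sum_{i=1}^{N_\ell} h_i^\ell\!\left( x^{0,a} \right) h_i^\ell\!\left( x^{0,b} \right), \quad a, b \in \{1, 2\}$$
For $a = b$ this reduces to $q^\ell$, so we consider $q_{12}^\ell$
The joint empirical distribution of $h_i^\ell(x^{0,1})$ and $h_i^\ell(x^{0,2})$ becomes a two-dimensional normal distribution as $N_\ell \to \infty$, with covariance $Q^\ell$
  $Q^\ell$ denotes the matrix whose $(a, b)$ entry is $q_{ab}^\ell$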

Slide 36

Slide 36 text

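Propagation of two inputs: calculation (2/4)
Computing in the same way as for a single input,
$$q_{12}^\ell = \left\langle h_i^\ell\!\left( x^{0,1} \right) h_i^\ell\!\left( x^{0,2} \right) \right\rangle = \left\langle \left( \sum_{j=1}^{N_{\ell-1}} W_{ij}^\ell\, \phi\!\left( h_j^{\ell-1}\!\left( x^{0,1} \right) \right) + b_i^\ell \right) \left( \sum_{k=1}^{N_{\ell-1}} W_{ik}^\ell\, \phi\!\left( h_k^{\ell-1}\!\left( x^{0,2} \right) \right) + b_i^\ell \right) \right\rangle$$
$$= \frac{\sigma_w^2}{N_{\ell-1}} \sum_{i=1}^{N_{\ell-1}} \phi\!\left( h_i^{\ell-1}\!\left( x^{0,1} \right) \right) \phi\!\left( h_i^{\ell-1}\!\left( x^{0,2} \right) \right) + \sigma_b^2$$
$$\xrightarrow{N_{\ell-1} \to \infty} \sigma_w^2 \int dh_1\, dh_2\, \phi(h_1)\, \phi(h_2)\, \mathcal{N}\!\left( \vec{h} \mid \vec{0}, Q^{\ell-1} \right) + \sigma_b^2$$
$$= \sigma_w^2 \int dh_1\, dh_2\, \phi(h_1)\, \phi(h_2)\, \frac{1}{2\pi \sqrt{|Q^{\ell-1}|}} \exp\!\left( -\frac{1}{2} \vec{h}^T \left( Q^{\ell-1} \right)^{-1} \vec{h} \right) + \sigma_b^2$$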

Slide 37

Slide 37 text

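Propagation of two inputs: calculation (3/4)
We want to write this as a Gaussian integral over two independent variables
Using the correlation coefficient $c^\ell = q_{12}^\ell / \sqrt{q_{11}^\ell\, q_{22}^\ell}$, the exponent becomes
$$-\frac{1}{2} \vec{h}^T \left( Q^{\ell-1} \right)^{-1} \vec{h} = -\frac{1}{2\left( 1 - (c^{\ell-1})^2 \right)} \left( \frac{h_1^2}{q_{11}^{\ell-1}} - \frac{2 c^{\ell-1}}{\sqrt{q_{11}^{\ell-1} q_{22}^{\ell-1}}}\, h_1 h_2 + \frac{h_2^2}{q_{22}^{\ell-1}} \right)$$
With the change of variables
$$h_1 = \sqrt{q_{11}^{\ell-1}}\, z_1, \quad h_2 = \sqrt{q_{22}^{\ell-1}} \left( c^{\ell-1} z_1 + \sqrt{1 - (c^{\ell-1})^2}\, z_2 \right)$$
this becomes
$$-\frac{1}{2} \vec{h}^T \left( Q^{\ell-1} \right)^{-1} \vec{h} = -\frac{1}{2} \left( z_1^2 + z_2^2 \right)$$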

Slide 38

Slide 38 text

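Propagation of two inputs: calculation (4/4)
The Jacobian of the change of variables is
$$\frac{\partial (h_1, h_2)}{\partial (z_1, z_2)} = \sqrt{q_{11}^{\ell-1} q_{22}^{\ell-1} \left[ 1 - (c^{\ell-1})^2 \right]} = \sqrt{|Q^{\ell-1}|}$$
Putting everything together,
$$q_{12}^\ell = \sigma_w^2 \int Dz_1\, Dz_2\, \phi(h_1)\, \phi(h_2) + \sigma_b^2,$$
$$h_1 = \sqrt{q_{11}^{\ell-1}}\, z_1, \quad h_2 = \sqrt{q_{22}^{\ell-1}} \left( c^{\ell-1} z_1 + \sqrt{1 - (c^{\ell-1})^2}\, z_2 \right)$$
Since $q_{11}^\ell$ and $q_{22}^\ell$ converge to $q^*$ sufficiently quickly, the correlation coefficient is discussed at the fixed point using
$$c^\ell = \frac{1}{q^*} \left( \sigma_w^2 \int Dz_1\, Dz_2\, \phi(h_1)\, \phi(h_2) + \sigma_b^2 \right), \quad (2)$$
$$h_1 = \sqrt{q^*}\, z_1, \quad h_2 = \sqrt{q^*} \left( c^{\ell-1} z_1 + \sqrt{1 - (c^{\ell-1})^2}\, z_2 \right)$$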
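Equation (2) can be iterated in the same spirit as the single-input recursion. The sketch below is illustrative only: it assumes $\phi = \tanh$, obtains $q^*$ by iterating the single-input map, and evaluates the two-dimensional Gaussian integral with a trapezoidal product grid on $[-6, 6]^2$. The names `gauss_grid`, `corr_map` and `propagate_correlation` are hypothetical.

```python
import math

def gauss_grid(n=81, zmax=6.0):
    """Nodes and weights of a trapezoidal rule for the standard Gaussian measure Dz."""
    dz = 2.0 * zmax / (n - 1)
    nodes = [-zmax + k * dz for k in range(n)]
    weights = [dz * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) for z in nodes]
    weights[0] *= 0.5
    weights[-1] *= 0.5
    return nodes, weights

NODES, WEIGHTS = gauss_grid()

def length_map(q_prev, sigma_w, sigma_b):
    """Single-input recursion for q (tanh activation)."""
    s = math.sqrt(q_prev)
    e = sum(w * math.tanh(s * z) ** 2 for z, w in zip(NODES, WEIGHTS))
    return sigma_w**2 * e + sigma_b**2

def corr_map(c_prev, q_star, sigma_w, sigma_b):
    """One step of Eq. (2): c^l as a function of c^{l-1} at the fixed point q*."""
    c = max(-1.0, min(1.0, c_prev))  # guard against rounding outside [-1, 1]
    s = math.sqrt(q_star)
    r = math.sqrt(1.0 - c * c)
    total = 0.0
    for z1, w1 in zip(NODES, WEIGHTS):
        inner = sum(w2 * math.tanh(s * (c * z1 + r * z2)) for z2, w2 in zip(NODES, WEIGHTS))
        total += w1 * math.tanh(s * z1) * inner
    return (sigma_w**2 * total + sigma_b**2) / q_star

def propagate_correlation(c0, sigma_w, sigma_b, n_layers=60):
    """Find q* by iteration, then iterate the correlation map from c0."""
    q = 1.0
    for _ in range(100):
        q = length_map(q, sigma_w, sigma_b)
    c = c0
    for _ in range(n_layers):
        c = corr_map(c, q, sigma_w, sigma_b)
    return c

print("sigma_w=1.0:", propagate_correlation(0.5, 1.0, 0.3))
print("sigma_w=3.0:", propagate_correlation(0.9, 3.0, 0.3))
```

With these example parameters the small weight scale drives the correlation towards 1, while the large weight scale settles at a value clearly below 1.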

Slide 39

Slide 39 text

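Behavior of the correlation coefficient
Since $c^\ell\!\left( c^{\ell-1} = 1 \right) = 1$, $c^* = 1$ is a fixed point
The stability of $c^* = 1$ is determined by $\chi_1 = \left. \frac{\partial c^\ell}{\partial c^{\ell-1}} \right|_{c^{\ell-1}=1}$
  If $\chi_1 < 1$, then $c^{\ell-1} < c^\ell$ near $c^* = 1$, and $c^\ell$ converges to $c^* = 1$
  If $\chi_1 > 1$, then $c^{\ell-1} > c^\ell$ near $c^* = 1$, and $c^\ell$ moves away from $c^* = 1$
$$\chi_1 = \left. \frac{\partial c^\ell}{\partial c^{\ell-1}} \right|_{c^{\ell-1}=1} = \left. \frac{\sigma_w^2}{\sqrt{q^*}} \int Dz_1\, Dz_2\, \phi(h_1)\, \phi'(h_2) \left( z_1 - \frac{c^{\ell-1}}{\sqrt{1 - (c^{\ell-1})^2}}\, z_2 \right) \right|_{c^{\ell-1}=1}$$
Using $\int Dz\, F(z)\, z = \int Dz\, F'(z)$,
$$\chi_1 = \left. \sigma_w^2 \int Dz_1\, Dz_2\, \phi'(h_1)\, \phi'(h_2) \right|_{c^{\ell-1}=1} = \sigma_w^2 \int Dz \left( \phi'\!\left( \sqrt{q^*}\, z \right) \right)^2$$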

Slide 40

Slide 40 text

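Behavior of the correlation coefficient
1. $\chi_1 < 1$: ordered phase
  $c^* = 1$ is a stable fixed point
  Even different inputs become more and more correlated as they propagate
  $\sigma_w$ is small
2. $\chi_1 > 1$: chaotic phase
  $c^* = 1$ is an unstable fixed point
  A stable fixed point $c^* < 1$ exists
  Differences between inputs are amplified
  $\sigma_w$ is large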
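The phase can be read off numerically by evaluating $\chi_1 = \sigma_w^2 \int Dz\, (\phi'(\sqrt{q^*} z))^2$ at the fixed point $q^*$. The sketch below assumes $\phi = \tanh$, so that $\phi'(x) = 1 - \tanh^2 x$, and uses a trapezoidal approximation of the Gaussian integral. The helper names and the example parameter values are illustrative.

```python
import math

def gauss_expect(f, n=401, zmax=8.0):
    """Approximate the Gaussian expectation of f(z), z ~ N(0, 1), by the trapezoidal rule."""
    dz = 2.0 * zmax / (n - 1)
    total = 0.0
    for k in range(n):
        z = -zmax + k * dz
        w = 0.5 if k in (0, n - 1) else 1.0
        total += w * f(z) * math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2.0 * math.pi)

def fixed_point_q(sigma_w, sigma_b, n_layers=100):
    """Iterate q <- sigma_w^2 E[tanh(sqrt(q) z)^2] + sigma_b^2 starting from q = 1."""
    q = 1.0
    for _ in range(n_layers):
        s = math.sqrt(q)
        q = sigma_w**2 * gauss_expect(lambda z: math.tanh(s * z) ** 2) + sigma_b**2
    return q

def chi1(sigma_w, sigma_b):
    """chi_1 = sigma_w^2 E[phi'(sqrt(q*) z)^2] for phi = tanh."""
    s = math.sqrt(fixed_point_q(sigma_w, sigma_b))
    return sigma_w**2 * gauss_expect(lambda z: (1.0 - math.tanh(s * z) ** 2) ** 2)

def phase(sigma_w, sigma_b):
    """Label the phase; chi_1 = 1 itself is the transition and is reported as 'chaotic' here."""
    return "ordered" if chi1(sigma_w, sigma_b) < 1.0 else "chaotic"

for sigma_w in (1.0, 3.0):
    print(f"sigma_w={sigma_w}, sigma_b=0.3: chi1={chi1(sigma_w, 0.3):.3f} -> {phase(sigma_w, 0.3)}")
```

Scanning `sigma_w` and `sigma_b` over a grid with `chi1` and locating where it crosses 1 would trace out the boundary between the two phases.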

Slide 41

Slide 41 text

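Existence of characteristic depth scales
In experiments, the convergence to $q^*$ and $c^*$ turns out to be exponential:
$$q^\ell - q^* \sim e^{-\ell/\xi_q}, \quad c^\ell - c^* \sim e^{-\ell/\xi_c}$$
$\xi_q$ and $\xi_c$ describe, respectively, how deep the magnitude of a single input and the correlation between two inputs can propagate before converging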

Slide 42

Slide 42 text

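Computing $\xi_q$
Substituting $q^\ell = q^* + \epsilon_q^\ell$ into Eq. (1) and Taylor expanding gives
$$\epsilon_q^\ell = \epsilon_q^{\ell-1} \frac{\sigma_w^2}{\sqrt{q^*}} \int Dz\, \phi\!\left( \sqrt{q^*}\, z \right) \phi'\!\left( \sqrt{q^*}\, z \right) z + \cdots$$
$$= \epsilon_q^{\ell-1} \frac{\sigma_w^2}{\sqrt{q^*}} \int Dz\, \frac{\partial}{\partial z} \left[ \phi\!\left( \sqrt{q^*}\, z \right) \phi'\!\left( \sqrt{q^*}\, z \right) \right] + \cdots$$
$$= \epsilon_q^{\ell-1} \left[ \chi_1 + \sigma_w^2 \int Dz\, \phi\!\left( \sqrt{q^*}\, z \right) \phi''\!\left( \sqrt{q^*}\, z \right) \right] + \cdots$$
Comparing with $\epsilon_q^\ell \sim e^{-\ell/\xi_q}$,
$$\xi_q^{-1} = -\log \left[ \chi_1 + \sigma_w^2 \int Dz\, \phi\!\left( \sqrt{q^*}\, z \right) \phi''\!\left( \sqrt{q^*}\, z \right) \right]$$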

Slide 43

Slide 43 text

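Computing $\xi_c$
Similarly, substitute $c^\ell = c^* + \epsilon_c^\ell$ into Eq. (2) and Taylor expand:
$$\epsilon_c^\ell = \epsilon_c^{\ell-1} \frac{\sigma_w^2}{\sqrt{q^*}} \int Dz_1\, Dz_2\, \phi(h_1)\, \phi'(h_2) \left( z_1 - \frac{c^*}{\sqrt{1 - (c^*)^2}}\, z_2 \right) + \cdots$$
$$= \epsilon_c^{\ell-1} \frac{\sigma_w^2}{\sqrt{q^*}} \int Dz_1\, Dz_2 \left( \frac{\partial}{\partial z_1} - \frac{c^*}{\sqrt{1 - (c^*)^2}} \frac{\partial}{\partial z_2} \right) \left[ \phi(h_1)\, \phi'(h_2) \right] + \cdots$$
$$= \epsilon_c^{\ell-1} \sigma_w^2 \int Dz_1\, Dz_2\, \phi'(h_1)\, \phi'(h_2) + \cdots$$
Comparing with $\epsilon_c^\ell \sim e^{-\ell/\xi_c}$,
$$\xi_c^{-1} = -\log \left[ \sigma_w^2 \int Dz_1\, Dz_2\, \phi'(h_1)\, \phi'(h_2) \right]$$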

Slide 44

Slide 44 text

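Behavior of $\xi_q$ and $\xi_c$
Comparison of experiment (solid lines) and theory (dashed lines)
Results for $\sigma_b^2$ varied from $\sigma_b^2 = 0.01$ (black) to $\sigma_b^2 = 0.3$ (green)
Experiment and theory agree well
In the ordered phase $c^* = 1$, so $\xi_c^{-1} = -\log \chi_1$; at the transition point $\chi_1 = 1$, so $\xi_c$ diverges
  The divergence is also clearly visible in the experiments
  The correlation between data points propagates without converging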
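Both depth scales can be evaluated from one-dimensional Gaussian integrals when $c^* = 1$. The sketch below is restricted to the ordered phase for that reason and assumes $\phi = \tanh$, with $\phi' = 1 - \tanh^2$ and $\phi'' = -2 \tanh\,(1 - \tanh^2)$. The function `depth_scales` and the quadrature settings are illustrative choices.

```python
import math

def gauss_expect(f, n=401, zmax=8.0):
    """Approximate the Gaussian expectation of f(z), z ~ N(0, 1), by the trapezoidal rule."""
    dz = 2.0 * zmax / (n - 1)
    total = 0.0
    for k in range(n):
        z = -zmax + k * dz
        w = 0.5 if k in (0, n - 1) else 1.0
        total += w * f(z) * math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2.0 * math.pi)

def depth_scales(sigma_w, sigma_b, n_layers=100):
    """Return (xi_q, xi_c) for tanh in the ordered phase, where c* = 1."""
    q = 1.0
    for _ in range(n_layers):  # iterate Eq. (1) to reach q*
        s = math.sqrt(q)
        q = sigma_w**2 * gauss_expect(lambda z: math.tanh(s * z) ** 2) + sigma_b**2
    s = math.sqrt(q)

    def dphi(x):
        return 1.0 - math.tanh(x) ** 2

    def ddphi(x):
        return -2.0 * math.tanh(x) * (1.0 - math.tanh(x) ** 2)

    chi = sigma_w**2 * gauss_expect(lambda z: dphi(s * z) ** 2)
    curv = sigma_w**2 * gauss_expect(lambda z: math.tanh(s * z) * ddphi(s * z))
    if chi >= 1.0:
        raise ValueError("chi_1 >= 1: not in the ordered phase, so c* = 1 does not apply")
    xi_q = -1.0 / math.log(chi + curv)
    xi_c = -1.0 / math.log(chi)
    return xi_q, xi_c

xi_q, xi_c = depth_scales(1.0, 0.3)
print(f"xi_q = {xi_q:.3f}, xi_c = {xi_c:.3f}")
```

Because $\phi\,\phi''$ is never positive for tanh, the argument of the logarithm for $\xi_q$ is smaller than $\chi_1$, so in this setting $\xi_q < \xi_c$.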

Slide 46

Slide 46 text

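Mean field theory of backpropagation
From here on, the same kind of argument is applied to backpropagation
Consider how strongly the derivative of the loss function $E$, $\delta_i^\ell = \partial E / \partial h_i^\ell$, is firing:
$$\tilde{q}^\ell = \left\langle \left( \delta_i^\ell \right)^2 \right\rangle$$
  This corresponds to $q^\ell$ in the forward-propagation case
  We want to know the parameter region in which $\tilde{q}^\ell$ stays nonzero and finite
By the usual chain rule, $\delta_i^\ell$ satisfies
$$\delta_i^\ell = \frac{\partial E}{\partial h_i^\ell} = \sum_{j=1}^{N_{\ell+1}} \frac{\partial E}{\partial h_j^{\ell+1}} \frac{\partial h_j^{\ell+1}}{\partial h_i^\ell} = \phi'\!\left( h_i^\ell \right) \sum_{j=1}^{N_{\ell+1}} \delta_j^{\ell+1} W_{ji}^{\ell+1}$$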

Slide 47

Slide 47 text

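Mean field theory of backpropagation: calculation
First substitute the expression for $\delta_i^\ell$:
$$\tilde{q}^\ell = \left\langle \left( \delta_i^\ell \right)^2 \right\rangle = \left\langle \left( \phi'\!\left( h_i^\ell \right) \sum_{j=1}^{N_{\ell+1}} \delta_j^{\ell+1} W_{ji}^{\ell+1} \right)^2 \right\rangle$$
Assume here that the weights used in backpropagation are drawn from the normal distribution independently of the weights used in forward propagation. The terms then separate:
$$\tilde{q}^\ell = \left\langle \left( \phi'\!\left( h_i^\ell \right) \right)^2 \right\rangle \sum_{j=1}^{N_{\ell+1}} \left\langle \left( \delta_j^{\ell+1} \right)^2 \right\rangle \left\langle \left( W_{ji}^{\ell+1} \right)^2 \right\rangle$$
$$= \int dh \left( \phi'(h) \right)^2 \mathcal{N}\!\left( h \mid 0, q^\ell \right) \times \tilde{q}^{\ell+1} \sum_{j=1}^{N_{\ell+1}} \frac{\sigma_w^2}{N_\ell} \xrightarrow{q^\ell \to q^*} \tilde{q}^{\ell+1} \frac{N_{\ell+1}}{N_\ell} \chi_1$$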

Slide 48

Slide 48 text

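Characteristic depth scale
As in forward propagation, a characteristic depth scale $\xi_\nabla$ can be computed
Taking $N_\ell = N_{\ell+1}$ for simplicity, $\tilde{q}^\ell = \tilde{q}^{\ell+1} \chi_1$, so
$$\tilde{q}^\ell \sim \tilde{q}^D e^{-(D - \ell)/\xi_\nabla}, \quad \xi_\nabla^{-1} = -\log \chi_1$$
The behavior of the gradient differs greatly between the two phases found for forward propagation and right at the transition point
  In the ordered phase ($\chi_1 < 1$), $\xi_\nabla > 0$ and the gradient vanishes after about $|\xi_\nabla|$ layers of backpropagation
  In the chaotic phase ($\chi_1 > 1$), $\xi_\nabla < 0$ and the gradient explodes after about $|\xi_\nabla|$ layers of backpropagation
  Right at the transition point ($\chi_1 = 1$), the gradient stays finite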
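The recursion $\tilde{q}^\ell = \chi_1 \tilde{q}^{\ell+1}$ makes the three regimes easy to tabulate. The sketch below takes $\chi_1$ as a given number rather than computing it from $\sigma_w$ and $\sigma_b$, and assumes equal layer widths as in the slide. The function names and the values of `chi1` in the loop are illustrative.

```python
import math

def backprop_gradient_scale(chi1, depth, q_tilde_out=1.0):
    """Return the mean squared gradients for layers 0..D, given the value at layer D.

    Applies q_tilde[l] = chi1 * q_tilde[l + 1], assuming N_l = N_{l+1}.
    """
    q_tilde = [0.0] * (depth + 1)
    q_tilde[depth] = q_tilde_out
    for layer in range(depth - 1, -1, -1):
        q_tilde[layer] = chi1 * q_tilde[layer + 1]
    return q_tilde

def xi_grad(chi1):
    """Depth scale xi with 1/xi = -log(chi1); infinite at the transition chi1 = 1."""
    if chi1 == 1.0:
        return math.inf
    return -1.0 / math.log(chi1)

for chi1 in (0.5, 1.0, 2.0):
    q0 = backprop_gradient_scale(chi1, depth=10)[0]
    print(f"chi1={chi1}: xi_grad={xi_grad(chi1):.3f}, gradient scale at layer 0 = {q0:g}")
```

A sign convention follows from the definition: `xi_grad` is positive when gradients shrink towards the input and negative when they grow, so only its absolute value is a number of layers.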

Slide 49

Slide 49 text

Experiment: gradients on MNIST
(a) Depending on the parameters, the gradient either explodes or converges to 0 as it approaches the input side
(b) $\xi_\nabla$ agrees well between theory and experiment

Slide 50

Slide 50 text

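Experiment: trainability
For a random network to be trainable, the input information presumably has to propagate all the way to the end, and the gradient all the way back to the front
When training with varying $L$ and $\sigma_w$, the accuracy of the network matches the behavior of $\xi_c$
  Darker colors indicate higher accuracy on the training data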

Slide 52

Slide 52 text

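Mean field theory of residual networks
A mean field theory for fully connected residual networks has been constructed [7]
Fully connected residual network:
$$x^\ell = V^\ell \phi\!\left( h^\ell \right) + x^{\ell-1} + a^\ell, \quad h^\ell = W^\ell x^{\ell-1} + b^\ell$$
Main quantities:
$$e^\ell = \frac{\left\langle h_i^{\ell,1} h_i^{\ell,2} \right\rangle}{\sqrt{\left\langle \left( x_i^{\ell,1} \right)^2 \right\rangle \left\langle \left( x_i^{\ell,2} \right)^2 \right\rangle}}, \quad \chi^\ell = \left\langle \left( \frac{\partial E}{\partial x_i^\ell} \right)^2 \right\rangle$$
For $\phi(\cdot) = \tanh(\cdot)$:
$$e^\ell - e^* \sim \ell^{-\delta^*}, \quad \chi^m \sim e^{A(\sqrt{\ell} - \sqrt{m})} \chi^\ell \quad (A \propto \sigma_w)$$
Convergence is slower than in an ordinary network, so information propagates more easily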

Slide 53

Slide 53 text

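Mean field theory of residual networks: experiment
Accuracy on the test data after training on MNIST
Dotted lines: curves along which $\log\left( \chi^0 / \chi^\ell \right) \sim \sigma_w \sqrt{L}$ is constant
For tanh, $\chi^m$ diverges more readily than $e^\ell - e^*$, so it is important to choose hyperparameters that suppress the divergence of $\chi^m$
If the ideal is for the gradient magnitude to stay unchanged, is $\log\left( \chi^0 / \chi^\ell \right) = 0$ ideal?
  → When $\sigma_w$ is small, the difference between the propagated signals of different inputs does not grow, so the expressive power presumably does not increase and training does not work well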

Slide 54

Slide 54 text

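Mean field theory of convolutional neural networks (CNNs)
A mean field theory for CNNs has been constructed [8]
  One-dimensional CNN with periodic boundary conditions
  The channel dimension is assumed to be sufficiently large, while the spatial dimension may be finite
A new weight initialization scheme for two-dimensional CNNs is proposed, and a 10,000-layer CNN is trained successfully
Simply making the network deeper leads to a plateau, so structures such as residual connections and batch normalization may be important from viewpoints other than training efficiency as well
Figure: left, MNIST; right, CIFAR-10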

Slide 55

Slide 55 text

Mean field theory of batch normalization (batchnorm)
A mean field theory for fully connected networks with batchnorm [9]
Introducing batchnorm always makes the gradients explode
  → There is an upper limit on the depth of a trainable network

Slide 57

Slide 57 text

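Summary
This talk introduced the mean field theory of deep learning and the theories from which it originated
In the mean field theory of deep learning, the weights are made random in order to discuss the propagation of inputs, correlations, gradients and so on
  The framework does not depend on the number of layers or the activation function
  Its meaning has drifted away from that of the original mean field theory
It gives quantitative guidance on hyperparameters such as the weight initialization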