PRML (Neural Network Edition)
gucchi
September 20, 2019
Transcript
A Seminar for Understanding Machine Learning in Depth Using PRML as Material [Neural Network Edition]
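0. About this seminar

This seminar focuses on neural networks, the subject of Chapter 5 of PRML.

It does not follow the route common in deep learning books: representing a neural network as a graph of nodes and edges, then discussing matrix operations and error functions, and finally moving on to backpropagation.

Instead, the neural network is introduced as an extension of the linear regression model (PRML Chapter 3) and of logistic regression (PRML Chapter 4) (Section 2 of these slides). After that introduction, the symmetries of the network weights (Section 3) and loss functions and regularization (Section 4) are discussed.

Linear regression and logistic regression are therefore assumed to be known. Note that the equation numbers in these slides differ from the equation numbers in PRML.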
Contents

1. Introduction
2. The neural network function (PRML 5.1)
3. Weight-space symmetries (PRML 5.1.1)
4. Loss functions and regularization (PRML 5.2, 1.2.5)
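1. Introduction

Throughout these slides, the set of training input vectors is written $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$, and each input vector $\mathbf{x}_n$ is a $D$-dimensional vector. The set of target vectors corresponding to these inputs is written $\{\mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_N\}$, and each $\mathbf{t}_n$ is a $K$-dimensional vector.

In supervised machine learning (not only with neural networks), the goal is to use the prepared training data to build a function $\mathbf{y}(\mathbf{x})$ that predicts the target vector from the input, and then to predict the target vector $\mathbf{t}$ of unseen data $\mathbf{x}$ by $\mathbf{y}(\mathbf{x})$.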
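In practice, however, the prediction function $y(\mathbf{x})$ is not built from scratch out of the training data. In PRML Chapter 3 (linear regression), with $K = 1$, the discussion was restricted to functions $y(\mathbf{x}, \mathbf{w})$ of the form

$$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}) \tag{1.1}$$

Here $\mathbf{w} = (w_0, w_1, \dots, w_{M-1})^{\mathrm{T}}$ is the parameter vector.

Instead of building $y(\mathbf{x})$ from scratch, the training data are used to tune the parameter vector $\mathbf{w}$ (giving $\mathbf{w} = \mathbf{w}^{\star}$), and $y(\mathbf{x}, \mathbf{w} = \mathbf{w}^{\star})$ is used as the prediction function for the target variable.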
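The vector function $\boldsymbol{\phi}(\mathbf{x})$, called the feature vector, is defined as $\boldsymbol{\phi}(\mathbf{x}) = (\phi_0(\mathbf{x}), \phi_1(\mathbf{x}), \dots, \phi_{M-1}(\mathbf{x}))^{\mathrm{T}}$, where $\phi_0(\mathbf{x}) = 1$ and the remaining $\phi_j(\mathbf{x})$ $(j = 1, \dots, M-1)$ are some nonlinear functions (basis functions).

One example of a basis function is the Gaussian basis function:

$$\phi_j(x) = \exp\left\{ -\frac{(x - \mu_j)^2}{2 s^2} \right\} \tag{1.2}$$

This basis function is centered at $x = \mu_j$, and its spread is governed by the variance $s^2$.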
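As a minimal sketch of Eqs. (1.1) and (1.2), the following Python code evaluates a linear model on Gaussian basis features for a scalar input. The centers, width, and weights are hypothetical values chosen only for illustration; they do not come from the slides.

```python
import math

def gaussian_basis(x, mu, s):
    """Gaussian basis function exp(-(x - mu)^2 / (2 s^2)), Eq. (1.2)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * s ** 2))

def feature_vector(x, centers, s):
    """phi(x) = (phi_0(x), ..., phi_{M-1}(x)) with phi_0(x) = 1."""
    return [1.0] + [gaussian_basis(x, mu, s) for mu in centers]

def linear_model(x, w, centers, s):
    """y(x, w) = w^T phi(x), Eq. (1.1)."""
    phi = feature_vector(x, centers, s)
    return sum(wj * pj for wj, pj in zip(w, phi))

# Hypothetical hyperparameters and weights (assumptions for this example).
centers = [-1.0, 0.0, 1.0]   # mu_1, ..., mu_{M-1}
s = 0.5                      # common width of the basis functions
w = [0.1, 1.0, -2.0, 0.5]    # w_0, ..., w_{M-1}, so M = 4

y_at_zero = linear_model(0.0, w, centers, s)
```

Only `w` would be tuned on training data; `centers` and `s` stay fixed, which is what keeps the model linear in its trainable parameters.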
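In logistic regression, discussed in PRML Chapter 4, with $K = 1$, the discussion was restricted to functions $y(\mathbf{x}, \mathbf{w})$ of the form

$$y(\mathbf{x}, \mathbf{w}) = \sigma\left(\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})\right) \tag{1.3}$$

Here $\sigma(x)$ is called the logistic sigmoid function and is defined by

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{1.4}$$

[Figure: plot of the logistic sigmoid function]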
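In regression, the prediction function $y(\mathbf{x})$ can be used directly as the predicted value of the target variable. In logistic regression, which is a classification problem, an input vector $\mathbf{x}$ is assigned to class 1 ($t = 1$) if $y(\mathbf{x}) \geq 0.5$, that is, if $\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}) \geq 0$, and to class 2 ($t = 0$) otherwise.

In summary, both linear regression and logistic regression assume a prediction function of the specific form

$$y(\mathbf{x}, \mathbf{w}) = f\left(\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})\right) \tag{1.5}$$

and obtain the prediction function $y(\mathbf{x})$ by tuning the parameters $\mathbf{w}$ on the training data. Here $f(\cdot)$ is an arbitrary, possibly nonlinear, function: the identity function for linear regression and the logistic sigmoid function for logistic regression.

Choosing a particular form for the vector function $\boldsymbol{\phi}(\mathbf{x})$ turns this model into a neural network model.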
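The common form (1.5) can be sketched as a single function that takes the output function $f$ as an argument. The feature vector and weights below are hypothetical, and the class rule thresholds the sigmoid output at 0.5, which is the same as testing $\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}) \geq 0$.

```python
import math

def sigmoid(a):
    """Logistic sigmoid, Eq. (1.4)."""
    return 1.0 / (1.0 + math.exp(-a))

def identity(a):
    """Identity output function used for linear regression."""
    return a

def predict(phi, w, f):
    """y(x, w) = f(w^T phi(x)), Eq. (1.5); phi is a precomputed feature vector."""
    a = sum(wj * pj for wj, pj in zip(w, phi))
    return f(a)

# Hypothetical feature vector (phi_0 = 1) and weights, for illustration only.
phi = [1.0, 0.2, -0.4]
w = [0.5, 1.0, 2.0]

y_regression = predict(phi, w, identity)   # linear regression output
y_classifier = predict(phi, w, sigmoid)    # logistic regression output in (0, 1)
predicted_t = 1 if y_classifier >= 0.5 else 0
```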
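2. The neural network function

So far we have seen that linear regression and logistic regression assume a prediction function $y(\mathbf{x}, \mathbf{w})$ of the form

$$y(\mathbf{x}, \mathbf{w}) = f\left(\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})\right) \tag{2.1}$$

As a concrete example, $\boldsymbol{\phi}(\mathbf{x})$ is defined as $\boldsymbol{\phi}(\mathbf{x}) = (\phi_0(\mathbf{x}), \phi_1(\mathbf{x}), \dots, \phi_{M-1}(\mathbf{x}))^{\mathrm{T}}$ with $\phi_0(\mathbf{x}) = 1$, and the remaining $\phi_j(\mathbf{x})$ $(j = 1, \dots, M-1)$ can be taken to be Gaussian basis functions:

$$\phi_j(x) = \exp\left\{ -\frac{(x - \mu_j)^2}{2 s^2} \right\} \tag{2.2}$$

Unlike the parameters $\mathbf{w}$, which are tuned on the training data, the parameters $\mu_j$ $(j = 1, \dots, M-1)$ and $s^2$ of these Gaussian basis functions are hyperparameters fixed by hand when the form of $y(\mathbf{x}, \mathbf{w})$ is chosen. (If they were learned parameters, the model would no longer be a "linear" regression.)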
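In a neural network, the feature vector $\boldsymbol{\phi}(\mathbf{x})$ is chosen so that it depends on learned parameters itself.

In the same way that the Gaussian basis functions carry the parameters $\mu_j$ $(j = 1, \dots, M-1)$ and $s^2$, each basis function $\phi_j(\mathbf{x})$ $(j = 0, \dots, M-1)$ is given its own independent parameter vector $\mathbf{w}^{(1)}_j$.

Transposing these parameter vectors $\mathbf{w}^{(1)}_j$ (column vectors) and stacking them vertically gives the matrix $\mathbf{W}^{(1)}$:

$$\mathbf{W}^{(1)} = \left( \mathbf{w}^{(1)}_0, \mathbf{w}^{(1)}_1, \dots, \mathbf{w}^{(1)}_{M-1} \right)^{\mathrm{T}} \tag{2.3}$$

The feature vector $\boldsymbol{\phi}(\mathbf{x})$ depends on the matrix $\mathbf{W}^{(1)}$, so it is written $\boldsymbol{\phi}(\mathbf{x}; \mathbf{W}^{(1)})$.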
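With the parameter-dependent vector function $\boldsymbol{\phi}(\mathbf{x}; \mathbf{W}^{(1)})$, the prediction function $y(\mathbf{x}, \mathbf{w})$ becomes

$$y(\mathbf{x}, \mathbf{w}) = f\left( \mathbf{w}^{(2)\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}; \mathbf{W}^{(1)}) \right) \tag{2.4}$$

Here $\mathbf{w}$ denotes all parameters, that is, the parameter vector $\mathbf{w}^{(2)}$ together with $\mathbf{W}^{(1)}$; in other words, $\mathbf{w}^{(2)}$ consists of the parameters in $\mathbf{w}$ other than $\mathbf{W}^{(1)}$.

The feature vector $\boldsymbol{\phi}(\mathbf{x}; \mathbf{W}^{(1)})$ is now restricted to the following form, where $h$ is some nonlinear function:

$$\boldsymbol{\phi}(\mathbf{x}; \mathbf{W}^{(1)}) = h\left( \mathbf{W}^{(1)} \mathbf{x} \right) = \left( h\left( \sum_{i=0}^{D} w^{(1)}_{0i} x_i \right), h\left( \sum_{i=0}^{D} w^{(1)}_{1i} x_i \right), \dots, h\left( \sum_{i=0}^{D} w^{(1)}_{M-1,i} x_i \right) \right)^{\mathrm{T}} \tag{2.5}$$

The $(j, i)$ element of the matrix $\mathbf{W}^{(1)}$ is written $w^{(1)}_{ji}$. (The sum starts at $i = 0$; as in PRML, this presumably assumes an extra input $x_0 = 1$ that absorbs the bias.)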
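When the scalar function $h(x)$ is given a vector argument, it is taken to act element-wise and to return a vector of the same dimension as its argument:

$$h(\mathbf{a}) = \left( h(a_1), h(a_2), \dots, h(a_D) \right)^{\mathrm{T}} \tag{2.6}$$

With the vector function restricted as in (2.5), the prediction function $y(\mathbf{x}, \mathbf{w})$ is a neural network function with one hidden layer and one output unit, whose hidden-layer and output-layer activation functions are $h$ and $f$ respectively:

$$y(\mathbf{x}, \mathbf{w}) = f\left( \mathbf{w}^{(2)\mathrm{T}} h\left( \mathbf{W}^{(1)} \mathbf{x} \right) \right) \tag{2.7}$$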
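As a further generalization, the scalar prediction function $y(\mathbf{x}, \mathbf{w})$ is extended to a vector prediction function $\mathbf{y}(\mathbf{x}, \mathbf{w})$ with $K$ components, whose $k$-th component is written $y_k(\mathbf{x}, \mathbf{w})$. This corresponds to increasing the number of output units of the network from 1 to $K$.

If the weight vector $\mathbf{w}^{(2)}$ in (2.7) is replaced by an independent parameter vector $\mathbf{w}^{(2)}_k$ for each component of $\mathbf{y}(\mathbf{x}, \mathbf{w})$, the $k$-th component becomes

$$y_k(\mathbf{x}, \mathbf{w}) = f\left( \mathbf{w}^{(2)\mathrm{T}}_k h\left( \mathbf{W}^{(1)} \mathbf{x} \right) \right) \tag{2.8}$$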
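As with $\mathbf{W}^{(1)}$, transposing the vectors $\mathbf{w}^{(2)}_k$ (column vectors) and stacking them vertically gives the matrix

$$\mathbf{W}^{(2)} = \left( \mathbf{w}^{(2)}_1, \mathbf{w}^{(2)}_2, \dots, \mathbf{w}^{(2)}_K \right)^{\mathrm{T}} \tag{2.9}$$

The vector function $\mathbf{y}(\mathbf{x}, \mathbf{w})$ then takes the following form, which is a neural network function with one hidden layer and $K$ output units:

$$\mathbf{y}(\mathbf{x}, \mathbf{w}) = f\left( \mathbf{W}^{(2)} h\left( \mathbf{W}^{(1)} \mathbf{x} \right) \right) \tag{2.10}$$

Writing the $(j, i)$ element of $\mathbf{W}^{(1)}$ as $w^{(1)}_{ji}$ and the $(k, j)$ element of $\mathbf{W}^{(2)}$ as $w^{(2)}_{kj}$, the prediction function $y_k(\mathbf{x}, \mathbf{w})$ takes the (familiar) form

$$y_k(\mathbf{x}, \mathbf{w}) = f\left( \sum_{j=0}^{M-1} w^{(2)}_{kj} \, h\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \tag{2.11}$$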
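The matrix form (2.10) and the component form (2.11) can be checked against each other numerically. The sketch below assumes $h = \tanh$, $f = \sigma$, an input whose first component is $x_0 = 1$, and small hypothetical weight matrices ($M = 2$ hidden units, $K = 2$ outputs); none of these numbers come from the slides.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]

def forward_matrix(x, W1, W2, h=math.tanh, f=sigmoid):
    """y(x, w) = f(W2 h(W1 x)), Eq. (2.10); h and f act element-wise."""
    z = [h(a) for a in matvec(W1, x)]
    return [f(a) for a in matvec(W2, z)]

def forward_sums(x, W1, W2, k, h=math.tanh, f=sigmoid):
    """y_k(x, w) written with explicit sums, Eq. (2.11)."""
    total = 0.0
    for j in range(len(W1)):
        inner = sum(W1[j][i] * x[i] for i in range(len(x)))
        total += W2[k][j] * h(inner)
    return f(total)

# Hypothetical input (x_0 = 1 assumed) and weights, for illustration only.
x = [1.0, 0.5, -1.5]
W1 = [[0.2, -0.3, 0.4], [-0.1, 0.8, 0.05]]   # rows are w^(1)_j
W2 = [[0.7, -1.2], [0.3, 0.9]]               # rows are w^(2)_k

y = forward_matrix(x, W1, W2)
```

Both functions compute the same quantity; the matrix form is simply the one that maps directly onto vectorized code.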
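3. Weight-space symmetries

Next, the symmetries of the weight space are explained. Take the activation functions $f$ and $h$ of the neural network function to be the logistic sigmoid function and the hyperbolic tangent function respectively, and consider

$$\mathbf{y}(\mathbf{x}, \mathbf{w}) = \sigma\left( \mathbf{W}^{(2)} \tanh\left( \mathbf{W}^{(1)} \mathbf{x} \right) \right) \tag{3.1}$$

The hyperbolic tangent function is defined by

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{3.2}$$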
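An important property of the hyperbolic tangent is that it is an odd function:

$$\tanh(-x) = \frac{e^{-x} - e^{-(-x)}}{e^{-x} + e^{-(-x)}} = -\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = -\tanh(x) \tag{3.3}$$

Written without matrices, the $k$-th component $y_k(\mathbf{x}, \mathbf{w})$ of $\mathbf{y}(\mathbf{x}, \mathbf{w})$ is

$$y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=0}^{M-1} w^{(2)}_{kj} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \tag{3.4}$$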
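On the right-hand side of (3.4), apply the sign-flip transformation $w^{(1)}_{1i} \to -w^{(1)}_{1i}$ for $j = 1$ and all $i$. Because $\tanh$ is odd, the right-hand side of (3.4) changes as follows:

$$\begin{aligned}
y_k(\mathbf{x}, \mathbf{w}) &= \sigma\left( \sum_{j=0}^{M-1} w^{(2)}_{kj} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \\
&= \sigma\left( w^{(2)}_{k0} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{0i} x_i \right) + w^{(2)}_{k1} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{1i} x_i \right) + \cdots \right) \\
&\to \sigma\left( w^{(2)}_{k0} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{0i} x_i \right) - w^{(2)}_{k1} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{1i} x_i \right) + \cdots \right)
\end{aligned} \tag{3.5}$$

Therefore, if the transformation $w^{(1)}_{1i} \to -w^{(1)}_{1i}$ for all $i$ is applied together with the transformation $w^{(2)}_{k1} \to -w^{(2)}_{k1}$ for all $k$, the function $y_k(\mathbf{x}, \mathbf{w})$ is left unchanged.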
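Since $j$ takes the $M$ values $j = 0, 1, \dots, M-1$, there are $M$ transformations of the form $\{(w^{(1)}_{ji}, w^{(2)}_{kj})\}_{i,k} \to \{(-w^{(1)}_{ji}, -w^{(2)}_{kj})\}_{i,k}$, one for each $j$, that leave $y_k(\mathbf{x}, \mathbf{w})$ unchanged. Each of them can be applied or not independently of the others.

Consequently, once optimized weights $\mathbf{W}^{(1)}, \mathbf{W}^{(2)}$ have been obtained by training, there are $2^M$ weight settings, including $\mathbf{W}^{(1)}, \mathbf{W}^{(2)}$ themselves, that give the same output $y_k(\mathbf{x}, \mathbf{w})$ for every input.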
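There is a second kind of symmetry. In the function

$$y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=0}^{M-1} w^{(2)}_{kj} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \tag{3.6}$$

exchanging the set of weights $\{(w^{(1)}_{j_1 i}, w^{(2)}_{k j_1})\}_{i,k}$ for some $j = j_1$ with the set of weights $\{(w^{(1)}_{j_2 i}, w^{(2)}_{k j_2})\}_{i,k}$ for $j = j_2$ does not change the output $y_k(\mathbf{x}, \mathbf{w})$ for any input $\mathbf{x}$ (exchange symmetry). This amounts to changing the order of the sum over $j$ on the right-hand side of (3.6).

Thus, once optimized weights $\mathbf{W}^{(1)}, \mathbf{W}^{(2)}$ have been obtained by training, this exchange invariance implies that there are $M!$ weight settings, including $\mathbf{W}^{(1)}, \mathbf{W}^{(2)}$ themselves, that give the same output $y_k(\mathbf{x}, \mathbf{w})$ for every input.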
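Combining the sign-flip symmetry with the exchange symmetry, once optimized weights $\mathbf{W}^{(1)}, \mathbf{W}^{(2)}$ have been obtained by training, there are $2^M \cdot M!$ weight settings, including $\mathbf{W}^{(1)}, \mathbf{W}^{(2)}$ themselves, that give the same output $y_k(\mathbf{x}, \mathbf{w})$ for every input.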
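Both symmetries can be confirmed numerically on a small network of the form (3.1). The following sketch uses hypothetical weights with $M = 3$ hidden units and $K = 2$ outputs; the helper names `flip_unit` and `swap_units` are introduced here for illustration only.

```python
import math

def forward(x, W1, W2):
    """y = sigmoid(W2 tanh(W1 x)), Eq. (3.1), with matrices as lists of rows."""
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [1.0 / (1.0 + math.exp(-sum(w * zj for w, zj in zip(row, z))))
            for row in W2]

def flip_unit(W1, W2, j):
    """Negate row j of W1 and column j of W2 (sign-flip symmetry)."""
    W1f = [[-w for w in row] if r == j else list(row) for r, row in enumerate(W1)]
    W2f = [[-w if c == j else w for c, w in enumerate(row)] for row in W2]
    return W1f, W2f

def swap_units(W1, W2, j1, j2):
    """Exchange hidden units j1 and j2 (exchange symmetry)."""
    W1s = [list(row) for row in W1]
    W1s[j1], W1s[j2] = W1s[j2], W1s[j1]
    W2s = [list(row) for row in W2]
    for row in W2s:
        row[j1], row[j2] = row[j2], row[j1]
    return W1s, W2s

# Hypothetical input and weights, for illustration only.
x = [1.0, 0.3, -1.2]
W1 = [[0.1, -0.4, 0.7], [0.5, 0.2, -0.3], [-0.6, 0.9, 0.05]]
W2 = [[0.3, -0.8, 0.6], [1.1, 0.4, -0.2]]

y = forward(x, W1, W2)
y_flip = forward(x, *flip_unit(W1, W2, 1))
y_swap = forward(x, *swap_units(W1, W2, 0, 2))

M = len(W1)
n_equivalent = 2 ** M * math.factorial(M)   # number of equivalent weight settings
```

The outputs agree up to floating-point rounding, and with $M = 3$ the count $2^M \cdot M!$ is 48.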
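4. Loss functions and regularization

We have seen that, in general, the output of the $k$-th output unit of a neural network with one hidden layer is given by

$$y_k(\mathbf{x}, \mathbf{w}) = f\left( \sum_{j=0}^{M-1} w^{(2)}_{kj} \, h\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \tag{4.1}$$

Here $h$ and $f$ are nonlinear functions called activation functions, and $w^{(1)}_{ji}$ and $w^{(2)}_{kj}$ are the weights of each layer.

With the training input vectors written $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ and the corresponding target vectors written $\{\mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_N\}$, a common way to optimize the parameters in regression is to choose them so as to minimize the sum-of-squares error

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\| \mathbf{y}(\mathbf{x}_n, \mathbf{w}) - \mathbf{t}_n \right\|^2 \tag{4.2}$$

where $\mathbf{y}(\mathbf{x}, \mathbf{w}) = (y_1(\mathbf{x}, \mathbf{w}), y_2(\mathbf{x}, \mathbf{w}), \dots, y_K(\mathbf{x}, \mathbf{w}))^{\mathrm{T}}$.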
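If the network output $y_k(\mathbf{x}, \mathbf{w})$ is given a probabilistic interpretation, minimizing the sum-of-squares error turns out to be the result of maximum likelihood estimation.

For simplicity, consider a network with a single output unit:

$$y(\mathbf{x}, \mathbf{w}) = f\left( \sum_{j=0}^{M-1} w^{(2)}_{j} \, h\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \tag{4.3}$$

We start with the regression problem, in which each of the target variables $\{t_1, t_2, \dots, t_N\}$ takes a continuous value. For regression, the activation functions $f$ and $h$ are taken to be the identity function and the hyperbolic tangent function respectively:

$$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w^{(2)}_{j} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \tag{4.4}$$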
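First, assume that the training inputs $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ are generated in some way (sampling methods are discussed in PRML Chapter 11), and that the corresponding target variables $\{t_1, t_2, \dots, t_N\}$ are generated independently from the following Gaussian distribution, whose mean is the network output $y(\mathbf{x}, \mathbf{w})$:

$$p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}\left(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1}\right) \tag{4.5}$$

Here $\mathbf{w}$ and $\beta$ are the parameters tuned by learning.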
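The Gaussian distribution is defined as follows; it has two parameters, the mean $\mu$ and the variance $\sigma^2$:

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\} \tag{4.6}$$

In a regression problem the random variable is continuous, so the Gaussian assumption is natural with respect to the range of values the variable can take. (A different distribution is assumed for classification problems.)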
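Because the training data are generated independently from (4.5), the likelihood function can be written as a product over the data points:

$$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\left(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1}\right) \tag{4.7}$$

We look for the $\mathbf{w}$ and $\beta$ that maximize this likelihood function (maximum likelihood estimation). Instead of maximizing $p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)$ directly, we maximize the logarithm of the likelihood function, which has the same maximizer.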
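First, from

$$\begin{aligned}
\ln \mathcal{N}\left(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1}\right) &= \ln\left[ \frac{\beta^{1/2}}{(2\pi)^{1/2}} \exp\left\{ -\frac{\beta}{2} \left(t_n - y(\mathbf{x}_n, \mathbf{w})\right)^2 \right\} \right] \\
&= \frac{1}{2} \ln \beta - \frac{1}{2} \ln(2\pi) - \frac{\beta}{2} \left(t_n - y(\mathbf{x}_n, \mathbf{w})\right)^2
\end{aligned} \tag{4.8}$$

the log likelihood $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)$ becomes

$$\begin{aligned}
\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) &= \sum_{n=1}^{N} \ln \mathcal{N}\left(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1}\right) \\
&= \sum_{n=1}^{N} \left[ \frac{1}{2} \ln \beta - \frac{1}{2} \ln(2\pi) - \frac{\beta}{2} \left(t_n - y(\mathbf{x}_n, \mathbf{w})\right)^2 \right] \\
&= \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \frac{\beta}{2} \sum_{n=1}^{N} \left(t_n - y(\mathbf{x}_n, \mathbf{w})\right)^2
\end{aligned} \tag{4.9}$$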
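Defining the sum-of-squares error $E(\mathbf{w})$ as

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left(t_n - y(\mathbf{x}_n, \mathbf{w})\right)^2 \tag{4.10}$$

the log likelihood $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)$ becomes

$$\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E(\mathbf{w}) \tag{4.11}$$

To find the maximum likelihood solution $\mathbf{w}_{\mathrm{ML}}, \beta_{\mathrm{ML}}$, we take the gradient of the log likelihood $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)$. The gradient with respect to $\mathbf{w}$ depends on $\beta$ only through a positive overall factor, so the point where it vanishes does not depend on $\beta$. We can therefore find $\mathbf{w}_{\mathrm{ML}}$ first and then use $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}_{\mathrm{ML}}, \beta)$ to find $\beta_{\mathrm{ML}}$.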
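Consider maximizing the log likelihood (4.11) with respect to $\mathbf{w}$. The first and second terms on the right-hand side of (4.11) do not depend on $\mathbf{w}$, so this is equivalent to maximizing the third term, $-\beta E(\mathbf{w})$.

Since $\beta > 0$, maximizing the log likelihood (4.11) with respect to $\mathbf{w}$ is equivalent to minimizing the sum-of-squares error $E(\mathbf{w})$ of (4.10) with respect to $\mathbf{w}$.

In probabilistic terms, minimizing the sum-of-squares error is therefore the result of maximum likelihood estimation when the likelihood is assumed to be Gaussian.

In practice the minimization is carried out iteratively (as you know), using methods such as backpropagation.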
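The equivalence between maximum likelihood and least squares can be checked numerically without training anything: given any set of network outputs, the summed Gaussian log densities should match the closed form (4.11). The targets, predictions, and precision below are hypothetical numbers standing in for $t_n$, $y(\mathbf{x}_n, \mathbf{w})$, and $\beta$.

```python
import math

def log_gaussian(t, mean, beta):
    """ln N(t | mean, beta^{-1}), Eq. (4.8)."""
    return (0.5 * math.log(beta) - 0.5 * math.log(2.0 * math.pi)
            - 0.5 * beta * (t - mean) ** 2)

def sum_of_squares(targets, preds):
    """E(w) = 1/2 sum_n (t_n - y_n)^2, Eq. (4.10)."""
    return 0.5 * sum((t - y) ** 2 for t, y in zip(targets, preds))

def log_likelihood(targets, preds, beta):
    """ln p(t | X, w, beta) as a sum over independent data points, Eq. (4.9)."""
    return sum(log_gaussian(t, y, beta) for t, y in zip(targets, preds))

# Hypothetical targets and two candidate sets of network outputs.
targets = [0.9, -0.2, 1.4, 0.3]
preds_a = [1.0, 0.0, 1.0, 0.5]    # outputs for some weights w_a
preds_b = [0.8, -0.1, 1.5, 0.2]   # outputs for some other weights w_b
beta = 2.5
N = len(targets)

# Closed form of Eq. (4.11) for the first candidate.
closed_form_a = (0.5 * N * math.log(beta) - 0.5 * N * math.log(2.0 * math.pi)
                 - beta * sum_of_squares(targets, preds_a))

ll_a = log_likelihood(targets, preds_a, beta)
ll_b = log_likelihood(targets, preds_b, beta)
```

Because the two log likelihoods differ only through $-\beta E(\mathbf{w})$, the candidate with the smaller sum-of-squares error always has the larger likelihood, whatever positive value $\beta$ takes.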
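Next we treat the classification problem, in which the target variables $\{t_1, t_2, \dots, t_N\}$ are discrete and take one of the two values 0 or 1.

For classification, the activation functions $f$ and $h$ are taken to be the logistic sigmoid function and the hyperbolic tangent function respectively:

$$y(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=0}^{M-1} w^{(2)}_{j} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \tag{4.12}$$

Because the output-layer activation function is the logistic sigmoid, $y(\mathbf{x}, \mathbf{w})$ takes values in the range $0 < y(\mathbf{x}, \mathbf{w}) < 1$.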
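For classification we again assume that the training inputs $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ are generated in some way (sampling methods are discussed in PRML Chapter 11), and that the corresponding target variables $\{t_1, t_2, \dots, t_N\}$ are generated independently from the following Bernoulli distribution:

$$p(t \mid \mathbf{x}, \mathbf{w}) = y(\mathbf{x}, \mathbf{w})^{t} \left(1 - y(\mathbf{x}, \mathbf{w})\right)^{1-t} \tag{4.13}$$

Here $\mathbf{w}$ is the parameter tuned by learning. The probability of $t = 1$ is $y(\mathbf{x}, \mathbf{w})$ and the probability of $t = 0$ is $1 - y(\mathbf{x}, \mathbf{w})$. Since $0 < y(\mathbf{x}, \mathbf{w}) < 1$, both satisfy the condition on the range of values a probability can take.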
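Because the training data are generated independently from (4.13), the likelihood function can be written as a product over the data points:

$$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}) = \prod_{n=1}^{N} y(\mathbf{x}_n, \mathbf{w})^{t_n} \left(1 - y(\mathbf{x}_n, \mathbf{w})\right)^{1-t_n} \tag{4.14}$$

We look for the $\mathbf{w}$ that maximizes this likelihood function. Instead of maximizing $p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})$ directly, we maximize the logarithm of the likelihood function.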
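The log likelihood $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})$ is

$$\begin{aligned}
\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}) &= \sum_{n=1}^{N} \ln\left\{ y(\mathbf{x}_n, \mathbf{w})^{t_n} \left(1 - y(\mathbf{x}_n, \mathbf{w})\right)^{1-t_n} \right\} \\
&= \sum_{n=1}^{N} \left\{ t_n \ln y(\mathbf{x}_n, \mathbf{w}) + (1 - t_n) \ln\left(1 - y(\mathbf{x}_n, \mathbf{w})\right) \right\} \\
&= -E(\mathbf{w})
\end{aligned} \tag{4.15}$$

where $E(\mathbf{w})$ is the cross-entropy error:

$$E(\mathbf{w}) = -\sum_{n=1}^{N} \left\{ t_n \ln y(\mathbf{x}_n, \mathbf{w}) + (1 - t_n) \ln\left(1 - y(\mathbf{x}_n, \mathbf{w})\right) \right\} \tag{4.16}$$

In probabilistic terms, minimizing the cross-entropy error is therefore the result of maximum likelihood estimation when the likelihood is assumed to be Bernoulli.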
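A short numerical check of (4.14) through (4.16): the logarithm of the Bernoulli likelihood should equal the negative cross-entropy error. The binary targets and the sigmoid outputs below are hypothetical values standing in for $t_n$ and $y(\mathbf{x}_n, \mathbf{w})$.

```python
import math

def bernoulli_likelihood(targets, preds):
    """p(t | X, w) = prod_n y_n^{t_n} (1 - y_n)^{1 - t_n}, Eq. (4.14)."""
    p = 1.0
    for t, y in zip(targets, preds):
        p *= (y ** t) * ((1.0 - y) ** (1 - t))
    return p

def cross_entropy(targets, preds):
    """E(w) = -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)], Eq. (4.16)."""
    return -sum(t * math.log(y) + (1 - t) * math.log(1.0 - y)
                for t, y in zip(targets, preds))

# Hypothetical binary targets and network outputs in (0, 1).
targets = [1, 0, 1, 1]
preds = [0.9, 0.2, 0.6, 0.7]

likelihood = bernoulli_likelihood(targets, preds)
error = cross_entropy(targets, preds)
```

For long data sets the product in `bernoulli_likelihood` underflows quickly, which is one practical reason to work with the logarithm.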
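Returning to the regression problem, recall that the parameters $\mathbf{w}$ were chosen to minimize the sum-of-squares error

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left(t_n - y(\mathbf{x}_n, \mathbf{w})\right)^2 \tag{4.17}$$

A well-known phenomenon is overfitting: with a complex model such as a neural network and a small amount of data, the parameters fit the training data too closely.

When overfitting occurs, the absolute values of the parameters generally tend to become large. To prevent overfitting, training is therefore often done with a regularized sum-of-squares error, obtained by adding the following term to the sum-of-squares error:

$$E(\mathbf{w}; \lambda) = \frac{1}{2} \sum_{n=1}^{N} \left(t_n - y(\mathbf{x}_n, \mathbf{w})\right)^2 + \frac{\lambda}{2} \left\| \mathbf{w} \right\|^2 \tag{4.18}$$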
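Here $\lambda$ is a positive hyperparameter, not a learned parameter. Because $\lambda$ is positive, adding the regularization term keeps the absolute values of the parameters from becoming large. (See PRML 1.1 for details.)

Finally, we show that in probabilistic terms this regularization term appears as the result of MAP estimation (maximum a posteriori estimation). To do so, we briefly explain the posterior probability and Bayesian inference. (See PRML 1.2.3 for details.)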
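So far (maximum likelihood estimation), we have made a point estimate of the parameters $\mathbf{w}$ that maximize the likelihood function.

In Bayesian inference, the training data are used to obtain a probability distribution over the parameters $\mathbf{w}$ (a distribution with some width rather than a single point, called the posterior distribution). The posterior distribution is then used to obtain the predictive distribution $p(t \mid \mathbf{x}, \mathbf{t}, \mathbf{X})$ of the output $t$ for an unseen input $\mathbf{x}$. (See PRML Eq. 1.68.)

The word "posterior" means that this is the probability distribution of the parameters $\mathbf{w}$ after the training data have been observed, that is, the conditional probability

$$p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}) \tag{4.19}$$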
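Using the product rule of probability (PRML Eq. 1.11), the posterior distribution is proportional to the product of the likelihood function $p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})$ and the prior distribution $p(\mathbf{w})$ (Bayes' theorem):

$$p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}) \propto p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}) \, p(\mathbf{w}) \tag{4.20}$$

For regression, the likelihood function was given by

$$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\left(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1}\right) \tag{4.21}$$

so obtaining the posterior distribution requires assuming a prior distribution $p(\mathbf{w})$.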
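Here we assume as the prior a Gaussian distribution with mean $\mathbf{0}$ and covariance $\alpha^{-1} \mathbf{I}$:

$$p(\mathbf{w}) = \mathcal{N}\left(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I}\right) \tag{4.22}$$

From these results, the posterior distribution $p(\mathbf{w} \mid \mathbf{t}, \mathbf{X})$ becomes

$$\begin{aligned}
p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}) &\propto p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) \, p(\mathbf{w}) \\
&\propto \exp\left( -\frac{\beta}{2} \sum_{n=1}^{N} \left(t_n - y(\mathbf{x}_n, \mathbf{w})\right)^2 \right) \cdot \exp\left( -\frac{\alpha}{2} \left\| \mathbf{w} \right\|^2 \right) \\
&= \exp\left( -\beta \, E(\mathbf{w}; \alpha/\beta) \right)
\end{aligned} \tag{4.23}$$

where $E(\mathbf{w}; \lambda)$ is the regularized error function defined in (4.18). The last step follows because $\beta E(\mathbf{w}; \alpha/\beta) = \frac{\beta}{2} \sum_{n} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 + \frac{\alpha}{2} \|\mathbf{w}\|^2$.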
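Therefore the parameters $\mathbf{w}$ that maximize the posterior distribution are the parameters $\mathbf{w}$ that minimize the regularized error function $E(\mathbf{w}; \lambda)$ with $\lambda = \alpha/\beta$.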
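The relation between the posterior and the regularized error can be verified numerically. The sketch below uses a deliberately tiny single-input network with two $\tanh$ hidden units and no bias on the output as a stand-in for $y(\mathbf{x}, \mathbf{w})$; the data, weights, $\alpha$, and $\beta$ are all hypothetical. It compares the logarithm of the unnormalized posterior with $-\beta E(\mathbf{w}; \alpha/\beta)$.

```python
import math

def predict(x, w):
    """Stand-in network: y = w[4]*tanh(w[0] + w[1]*x) + w[5]*tanh(w[2] + w[3]*x)."""
    return w[4] * math.tanh(w[0] + w[1] * x) + w[5] * math.tanh(w[2] + w[3] * x)

def regularized_error(w, xs, ts, lam):
    """E(w; lambda) = 1/2 sum_n (t_n - y_n)^2 + lambda/2 ||w||^2, Eq. (4.18)."""
    data_term = 0.5 * sum((t - predict(x, w)) ** 2 for x, t in zip(xs, ts))
    return data_term + 0.5 * lam * sum(wi ** 2 for wi in w)

def log_posterior_unnormalized(w, xs, ts, alpha, beta):
    """ln p(t | X, w, beta) + ln p(w), keeping only the terms that depend on w."""
    log_lik = -0.5 * beta * sum((t - predict(x, w)) ** 2 for x, t in zip(xs, ts))
    log_prior = -0.5 * alpha * sum(wi ** 2 for wi in w)
    return log_lik + log_prior

# Hypothetical data, weights, and precisions, for illustration only.
xs = [-1.0, 0.0, 0.5, 2.0]
ts = [-0.6, 0.1, 0.4, 0.9]
w_a = [0.1, 0.8, -0.3, 0.5, 1.2, -0.4]
w_b = [0.0, 1.0, 0.2, -0.7, 0.9, 0.3]
alpha, beta = 0.5, 4.0
lam = alpha / beta

lhs_a = log_posterior_unnormalized(w_a, xs, ts, alpha, beta)
rhs_a = -beta * regularized_error(w_a, xs, ts, lam)
lhs_b = log_posterior_unnormalized(w_b, xs, ts, alpha, beta)
rhs_b = -beta * regularized_error(w_b, xs, ts, lam)
```

Since the two quantities agree for every weight vector, ranking weight vectors by posterior probability is the same as ranking them by the regularized error with $\lambda = \alpha/\beta$.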