PRML (Regression)

gucchi
August 16, 2019

Transcript

  1. 2.

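    Contents

    1. Introduction
       1-1. A simple regression example (PRML 1.1)
       1-2. Probability theory and probability distributions
       1-3. Maximum likelihood estimation and Bayesian estimation
    2. Linear regression models
       2-1. Linear basis function models
       2-2. Maximum likelihood estimation for linear basis function models
       2-3. Bayesian linear regression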
  2. 3.

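    1. Introduction

    In machine learning, and in supervised learning in particular, we first prepare a set of input data $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ together with the set of corresponding target vectors $\{\mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_N\}$ (the training data, also called teacher data).

    Using this training data, we build a function $\mathbf{y}(\mathbf{x})$ that predicts the target vector from the input (learning).

    Once learning is finished, the target vector of an unseen input $\mathbf{x}$ is predicted by $\mathbf{y}(\mathbf{x})$.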
  3. 5.

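    1-1. A simple regression example

    We start with an intuitive look at a simple regression problem (corresponding to PRML 1.1).

    As training data we prepare $N$ inputs $\mathbf{x} = (x_1, x_2, \dots, x_N)^{\mathrm T}$ and the $N$ corresponding target variables $\mathbf{t} = (t_1, t_2, \dots, t_N)^{\mathrm T}$. (Because this is regression, each output $t_n$ takes continuous values.)

    Here each training target $t_n$ is generated by adding random noise $\epsilon$, drawn from a Gaussian distribution (explained later), to $\sin(2\pi x_n)$ (see PRML Appendix A):

    $$ t_n = \sin(2\pi x_n) + \epsilon \tag{1.1} $$

    The goal of regression is to use the training data $(\mathbf{x}, \mathbf{t})$ to predict the output $\hat{t}$ for a new input $\hat{x}$ that is not contained in the training data.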
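The data-generating rule (1.1) can be sketched in a few lines of NumPy. The input range [0, 1], the noise standard deviation 0.3, the seed, and the helper name `make_sin_data` are all assumptions made for illustration; the slides do not state them.

```python
import numpy as np

def make_sin_data(n, noise_std=0.3, seed=0):
    """Return n inputs on [0, 1] and targets t_n = sin(2*pi*x_n) + eps, as in (1.1)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)            # assumed input range
    eps = rng.normal(0.0, noise_std, size=n)     # zero-mean Gaussian noise
    return x, np.sin(2 * np.pi * x) + eps

x, t = make_sin_data(10)   # N = 10 training points
```

Setting `noise_std=0.0` puts every point exactly on the sine curve, which is a quick way to see that the scatter comes only from the noise term.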
  4. 6.

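    1-1. A simple regression example

    The figure on this slide shows an example with $N = 10$ training points (blue circles). The green curve is $\sin(2\pi x)$.

    The blue circles do not lie on the green curve because of the Gaussian noise in (1.1).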
  5. 7.

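    1-1. A simple regression example

    Next we use the training data to predict the output for an unseen input.

    For now, in this section, we consider using the training data to construct a function $y(x)$ that takes an input variable $x$ and returns a prediction of the corresponding target variable $t$.

    Concretely, we consider the following polynomial $y(x, \mathbf{w})$:

    $$ y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j \tag{1.2} $$

    The vector $\mathbf{w} = (w_0, w_1, \dots, w_M)^{\mathrm T}$ is called the weight parameter.

    Our goal is to use the training data $(\mathbf{x}, \mathbf{t})$ to tune $\mathbf{w}$ so that $y(x, \mathbf{w})$ becomes a good predictive function.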
  6. 8.

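    1-1. A simple regression example

    We now consider, intuitively, how $\mathbf{w}$ should be tuned so that $y(x, \mathbf{w})$ is a suitable predictive function.

    A good $\mathbf{w}$ should make $y(x_n, \mathbf{w})$ close to the target $t_n$ for every training point.

    We therefore look for the $\mathbf{w}$ $(= \mathbf{w}^\star)$ that minimizes the following error function $E(\mathbf{w})$:

    $$ E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 \tag{1.3} $$

    The function $y(x, \mathbf{w}^\star)$ is then expected to be a suitable predictive function.

    How to actually minimize $E(\mathbf{w})$ is discussed later (Chapter 2 of these slides).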
  7. 9.

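    1-1. A simple regression example

    The figure on this slide shows the fits for polynomial orders $M = 0, 1, 3, 9$ (green: $\sin(2\pi x)$, red: $y(x, \mathbf{w}^\star)$).

    Of these, $M = 3$ appears to match $\sin(2\pi x)$ best.

    For $M = 9$ we get $E(\mathbf{w}^\star) = 0$, because a polynomial with 10 coefficients can pass through all 10 training points, yet the curve does not match $\sin(2\pi x)$ (over-fitting).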
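The behaviour described for $M = 0, 1, 3, 9$ can be reproduced with an ordinary least-squares fit. This is a minimal sketch, assuming evenly spaced inputs on [0, 1], noise standard deviation 0.3 and an arbitrary seed; `fit_polynomial` and `sum_of_squares_error` are illustrative names, not something taken from the slides.

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Least-squares weights w* for y(x, w) = sum_{j=0}^{M} w_j x^j, eq. (1.2)."""
    A = np.vander(x, M + 1, increasing=True)    # columns x^0, x^1, ..., x^M
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w

def sum_of_squares_error(x, t, w):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2, eq. (1.3)."""
    y = np.vander(x, len(w), increasing=True) @ w
    return 0.5 * np.sum((y - t) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)                                # N = 10 inputs (assumed spacing)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=10)
errors = {M: sum_of_squares_error(x, t, fit_polynomial(x, t, M))
          for M in (0, 1, 3, 9)}
```

The training error in `errors` can only go down as $M$ grows, and it is numerically zero at $M = 9$, which is why a small training error alone says little about how well the curve follows $\sin(2\pi x)$.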
  8. 11.

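    1-2. Probability theory and probability distributions

    Uncertainty plays a central role in pattern recognition, and we introduce probability theory to quantify it.

    Consider random variables $X$ and $Y$ taking the values $X = x_i$ $(i = 1, 2, \dots, M)$ and $Y = y_j$ $(j = 1, 2, \dots, L)$. The probability that $X = x_i$ and $Y = y_j$ (the joint probability) is written $p(X = x_i, Y = y_j)$.

    The probability $p(X = x_i)$ can be written in terms of the joint probability as follows (sum rule):

    $$ p(X = x_i) = \sum_{j=1}^{L} p(X = x_i, Y = y_j) \tag{1.4} $$

    Writing $p(Y = y_j \mid X = x_i)$ for the probability that $Y = y_j$ given $X = x_i$ (the conditional probability), the following relation holds (product rule):

    $$ p(X = x_i, Y = y_j) = p(Y = y_j \mid X = x_i)\, p(X = x_i) \tag{1.5} $$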
  9. 12.

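    1-2. Probability theory and probability distributions

    Combining the product rule with the symmetry of the joint probability, $p(X, Y) = p(Y, X)$, gives Bayes' theorem:

    $$ p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)} \tag{1.6} $$

    Here $p(Y)$ is called the prior probability (the probability before $X$ is given) and $p(Y \mid X)$ the posterior probability (the probability after $X$ is given).

    Bayes' theorem says that multiplying the prior $p(Y)$ by the likelihood $p(X \mid Y)$ gives the posterior $p(Y \mid X)$, where $p(X)$ is the normalization constant that guarantees $p(Y \mid X)$ is normalized with respect to $Y$.

    Furthermore, when the joint distribution $p(X, Y)$ factorizes into the product of the marginals as below, $X$ and $Y$ are said to be independent:

    $$ p(X, Y) = p(X)\, p(Y) \tag{1.7} $$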
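The sum rule, product rule and Bayes' theorem can be checked on a small discrete example. The joint table below is made up purely for illustration (3 values of $X$, 2 values of $Y$).

```python
import numpy as np

# Hypothetical joint table p(X = x_i, Y = y_j): rows index i, columns index j.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])

p_x = joint.sum(axis=1)                 # sum rule (1.4): marginal of X
p_y = joint.sum(axis=0)                 # same rule applied to Y
p_y_given_x = joint / p_x[:, None]      # product rule (1.5), solved for p(Y|X)
p_x_given_y = joint / p_y[None, :]      # product rule with the roles swapped

# Bayes' theorem (1.6): p(Y|X) = p(X|Y) p(Y) / p(X)
bayes = p_x_given_y * p_y[None, :] / p_x[:, None]
```

`bayes` agrees with `p_y_given_x` entry by entry, and each row sums to 1, which is exactly the role of the normalization constant $p(X)$.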
  10. 13.

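    1-2. Probability theory and probability distributions

    So far we have considered discrete random variables. Next we consider distributions of continuous random variables.

    If the probability that a random variable $x$ falls in the interval $(x, x + \delta x)$ is given by $p(x)\,\delta x$ as $\delta x \to 0$, then $p(x)$ is called the probability density.

    The probability that $x$ lies in the interval $(a, b)$ is then

    $$ p(x \in (a, b)) = \int_a^b p(x)\, dx \tag{1.8} $$

    Because probabilities are non-negative and normalized, $p(x)$ satisfies

    $$ p(x) \ge 0 \tag{1.9} $$

    $$ \int_{-\infty}^{\infty} p(x)\, dx = 1 \tag{1.10} $$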
  11. 14.

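    1-2. Probability theory and probability distributions

    An important operation in probability theory is the weighted average.

    For a continuous random variable $x$, the average of a function $f(x)$ under the probability distribution $p(x)$ is

    $$ \mathbb{E}[f] = \int p(x) f(x)\, dx \tag{1.11} $$

    As a notational convention, a subscript indicates which variable is being summed (or integrated) over. For example, the following quantity is a sum (or integral) over $x$:

    $$ \mathbb{E}_x[f(x, y)] \tag{1.12} $$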
  12. 15.

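    1-2. Probability theory and probability distributions

    The variance of a function $f(x)$ under the distribution $p(x)$ is defined below. It measures how much $f(x)$ varies around its mean $\mathbb{E}[f(x)]$.

    $$ \operatorname{var}[f] = \mathbb{E}\big[ (f(x) - \mathbb{E}[f(x)])^2 \big] \tag{1.13} $$

    In particular, for $f(x) = x$:

    $$ \operatorname{var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2 \tag{1.14} $$

    The covariance between two random variables $x$ and $y$, which expresses how they depend on each other, is defined as

    $$ \operatorname{cov}[x, y] = \mathbb{E}_{x,y}\big[ \{x - \mathbb{E}[x]\}\{y - \mathbb{E}[y]\} \big] = \mathbb{E}_{x,y}[xy] - \mathbb{E}[x]\,\mathbb{E}[y] \tag{1.15} $$

    When $x$ and $y$ are independent, $\operatorname{cov}[x, y] = 0$.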
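For completeness, (1.14) and the statement about independent variables follow from linearity of the expectation. The short derivation below is a standard argument added for illustration and uses only the definitions (1.7), (1.11), (1.13) and (1.15).

```latex
\begin{align*}
\operatorname{var}[x]
  &= \mathbb{E}\big[(x - \mathbb{E}[x])^2\big]
   = \mathbb{E}\big[x^2 - 2x\,\mathbb{E}[x] + \mathbb{E}[x]^2\big] \\
  &= \mathbb{E}[x^2] - 2\,\mathbb{E}[x]^2 + \mathbb{E}[x]^2
   = \mathbb{E}[x^2] - \mathbb{E}[x]^2 .
\end{align*}
% If x and y are independent, p(x, y) = p(x) p(y), so
\begin{align*}
\mathbb{E}_{x,y}[xy]
  = \iint x\,y\,p(x)\,p(y)\,dx\,dy
  = \mathbb{E}[x]\,\mathbb{E}[y]
  \quad\Longrightarrow\quad
  \operatorname{cov}[x, y] = 0 .
\end{align*}
```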
  13. 17.

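    1-2. Probability theory and probability distributions

    An important property of the Gaussian distribution is that the mean and variance of $x$ are given by $\mu$ and $\sigma^2$, respectively:

    $$ \mathbb{E}[x] = \int_{-\infty}^{\infty} \mathcal{N}(x \mid \mu, \sigma^2)\, x\, dx = \mu \tag{1.17} $$

    $$ \operatorname{var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2 = \sigma^2 \tag{1.18} $$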
  14. 18.

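    1-2. Probability theory and probability distributions

    Next we introduce the multivariate Gaussian distribution over a $D$-dimensional vector $\mathbf{x}$:

    $$ \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathrm T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\} \tag{1.19} $$

    Here $\boldsymbol{\mu}$ is the $D$-dimensional mean vector and $\boldsymbol{\Sigma}$ is the $D \times D$ covariance matrix.

    In this case too, the mean and covariance satisfy

    $$ \mathbb{E}[\mathbf{x}] = \int \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})\, \mathbf{x}\, d\mathbf{x} = \boldsymbol{\mu} \tag{1.20} $$

    $$ \operatorname{cov}[\mathbf{x}] = \mathbb{E}\big[ (\mathbf{x} - \mathbb{E}[\mathbf{x}])(\mathbf{x} - \mathbb{E}[\mathbf{x}])^{\mathrm T} \big] = \boldsymbol{\Sigma} \tag{1.21} $$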
  15. 19.

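    1-2. Probability theory and probability distributions

    We now state Gaussian identities that are used repeatedly in what follows.

    Suppose the marginal $p(\mathbf{x})$ and the conditional $p(\mathbf{y} \mid \mathbf{x})$ are given by

    $$ p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}^{-1}) \tag{1.22} $$

    $$ p(\mathbf{y} \mid \mathbf{x}) = \mathcal{N}(\mathbf{y} \mid \mathbf{A}\mathbf{x} + \mathbf{b}, \mathbf{L}^{-1}) \tag{1.23} $$

    where $\boldsymbol{\mu}$, $\mathbf{A}$, $\mathbf{b}$ are parameters governing the means and $\boldsymbol{\Lambda}$, $\mathbf{L}$ are precision matrices.

    Then the marginal $p(\mathbf{y})$ and the conditional $p(\mathbf{x} \mid \mathbf{y})$ are

    $$ p(\mathbf{y}) = \mathcal{N}(\mathbf{y} \mid \mathbf{A}\boldsymbol{\mu} + \mathbf{b}, \mathbf{L}^{-1} + \mathbf{A}\boldsymbol{\Lambda}^{-1}\mathbf{A}^{\mathrm T}) \tag{1.24} $$

    $$ p(\mathbf{x} \mid \mathbf{y}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\Sigma}\{\mathbf{A}^{\mathrm T}\mathbf{L}(\mathbf{y} - \mathbf{b}) + \boldsymbol{\Lambda}\boldsymbol{\mu}\}, \boldsymbol{\Sigma}) \tag{1.25} $$

    where $\boldsymbol{\Sigma}$ is defined by

    $$ \boldsymbol{\Sigma} = (\boldsymbol{\Lambda} + \mathbf{A}^{\mathrm T}\mathbf{L}\mathbf{A})^{-1} \tag{1.26} $$

    (For the detailed derivation see PRML 2.3.3.)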
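Formulas (1.24) to (1.26) translate directly into matrix code. The sketch below is illustrative only: the function name `linear_gaussian` and all numerical values (a 2-D $\mathbf{x}$ and a 1-D $\mathbf{y}$) are assumptions chosen to keep the example small.

```python
import numpy as np

def linear_gaussian(mu, Lam, A, b, L, y):
    """Apply (1.24)-(1.26) for p(x) = N(mu, Lam^-1) and p(y|x) = N(Ax + b, L^-1)."""
    Lam_inv = np.linalg.inv(Lam)
    y_mean = A @ mu + b                                  # mean in (1.24)
    y_cov = np.linalg.inv(L) + A @ Lam_inv @ A.T         # covariance in (1.24)
    Sigma = np.linalg.inv(Lam + A.T @ L @ A)             # (1.26)
    x_mean = Sigma @ (A.T @ L @ (y - b) + Lam @ mu)      # mean in (1.25)
    return y_mean, y_cov, x_mean, Sigma

# Hypothetical parameters, chosen only for illustration.
mu = np.array([1.0, -1.0])
Lam = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 2.0]])
b = np.array([0.5])
L = np.array([[4.0]])
y = np.array([0.3])
y_mean, y_cov, x_mean, Sigma = linear_gaussian(mu, Lam, A, b, L, y)
```

One way to sanity-check the result is to condition the joint Gaussian of $(\mathbf{x}, \mathbf{y})$ with the usual partitioned-covariance formula; both routes give the same conditional mean and covariance.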
  16. 20.

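    1-2. Probability theory and probability distributions

    Suppose further that the joint distribution $p(\mathbf{x}_a, \mathbf{x}_b)$ is given by

    $$ p(\mathbf{x}_a, \mathbf{x}_b) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) \tag{1.27} $$

    where $\mathbf{x} = (\mathbf{x}_a, \mathbf{x}_b)^{\mathrm T}$.

    The marginal $p(\mathbf{x}_a)$ is then known to be the following Gaussian (for the detailed derivation see PRML 2.3.2):

    $$ p(\mathbf{x}_a) = \int p(\mathbf{x}_a, \mathbf{x}_b)\, d\mathbf{x}_b = \mathcal{N}(\mathbf{x}_a \mid \boldsymbol{\mu}_a, \boldsymbol{\Sigma}_{aa}) \tag{1.28} $$

    where $\boldsymbol{\mu}_a$ and $\boldsymbol{\Sigma}_{aa}$ are defined through the partition

    $$ \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\ \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb} \end{pmatrix} \tag{1.29} $$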
  17. 21.

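    1-3. Maximum likelihood estimation and Bayesian estimation

    We explain Bayesian estimation using polynomial curve fitting as the example.

    In the Bayesian interpretation of probability, before observing any data we encode our assumptions about the parameter $\mathbf{w}$ in the form of a prior probability $p(\mathbf{w})$.

    Using the actual input data $\mathbf{x} = (x_1, x_2, \dots, x_N)^{\mathrm T}$ and target variables $\mathbf{t} = (t_1, t_2, \dots, t_N)^{\mathrm T}$, we then obtain the likelihood function $p(\mathbf{t} \mid \mathbf{x}, \mathbf{w})$.

    Bayes' theorem gives the posterior probability $p(\mathbf{w} \mid \mathbf{t}, \mathbf{x})$:

    $$ p(\mathbf{w} \mid \mathbf{t}, \mathbf{x}) = \frac{p(\mathbf{t} \mid \mathbf{x}, \mathbf{w})\, p(\mathbf{w})}{p(\mathbf{t} \mid \mathbf{x})} \tag{1.30} $$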
  18. 22.

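    1-3. Maximum likelihood estimation and Bayesian estimation

    In Bayesian estimation, given the training data $\mathbf{x}, \mathbf{t}$ and an unseen input $x$, the probability of the prediction $t$, $p(t \mid x, \mathbf{t}, \mathbf{x})$, is obtained as

    $$ p(t \mid x, \mathbf{t}, \mathbf{x}) = \int p(t \mid x, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{t}, \mathbf{x})\, d\mathbf{w} \tag{1.31} $$

    (The derivation of this predictive distribution is written up in the following Qiita article. Please take a look, and please give it a like.)

    https://qiita.com/gucchi0403/items/bfffd2586272a4c05a73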
  19. 23.

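    1-3. Maximum likelihood estimation and Bayesian estimation

    The role of the likelihood function $p(\mathcal{D} \mid \mathbf{w})$ differs between the frequentist and the Bayesian interpretations of probability.

    In the frequentist interpretation, $\mathbf{w}$ is regarded as a fixed parameter, and the $\mathbf{w}$ that maximizes the likelihood $p(\mathcal{D} \mid \mathbf{w})$ is taken as the estimator. ($\mathbf{w}$ is determined as a single value.)

    In the Bayesian interpretation, the likelihood is used to update the prior distribution into the posterior distribution using the observed data $\mathcal{D}$. (The posterior $p(\mathbf{w} \mid \mathcal{D})$ is a probability distribution over $\mathbf{w}$, so $\mathbf{w}$ carries uncertainty.)

    A concrete example of the latter use of the likelihood is given later.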
  20. 24.

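    2-1. Linear basis function models

    The simple regression model described at the beginning took the output $y(x, \mathbf{w})$ to be a polynomial in the input variable $x$:

    $$ y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j \tag{2.1} $$

    where $\mathbf{w} = (w_0, w_1, \dots, w_M)^{\mathrm T}$ is the parameter vector.

    In this chapter we generalize: the input is a vector $\mathbf{x}$, and $y(\mathbf{x}, \mathbf{w})$ is expanded in nonlinear basis functions $\phi_j(\mathbf{x})$ $(j = 1, \dots, M - 1)$:

    $$ y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x}) \tag{2.2} $$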
  21. 25.

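    2-1. Linear basis function models

    To shorten the expression, set $\phi_0(\mathbf{x}) = 1$ and define $\boldsymbol{\phi}(\mathbf{x}) = (\phi_0(\mathbf{x}), \phi_1(\mathbf{x}), \dots, \phi_{M-1}(\mathbf{x}))^{\mathrm T}$. Then (2.2) can be written as

    $$ y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}) \tag{2.3} $$

    One example of a basis function $\phi_j(x)$ is the Gaussian basis function:

    $$ \phi_j(x) = \exp\left\{ -\frac{(x - \mu_j)^2}{2 s^2} \right\} \tag{2.4} $$

    It is centred at $x = \mu_j$ and has a width governed by $s^2$.

    From here on the discussion uses general basis functions $\phi_j(\mathbf{x})$.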
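As an illustration of (2.3) and (2.4), the feature vector $\boldsymbol{\phi}(x)$ for scalar inputs can be built as follows. The centres, the width `s = 0.2`, the weights and the helper name `gaussian_features` are assumptions chosen for the example.

```python
import numpy as np

def gaussian_features(x, centers, s):
    """Rows are phi(x)^T = (1, phi_1(x), ..., phi_{M-1}(x)) with Gaussian bases (2.4)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * s ** 2))
    return np.hstack([np.ones((x.size, 1)), phi])    # phi_0(x) = 1 is the bias column

centers = np.linspace(0.0, 1.0, 4)        # assumed centres mu_1, ..., mu_4 (so M = 5)
Phi = gaussian_features([0.0, 0.5], centers, s=0.2)
w = np.ones(5)                            # arbitrary weights for the demonstration
y = Phi @ w                               # y(x, w) = w^T phi(x), eq. (2.3)
```

The model stays linear in $\mathbf{w}$ even though each $\phi_j$ is nonlinear in $x$, which is what makes the closed-form solutions of the following sections possible.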
  22. 26.

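    2-2. Maximum likelihood estimation for linear basis function models

    In the regression problem of the first chapter, a polynomial was fitted to the data points by minimizing the sum-of-squares error.

    This time we assume that the target variable $t$ is the sum of a deterministic function $y(\mathbf{x}, \mathbf{w})$ and a noise term $\epsilon$ that follows a Gaussian distribution $\mathcal{N}(\epsilon \mid 0, \beta^{-1})$ with zero mean and precision $\beta > 0$:

    $$ t = y(\mathbf{x}, \mathbf{w}) + \epsilon \tag{2.5} $$

    Since $\epsilon = t - y(\mathbf{x}, \mathbf{w})$, the target variable $t$ also follows a Gaussian distribution:

    $$ p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}(t - y(\mathbf{x}, \mathbf{w}) \mid 0, \beta^{-1}) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1}) \tag{2.6} $$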
  23. 27.

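    2-2. Maximum likelihood estimation for linear basis function models

    We prepare a set of inputs $\mathbf{X} = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ and the set of corresponding target variables $\{t_1, t_2, \dots, t_N\}$, and stack the targets into a column vector $\mathbf{t} = (t_1, t_2, \dots, t_N)^{\mathrm T}$.

    Assuming the observations $\{t_1, t_2, \dots, t_N\}$ are generated independently from the distribution (2.6), the likelihood function is the product of the distributions of the individual data points:

    $$ p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1}) \tag{2.7} $$

    where the Gaussian distribution $\mathcal{N}(x \mid \mu, \sigma^2)$ is

    $$ \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\} \tag{2.8} $$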
  24. 28.

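    2-2. Maximum likelihood estimation for linear basis function models

    Instead of finding the parameters that maximize the likelihood (2.7) directly, we find the parameters that maximize its logarithm.

    First,

    $$ \ln \mathcal{N}(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1}) = \ln\left[ \frac{\beta^{1/2}}{(2\pi)^{1/2}} \exp\left\{ -\frac{\beta}{2} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 \right\} \right] = \frac{1}{2}\ln\beta - \frac{1}{2}\ln(2\pi) - \frac{\beta}{2} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 \tag{2.9} $$

    so $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)$ becomes

    $$ \ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \sum_{n=1}^{N} \ln \mathcal{N}(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1}) = \sum_{n=1}^{N} \left[ \frac{1}{2}\ln\beta - \frac{1}{2}\ln(2\pi) - \frac{\beta}{2} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 \right] = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \frac{\beta}{2} \sum_{n=1}^{N} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 \tag{2.10} $$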
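The closed form (2.10) can be checked numerically against a direct sum of log densities from (2.8). The targets, model outputs and $\beta$ below are made-up numbers; `gaussian_pdf` and `log_likelihood` are illustrative helper names.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x | mu, var), eq. (2.8)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def log_likelihood(t, y, beta):
    """Closed form (2.10) for targets t and model outputs y_n = y(x_n, w)."""
    n = len(t)
    return (0.5 * n * np.log(beta) - 0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * beta * np.sum((t - y) ** 2))

t = np.array([0.1, -0.4, 0.9])    # made-up targets
y = np.array([0.0, -0.5, 1.0])    # made-up model outputs
beta = 25.0
direct = np.sum(np.log(gaussian_pdf(t, y, 1.0 / beta)))   # log of the product (2.7)
```

`log_likelihood(t, y, beta)` and `direct` agree up to rounding. In practice the closed form is preferred because multiplying many small densities underflows quickly.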
  25. 29.

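    2-2. Maximum likelihood estimation for linear basis function models

    Defining the sum-of-squares error $E_D(\mathbf{w})$ as

    $$ E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 \tag{2.11} $$

    the log likelihood becomes

    $$ \ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \beta E_D(\mathbf{w}) \tag{2.12} $$

    To find the maximum likelihood solutions $\mathbf{w}_{\mathrm{ML}}$ and $\beta_{\mathrm{ML}}$ we take gradients of the log likelihood $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)$.

    In the gradient with respect to $\mathbf{w}$, $\beta$ appears only as an overall positive factor, so the point where the gradient vanishes does not depend on $\beta$. We can therefore find $\mathbf{w}_{\mathrm{ML}}$ first and then use $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}_{\mathrm{ML}}, \beta)$ to find $\beta_{\mathrm{ML}}$.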
  26. 30.

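    2-2. Maximum likelihood estimation for linear basis function models

    First consider maximizing the log likelihood (2.12) with respect to $\mathbf{w}$. The first and second terms on the right-hand side of (2.12) do not depend on $\mathbf{w}$, so this is equivalent to maximizing the third term, $-\beta E_D(\mathbf{w})$.

    Since $\beta > 0$, maximizing the log likelihood (2.12) with respect to $\mathbf{w}$ is equivalent to minimizing the sum-of-squares error $E_D(\mathbf{w})$ of (2.11) with respect to $\mathbf{w}$.

    In 1-1 the sum-of-squares error (1.3) was minimized on heuristic grounds. Probability theory shows that this minimization is exactly maximum likelihood estimation under the assumption of a Gaussian likelihood.

    We now compute the gradient of the log likelihood with respect to $\mathbf{w}$.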
  27. 31.

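    2-2. Maximum likelihood estimation for linear basis function models

    Using $y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{\mathrm T}\boldsymbol{\phi}(\mathbf{x})$, the gradient of the log likelihood with respect to $\mathbf{w}$ is

    $$ \frac{\partial}{\partial \mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = -\beta \frac{\partial}{\partial \mathbf{w}} E_D(\mathbf{w}) = -\frac{\beta}{2} \sum_{n=1}^{N} \frac{\partial}{\partial \mathbf{w}} (t_n - \mathbf{w}^{\mathrm T}\boldsymbol{\phi}(\mathbf{x}_n))^2 = \beta \sum_{n=1}^{N} (t_n - \mathbf{w}^{\mathrm T}\boldsymbol{\phi}(\mathbf{x}_n))\, \boldsymbol{\phi}(\mathbf{x}_n) = \beta \left\{ \sum_{n=1}^{N} t_n \boldsymbol{\phi}(\mathbf{x}_n) - \sum_{n=1}^{N} \boldsymbol{\phi}(\mathbf{x}_n)\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm T} \mathbf{w} \right\} \tag{2.13} $$

    so the maximum likelihood solution $\mathbf{w}_{\mathrm{ML}}$ satisfies

    $$ \sum_{n=1}^{N} t_n \boldsymbol{\phi}(\mathbf{x}_n) - \sum_{n=1}^{N} \boldsymbol{\phi}(\mathbf{x}_n)\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm T} \mathbf{w}_{\mathrm{ML}} = \mathbf{0} \tag{2.14} $$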
  28. 32.

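    2-2. Maximum likelihood estimation for linear basis function models

    Define the design matrix $\boldsymbol{\Phi}$ as

    $$ \boldsymbol{\Phi} = \begin{pmatrix} \phi_0(\mathbf{x}_1) & \phi_1(\mathbf{x}_1) & \cdots & \phi_{M-1}(\mathbf{x}_1) \\ \phi_0(\mathbf{x}_2) & \phi_1(\mathbf{x}_2) & \cdots & \phi_{M-1}(\mathbf{x}_2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_0(\mathbf{x}_N) & \phi_1(\mathbf{x}_N) & \cdots & \phi_{M-1}(\mathbf{x}_N) \end{pmatrix} = \begin{pmatrix} \boldsymbol{\phi}(\mathbf{x}_1)^{\mathrm T} \\ \boldsymbol{\phi}(\mathbf{x}_2)^{\mathrm T} \\ \vdots \\ \boldsymbol{\phi}(\mathbf{x}_N)^{\mathrm T} \end{pmatrix} \tag{2.15} $$

    The following identities then hold:

    $$ \boldsymbol{\Phi}^{\mathrm T}\boldsymbol{\Phi} = \sum_{n=1}^{N} \boldsymbol{\phi}(\mathbf{x}_n)\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm T} \tag{2.16} $$

    $$ \boldsymbol{\Phi}^{\mathrm T}\mathbf{t} = \sum_{n=1}^{N} t_n \boldsymbol{\phi}(\mathbf{x}_n) \tag{2.17} $$

    With these, (2.14) becomes

    $$ \boldsymbol{\Phi}^{\mathrm T}\mathbf{t} - \boldsymbol{\Phi}^{\mathrm T}\boldsymbol{\Phi}\, \mathbf{w}_{\mathrm{ML}} = \mathbf{0} \tag{2.18} $$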
  29. 33.

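    2-2. Maximum likelihood estimation for linear basis function models

    Hence, provided $\boldsymbol{\Phi}^{\mathrm T}\boldsymbol{\Phi}$ is invertible, the maximum likelihood solution $\mathbf{w}_{\mathrm{ML}}$ is

    $$ \mathbf{w}_{\mathrm{ML}} = (\boldsymbol{\Phi}^{\mathrm T}\boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^{\mathrm T}\mathbf{t} \tag{2.19} $$

    Next, differentiating $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}_{\mathrm{ML}}, \beta)$, with $\mathbf{w}_{\mathrm{ML}}$ substituted in, with respect to $\beta$ gives

    $$ \frac{\partial}{\partial \beta} \ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}_{\mathrm{ML}}, \beta) = \frac{N}{2}\frac{1}{\beta} - E_D(\mathbf{w}_{\mathrm{ML}}) \tag{2.20} $$

    Setting this to zero, the inverse of the maximum likelihood solution $\beta_{\mathrm{ML}}$ is

    $$ \frac{1}{\beta_{\mathrm{ML}}} = \frac{2}{N} E_D(\mathbf{w}_{\mathrm{ML}}) = \frac{1}{N} \sum_{n=1}^{N} (t_n - \mathbf{w}_{\mathrm{ML}}^{\mathrm T}\boldsymbol{\phi}(\mathbf{x}_n))^2 \tag{2.21} $$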
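The two estimators (2.19) and (2.21) can be sketched as below. The toy data (a straight line with noise standard deviation 0.2, 200 points, fixed seed) and the helper name `fit_ml` are assumptions for illustration.

```python
import numpy as np

def fit_ml(Phi, t):
    """Return (w_ML, beta_ML) following (2.19) and (2.21)."""
    w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)    # solves the normal equations (2.18)
    residual = t - Phi @ w_ml
    beta_ml = len(t) / np.sum(residual ** 2)          # reciprocal of (2.21)
    return w_ml, beta_ml

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
t = -0.3 + 0.5 * x + rng.normal(0.0, 0.2, size=200)   # assumed toy data
Phi = np.column_stack([np.ones_like(x), x])            # phi(x) = (1, x)
w_ml, beta_ml = fit_ml(Phi, t)
```

Solving the normal equations mirrors (2.19) most directly. When $\boldsymbol{\Phi}^{\mathrm T}\boldsymbol{\Phi}$ is badly conditioned, `np.linalg.lstsq(Phi, t, rcond=None)` is usually the more robust way to obtain the same $\mathbf{w}_{\mathrm{ML}}$.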
  30. 34.

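    2-2. Maximum likelihood estimation for linear basis function models

    From this, the predictive distribution $p(t \mid \mathbf{x}, \mathbf{w}_{\mathrm{ML}}, \beta_{\mathrm{ML}})$ of the target variable $t$ for a new input vector $\mathbf{x}$ is

    $$ p(t \mid \mathbf{x}, \mathbf{w}_{\mathrm{ML}}, \beta_{\mathrm{ML}}) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}_{\mathrm{ML}}), \beta_{\mathrm{ML}}^{-1}) \tag{2.22} $$

    where $\mathbf{w}_{\mathrm{ML}}$ and $\beta_{\mathrm{ML}}$ are given by (2.19) and (2.21).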
  31. 35.

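    2-3. Bayesian linear regression

    Next we treat the linear regression model in a Bayesian way.

    We assume the following prior distribution with mean $\mathbf{m}_0$ and covariance $\mathbf{S}_0$:

    $$ p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0) \tag{2.23} $$

    The likelihood function is

    $$ p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1}) \tag{2.24} $$

    so by Bayes' theorem the posterior distribution $p(\mathbf{w} \mid \mathbf{t})$ is

    $$ p(\mathbf{w} \mid \mathbf{t}) \propto p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)\, p(\mathbf{w}) \propto \exp\left( -\frac{\beta}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^{\mathrm T}\boldsymbol{\phi}(\mathbf{x}_n))^2 \right) \times \exp\left( -\frac{1}{2} (\mathbf{w} - \mathbf{m}_0)^{\mathrm T} \mathbf{S}_0^{-1} (\mathbf{w} - \mathbf{m}_0) \right) \tag{2.25} $$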
  32. 36.

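    2-3. Bayesian linear regression

    From (2.25), the exponent is quadratic in $\mathbf{w}$, so $p(\mathbf{w} \mid \mathbf{t})$ is a Gaussian distribution.

    Concretely, $p(\mathbf{w} \mid \mathbf{t})$ is (see PRML Exercise 3.7)

    $$ p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N) \tag{2.26} $$

    where $\mathbf{m}_N$ and $\mathbf{S}_N$ are

    $$ \mathbf{m}_N = \mathbf{S}_N (\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta \boldsymbol{\Phi}^{\mathrm T}\mathbf{t}) \tag{2.27} $$

    $$ \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^{\mathrm T}\boldsymbol{\Phi} \tag{2.28} $$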
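A direct transcription of (2.27) and (2.28) might look like the sketch below. The design matrix, targets, $\beta$ and prior are made-up values, and `posterior` is an illustrative helper name.

```python
import numpy as np

def posterior(Phi, t, beta, m0, S0):
    """Return (m_N, S_N) following (2.27) and (2.28)."""
    S0_inv = np.linalg.inv(S0)
    SN = np.linalg.inv(S0_inv + beta * Phi.T @ Phi)     # (2.28)
    mN = SN @ (S0_inv @ m0 + beta * Phi.T @ t)          # (2.27)
    return mN, SN

# Made-up design matrix and targets for phi(x) = (1, x).
Phi = np.array([[1.0, -0.5], [1.0, 0.2], [1.0, 0.9]])
t = np.array([-0.6, -0.1, 0.2])
beta, m0, S0 = 25.0, np.zeros(2), np.eye(2) / 2.0
mN, SN = posterior(Phi, t, beta, m0, S0)

# Gradient of the log of (2.25) at w = m_N; it should vanish at the posterior mode.
grad = beta * Phi.T @ (t - Phi @ mN) - np.linalg.inv(S0) @ (mN - m0)
```

That `grad` is zero (up to rounding) reflects the fact that $\mathbf{m}_N$ is the point where the quadratic exponent of (2.25) is stationary.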
  33. 37.

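    2-3. Bayesian linear regression

    We now examine the relationship between the maximum likelihood solution $\mathbf{w}_{\mathrm{ML}}$ of (2.19), the mode $\mathbf{w}_{\mathrm{MAP}}$ of the posterior $p(\mathbf{w} \mid \mathbf{t})$ (the mode is the $\mathbf{w}$ that maximizes $p(\mathbf{w} \mid \mathbf{t})$), and the posterior mean $\mathbf{m}_N$.

    First, the mode of a Gaussian distribution equals its mean (see PRML Exercise 1.9), so $\mathbf{w}_{\mathrm{MAP}} = \mathbf{m}_N$.

    Next, consider an infinitely broad prior, $\mathbf{S}_0 = \alpha^{-1}\mathbf{I}$ with $\alpha \to 0$. Then

    $$ \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^{\mathrm T}\boldsymbol{\Phi} \to \beta \boldsymbol{\Phi}^{\mathrm T}\boldsymbol{\Phi} \tag{2.29} $$

    and

    $$ \mathbf{m}_N = \mathbf{S}_N (\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta \boldsymbol{\Phi}^{\mathrm T}\mathbf{t}) \to (\boldsymbol{\Phi}^{\mathrm T}\boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^{\mathrm T}\mathbf{t} \tag{2.30} $$

    so in this limit $\mathbf{w}_{\mathrm{MAP}} = \mathbf{m}_N = \mathbf{w}_{\mathrm{ML}}$.

    In other words, with a prior that carries no information (an infinitely broad one), the parameter that maximizes the posterior coincides with the parameter that maximizes the likelihood.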
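The limit (2.30) can be observed numerically by shrinking $\alpha$. This sketch assumes a zero-mean prior $\mathbf{m}_0 = \mathbf{0}$ with $\mathbf{S}_0 = \alpha^{-1}\mathbf{I}$, in which case (2.27) and (2.28) reduce to $\mathbf{m}_N = \beta(\alpha\mathbf{I} + \beta\boldsymbol{\Phi}^{\mathrm T}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathrm T}\mathbf{t}$. The data and the helper name are made up.

```python
import numpy as np

def posterior_mean_zero_prior(Phi, t, alpha, beta):
    """m_N from (2.27)-(2.28) with m_0 = 0 and S_0 = I / alpha."""
    M = Phi.shape[1]
    return beta * np.linalg.solve(alpha * np.eye(M) + beta * Phi.T @ Phi, Phi.T @ t)

# Made-up design matrix and targets for phi(x) = (1, x).
Phi = np.array([[1.0, -0.5], [1.0, 0.2], [1.0, 0.9], [1.0, 0.4]])
t = np.array([-0.6, -0.1, 0.2, -0.2])
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)          # (2.19)

# Distance between m_N and w_ML for progressively broader priors.
gaps = [np.linalg.norm(posterior_mean_zero_prior(Phi, t, a, 25.0) - w_ml)
        for a in (2.0, 1e-2, 1e-4, 1e-8)]
```

The entries of `gaps` shrink towards zero as $\alpha$ decreases. For finite $\alpha$ the same formula is least squares with a quadratic penalty of strength $\alpha/\beta$ on $\mathbf{w}$.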
  34. 38.

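    2-3. Bayesian linear regression

    In the previous chapter we said that, in the Bayesian treatment, the likelihood function updates the prior distribution into the posterior distribution. We now look at that update through an example.

    As the setting, the mean of the likelihood is $y(x, \mathbf{w}) = w_0 + w_1 x$.

    For the training data, the inputs $x_n$ are drawn from the uniform distribution on $[-1, 1]$, and the corresponding targets $t_n$ are generated with zero-mean Gaussian noise $\epsilon$ of standard deviation 0.2 as

    $$ t_n = f(x_n, a_0 = -0.3, a_1 = 0.5) + \epsilon \tag{2.31} $$

    where

    $$ f(x, a_0, a_1) = a_0 + a_1 x \tag{2.32} $$

    The goal here is thus to use the training data so that the parameters $w_0, w_1$ recover $a_0 = -0.3$, $a_1 = 0.5$.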
  35. 39.

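    2-3. Bayesian linear regression

    The precision of the likelihood is assumed known, $\beta = (1/0.2)^2 = 25$, and the prior is the following isotropic Gaussian distribution with $\alpha = 2.0$:

    $$ p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I}) \tag{2.33} $$

    In this setting we look at how the posterior distribution is updated as training data accumulate.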
  36. 40.

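    2-3. Bayesian linear regression

    The first set of graphs shows the stage before any training data have been observed.

    The left graph is the prior distribution $p(\mathbf{w})$. The right graph draws the function $y(x, \mathbf{w}) = w_0 + w_1 x$ six times, each with parameters $w_0, w_1$ sampled at random from the prior.

    Since there are no training data yet, the six functions are naturally scattered, and none is close to $f(x, a_0 = -0.3, a_1 = 0.5) = -0.3 + 0.5x$.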
  37. 41.

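    2-3. Bayesian linear regression

    The next set of graphs shows the situation after one training point has been observed.

    The left graph plots the likelihood function $p(t \mid x, \mathbf{w})$ of this data point as a function of $\mathbf{w}$. The white cross marks the point $a_0 = -0.3$, $a_1 = 0.5$.

    The middle graph is the posterior distribution, that is, the prior $p(\mathbf{w})$ multiplied by the likelihood $p(t \mid x, \mathbf{w})$ and normalized. The right graph draws $y(x, \mathbf{w}) = w_0 + w_1 x$ six times with parameters $w_0, w_1$ sampled at random from this posterior.

    Because there are still few training data, the six functions remain scattered and none is close to $f(x, a_0 = -0.3, a_1 = 0.5) = -0.3 + 0.5x$, but note that every line passes near the data point (blue circle).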
  38. 42.

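    2-3. Bayesian linear regression

    The next set of graphs shows the situation after a second training point has been observed.

    As before, the left graph is the likelihood function $p(t \mid x, \mathbf{w})$ of this new data point.

    The middle graph takes the posterior obtained from one data point as the prior and multiplies it by the likelihood. The right graph draws $y(x, \mathbf{w}) = w_0 + w_1 x$ six times with parameters $w_0, w_1$ sampled at random from this posterior.

    The posterior now peaks near $a_0 = -0.3$, $a_1 = 0.5$, the six functions start to gather around $f(x, a_0 = -0.3, a_1 = 0.5) = -0.3 + 0.5x$, and every line passes near both data points (blue circles).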
  39. 43.

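    2-3. Bayesian linear regression

    The last set of graphs shows the situation after 20 data points have been observed.

    The left graph is the likelihood function $p(t \mid x, \mathbf{w})$ of the 20th data point.

    The middle graph is the posterior distribution incorporating all 20 data points. The right graph draws $y(x, \mathbf{w}) = w_0 + w_1 x$ six times with parameters $w_0, w_1$ sampled at random from this posterior.

    The posterior is now sharply peaked near $a_0 = -0.3$, $a_1 = 0.5$, and the six functions cluster around $f(x, a_0 = -0.3, a_1 = 0.5) = -0.3 + 0.5x$.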
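The sequential updating shown across these slides can be sketched as a loop in which each posterior becomes the prior for the next point. The values $a_0 = -0.3$, $a_1 = 0.5$, $\beta = 25$ and $\alpha = 2.0$ come from the slides; the random seed and the helper name `update` are arbitrary choices for illustration.

```python
import numpy as np

def update(m, S, phi, t_n, beta):
    """One Bayesian update: the prior N(m, S) absorbs a single observation (phi, t_n)."""
    S_inv = np.linalg.inv(S)
    S_new = np.linalg.inv(S_inv + beta * np.outer(phi, phi))
    m_new = S_new @ (S_inv @ m + beta * t_n * phi)
    return m_new, S_new

alpha, beta = 2.0, 25.0                    # values used in the slides
rng = np.random.default_rng(0)             # seed is an arbitrary choice
x = rng.uniform(-1.0, 1.0, size=20)
t = -0.3 + 0.5 * x + rng.normal(0.0, 0.2, size=20)
Phi = np.column_stack([np.ones_like(x), x])

m, S = np.zeros(2), np.eye(2) / alpha      # prior (2.33)
for phi_n, t_n in zip(Phi, t):
    m, S = update(m, S, phi_n, t_n, beta)  # previous posterior acts as the new prior

# Batch posterior from (2.27)-(2.28) using all 20 points at once.
S_batch = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
m_batch = beta * S_batch @ Phi.T @ t
```

Processing the points one at a time ends at the same `m` and `S` as the batch formulas, so the order in which data arrive does not matter for the final posterior.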
  40. 44.

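    2-3. Bayesian linear regression

    Next, using the posterior distribution $p(\mathbf{w} \mid \mathbf{t}, \alpha, \beta)$ and the likelihood function $p(t \mid \mathbf{x}, \mathbf{w}, \beta)$, we find the probability distribution of the prediction $t$ for an unseen input vector $\mathbf{x}$. (The hyperparameters $\alpha, \beta$ are written explicitly again as arguments of the posterior.)

    From (1.31), the predictive distribution $p(t \mid \mathbf{t}, \alpha, \beta)$ is

    $$ p(t \mid \mathbf{t}, \alpha, \beta) = \int p(t \mid \mathbf{x}, \mathbf{w}, \beta)\, p(\mathbf{w} \mid \mathbf{t}, \alpha, \beta)\, d\mathbf{w} \tag{2.34} $$

    Using (2.6), (2.26) and (1.24), $p(t \mid \mathbf{t}, \alpha, \beta)$ becomes (see PRML Exercise 3.10)

    $$ p(t \mid \mathbf{t}, \alpha, \beta) = \mathcal{N}(t \mid \mathbf{m}_N^{\mathrm T}\boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x})) \tag{2.35} $$

    where the variance $\sigma_N^2(\mathbf{x})$ of the predictive distribution is

    $$ \sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^{\mathrm T} \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x}) \tag{2.36} $$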
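Given a posterior $(\mathbf{m}_N, \mathbf{S}_N)$, the predictive mean and variance of (2.35) and (2.36) can be evaluated for several inputs at once. The posterior parameters below are made-up numbers for $\boldsymbol{\phi}(x) = (1, x)$, and `predictive` is an illustrative helper name.

```python
import numpy as np

def predictive(phi_x, mN, SN, beta):
    """Mean and variance of (2.35)-(2.36) for each row phi(x)^T of phi_x."""
    mean = phi_x @ mN
    var = 1.0 / beta + np.sum((phi_x @ SN) * phi_x, axis=1)   # phi^T S_N phi, row by row
    return mean, var

# Made-up posterior parameters for phi(x) = (1, x).
mN = np.array([-0.3, 0.5])
SN = np.array([[0.02, 0.005], [0.005, 0.04]])
beta = 25.0
xs = np.array([-1.0, 0.0, 1.0])
phi_x = np.column_stack([np.ones_like(xs), xs])
mean, var = predictive(phi_x, mN, SN, beta)
```

With these numbers the variance is 0.06 at $x = 0$ and larger at $x = \pm 1$, and it never drops below the noise floor $1/\beta = 0.04$.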
  41. 45.

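    2-3. Bayesian linear regression

    The first term of $\sigma_N^2(\mathbf{x})$, $1/\beta$, is the variance of the likelihood function: the scatter (noise) of the output for a given input.

    The second term, $\boldsymbol{\phi}(\mathbf{x})^{\mathrm T}\mathbf{S}_N\boldsymbol{\phi}(\mathbf{x})$, comes from the uncertainty in $\mathbf{w}$ (the variance of the posterior). It is specific to Bayesian estimation, which does not reduce the parameters to a point estimate.

    This second term does not increase when a new training point is added ($N \to N + 1$), that is, $\sigma_{N+1}^2(\mathbf{x}) \le \sigma_N^2(\mathbf{x})$ (see PRML Exercise 3.11).

    This expresses that the prediction becomes more certain as the training data grow.

    Finally, we use an example to see how the predictive uncertainty decreases as training data are added.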
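The inequality $\sigma_{N+1}^2(\mathbf{x}) \le \sigma_N^2(\mathbf{x})$ can be seen from a rank-one update of $\mathbf{S}_N$. The sketch below follows the usual route for this exercise and assumes the Sherman-Morrison identity; $\boldsymbol{\phi}_{N+1}$ is shorthand, introduced here, for $\boldsymbol{\phi}(\mathbf{x}_{N+1})$.

```latex
\begin{align*}
\mathbf{S}_{N+1}^{-1}
  &= \mathbf{S}_N^{-1} + \beta\,\boldsymbol{\phi}_{N+1}\boldsymbol{\phi}_{N+1}^{\mathrm T}
  && \text{(one more term in (2.28))} \\
\mathbf{S}_{N+1}
  &= \mathbf{S}_N
     - \frac{\beta\,\mathbf{S}_N \boldsymbol{\phi}_{N+1}\boldsymbol{\phi}_{N+1}^{\mathrm T}\mathbf{S}_N}
            {1 + \beta\,\boldsymbol{\phi}_{N+1}^{\mathrm T}\mathbf{S}_N\boldsymbol{\phi}_{N+1}}
  && \text{(Sherman-Morrison)} \\
\sigma_{N+1}^2(\mathbf{x})
  &= \sigma_N^2(\mathbf{x})
     - \frac{\beta\,\big(\boldsymbol{\phi}(\mathbf{x})^{\mathrm T}\mathbf{S}_N\boldsymbol{\phi}_{N+1}\big)^2}
            {1 + \beta\,\boldsymbol{\phi}_{N+1}^{\mathrm T}\mathbf{S}_N\boldsymbol{\phi}_{N+1}}
  \;\le\; \sigma_N^2(\mathbf{x}) .
\end{align*}
```

The subtracted term is non-negative because $\beta > 0$ and $\mathbf{S}_N$ is positive definite, so the denominator is at least 1 and the numerator is a square.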
  42. 46.

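    2-3. Bayesian linear regression

    The example is the sinusoidal one used for the simple regression.

    As training data we prepare $N$ inputs $\mathbf{x} = (x_1, x_2, \dots, x_N)^{\mathrm T}$ and the $N$ corresponding target variables $\mathbf{t} = (t_1, t_2, \dots, t_N)^{\mathrm T}$.

    Each $t_n$ is $\sin(2\pi x_n)$ plus random noise $\epsilon$ drawn from a Gaussian distribution:

    $$ t_n = \sin(2\pi x_n) + \epsilon \tag{2.37} $$

    The mean of the likelihood, $y(x, \mathbf{w})$, is expanded in the Gaussian basis functions (2.4).

    In this setting, the graphs for $N = 1, 2, 4, 25$ training points are shown on the next slide.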
  43. 47.

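    2-3. Bayesian linear regression

    The blue circles are the training data, the yellow-green line is the true sine function, the red line is the mean of the predictive distribution $\mathbf{m}_N^{\mathrm T}\boldsymbol{\phi}(x)$, and the light red region is the band of $\pm\sigma_N(x)$ around that mean.

    As the training data increase, the red line approaches the yellow-green line and the light red region shrinks.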