PRML Chapter 1

gucchi
October 29, 2018

Transcript

  1. PRML Chapter 1 (2018/10/29)

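  2. Chapter 1: Introduction
    ▶ In machine learning, and in supervised learning in particular, we first prepare a set of input data $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ together with the corresponding set of target vectors $\{\mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_N\}$ (the training data)
    ▶ Using the training data, we construct a function $y(\mathbf{x})$ that predicts the target vector from the input (learning)
    ▶ Once learning is finished, we predict the target vector of a new, unseen input $\mathbf{x}$ as $y(\mathbf{x})$
    ▶ When each input vector is assigned to one of a finite number of discrete categories (e.g., handwritten digit recognition), the task is called classification; when the output consists of continuous variables, it is called regression
    ▶ We start with a simple regression example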
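  3. 1.1 Example: Polynomial Curve Fitting
    ▶ As training data, we prepare $N$ inputs $\mathbf{x} = (x_1, x_2, \dots, x_N)^{\mathrm T}$ and the $N$ corresponding target values $\mathbf{t} = (t_1, t_2, \dots, t_N)^{\mathrm T}$ (since this is regression, each output $t_n$ takes a continuous value)
    ▶ Each $t_n$ is $\sin(2\pi x_n)$ plus random noise drawn from a Gaussian distribution
    ▶ Using the training data $(\mathbf{x}, \mathbf{t})$, we want to predict the output $\hat{t}$ for a new input $\hat{x}$
    ▶ The figure below shows an example with $N = 10$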
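  4. 1.1 Example: Polynomial Curve Fitting
    ▶ Because the training set is finite ($N$ points), the prediction $\hat{t}$ is uncertain; the framework for expressing this uncertainty quantitatively is introduced in Section 1.2
    ▶ For now, in this section we fit the data with the following polynomial and use it for prediction
      $$ y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j \tag{1.1} $$
    ▶ We want to use the training data $(\mathbf{x}, \mathbf{t})$ to tune the polynomial coefficients $\mathbf{w} = (w_0, w_1, \dots, w_M)^{\mathrm T}$ so that the curve fits well
    ▶ To this end, we look for the $\mathbf{w}$ (denoted $\mathbf{w}^\star$) that minimizes the error function
      $$ E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 \tag{1.2} $$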
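The fitting procedure above can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the author's code. The data generator (noise standard deviation 0.3, seed 0) and all helper names (`design_matrix`, `solve`, `fit_polynomial`, `sum_of_squares_error`) are hypothetical. It minimizes (1.2) by solving the normal equations $\Phi^{\mathrm T}\Phi\,\mathbf{w} = \Phi^{\mathrm T}\mathbf{t}$. In the square case $M + 1 = N$ it solves $\Phi\mathbf{w} = \mathbf{t}$ directly instead, because the minimizer then interpolates the data and the normal equations are badly conditioned.

```python
import math
import random


def design_matrix(xs, M):
    """Row n is (1, x_n, x_n^2, ..., x_n^M), the features used by y(x, w) in (1.1)."""
    return [[x ** j for j in range(M + 1)] for x in xs]


def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - s) / aug[r][r]
    return w


def fit_polynomial(xs, ts, M):
    """Return w* minimising E(w) in (1.2)."""
    Phi = design_matrix(xs, M)
    if M + 1 == len(xs):
        # Square case: the minimiser interpolates the data (E(w*) = 0),
        # so solve Phi w = t directly rather than the ill-conditioned normal equations.
        return solve(Phi, ts)
    # Normal equations: (Phi^T Phi) w = Phi^T t.
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(M + 1)] for i in range(M + 1)]
    b = [sum(row[i] * t for row, t in zip(Phi, ts)) for i in range(M + 1)]
    return solve(A, b)


def y(x, w):
    """The polynomial (1.1)."""
    return sum(wj * x ** j for j, wj in enumerate(w))


def sum_of_squares_error(xs, ts, w):
    """E(w) in (1.2)."""
    return 0.5 * sum((y(x, w) - t) ** 2 for x, t in zip(xs, ts))


# Synthetic data in the spirit of Section 1.1 (noise level 0.3 is an assumption).
random.seed(0)
N = 10
xs = [n / (N - 1) for n in range(N)]
ts = [math.sin(2 * math.pi * x) + random.gauss(0.0, 0.3) for x in xs]

for M in (0, 1, 3, 9):
    w_star = fit_polynomial(xs, ts, M)
    print(f"M={M}: E(w*)={sum_of_squares_error(xs, ts, w_star):.3g}")
```

The training error can only fall as $M$ grows, and for $M = 9$ with $N = 10$ it drops to (numerically) zero, which is the overfitting case discussed on the next slide.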
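  5. 1.1 Example: Polynomial Curve Fitting
    ▶ The figure above shows the fitted curves for polynomial orders $M = 0, 1, 3, 9$ (green is $\sin(2\pi x)$, red is $y(x, \mathbf{w}^\star)$)
    ▶ Of these, $M = 3$ appears to match $\sin(2\pi x)$ best
    ▶ With $M = 9$, $E(\mathbf{w}^\star) = 0$, yet the curve does not match $\sin(2\pi x)$ (overfitting)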
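  6. 1.1 Example: Polynomial Curve Fitting
    ▶ For each $M$, we want to evaluate quantitatively, rather than by eyeballing the plots, how well $y(x, \mathbf{w}^\star)$ predicts unseen data
    ▶ For this we introduce the root-mean-square (RMS) error
      $$ E_{\mathrm{RMS}} = \sqrt{2E(\mathbf{w}^\star)/N} \tag{1.3} $$
    ▶ The smaller $E_{\mathrm{RMS}}$ is on the test data, the better the model predicts unseen data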
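Definition (1.3) is a one-liner. The small sketch below uses hypothetical helper names and made-up data. It shows the two reasons for the form of $E_{\mathrm{RMS}}$: dividing by $N$ makes data sets of different sizes comparable, and the square root puts the error back on the scale of $t$.

```python
import math


def y(x, w):
    """The polynomial (1.1)."""
    return sum(wj * x ** j for j, wj in enumerate(w))


def sum_of_squares_error(xs, ts, w):
    """E(w) in (1.2)."""
    return 0.5 * sum((y(x, w) - t) ** 2 for x, t in zip(xs, ts))


def rms_error(xs, ts, w):
    """E_RMS in (1.3): sqrt(2 E(w) / N)."""
    return math.sqrt(2 * sum_of_squares_error(xs, ts, w) / len(xs))


# Hypothetical weights and data: every target sits 0.5 below the curve 1 + 2x.
w = [1.0, 2.0]
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ts = [y(x, w) - 0.5 for x in xs]
print(rms_error(xs, ts, w))          # 0.5, in the units of t
print(rms_error(xs * 2, ts * 2, w))  # 0.5 again: duplicating the data set does not change E_RMS
```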
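  7. 1.1 Example: Polynomial Curve Fitting
    ▶ The figure above shows $E_{\mathrm{RMS}}$ on the training and test data for various values of $M$
    ▶ Blue is $E_{\mathrm{RMS}}$ on the training data and red is $E_{\mathrm{RMS}}$ on the test data (unseen data)
    ▶ With $M = 9$ the model fits the training data well but does not fit the test data at all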
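  8. 1.1 Example: Polynomial Curve Fitting
    ▶ The table above lists the values of $\mathbf{w}^\star$ for each $M$
    ▶ For $M = 9$, the coefficients are tuned with large positive and negative values so that the curve also fits the random noise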
  9. 1.1 Example: Polynomial Curve Fitting
    ▶ The figure above shows $M = 9$ fits for different numbers of training points
    ▶ Even with $M = 9$, overfitting is avoided when there are enough training points
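  10. 1.1 Example: Polynomial Curve Fitting
    ▶ Next, to prevent overfitting when a complex model (e.g., $M = 9$) is trained on a limited number of data points (e.g., $N = 10$), we apply regularization
    ▶ Table 1.1 shows that in the overfitted $M = 9$ case the components of $\mathbf{w}^\star$ take large positive and negative values, so we consider the following error function
      $$ \widetilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2 \tag{1.4} $$
    ▶ Here $\|\mathbf{w}\|^2 = \mathbf{w}^{\mathrm T}\mathbf{w} = w_0^2 + w_1^2 + \cdots + w_M^2$, and $\lambda$ is a positive parameter that sets the relative importance of the regularization term and the sum-of-squares term
    ▶ With this error function, the fit keeps the norm of $\mathbf{w}$ from becoming large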
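Setting the gradient of (1.4) to zero gives $\Phi^{\mathrm T}(\Phi\mathbf{w} - \mathbf{t}) + \lambda\mathbf{w} = \mathbf{0}$, that is, $(\Phi^{\mathrm T}\Phi + \lambda\mathbf{I})\mathbf{w} = \Phi^{\mathrm T}\mathbf{t}$. The sketch below solves this system for $M = 9$, $N = 10$ and the two values of $\lambda$ on the next slide. The data generator and the names `fit_regularised` and `squared_norm` are assumptions made for illustration.

```python
import math
import random


def design_matrix(xs, M):
    return [[x ** j for j in range(M + 1)] for x in xs]


def solve(A, b):
    """Gaussian elimination with partial pivoting for a square system A w = b."""
    n = len(A)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (aug[r][n] - sum(aug[r][c] * w[c] for c in range(r + 1, n))) / aug[r][r]
    return w


def fit_regularised(xs, ts, M, lam):
    """Minimise (1.4) by solving (Phi^T Phi + lam I) w = Phi^T t."""
    Phi = design_matrix(xs, M)
    A = [[sum(row[i] * row[j] for row in Phi) + (lam if i == j else 0.0)
          for j in range(M + 1)] for i in range(M + 1)]
    b = [sum(row[i] * t for row, t in zip(Phi, ts)) for i in range(M + 1)]
    return solve(A, b)


def squared_norm(w):
    """||w||^2 = w^T w."""
    return sum(wj * wj for wj in w)


random.seed(0)
N, M = 10, 9
xs = [n / (N - 1) for n in range(N)]
ts = [math.sin(2 * math.pi * x) + random.gauss(0.0, 0.3) for x in xs]
for lam in (math.exp(-18), 1.0):
    w = fit_regularised(xs, ts, M, lam)
    print(f"lambda={lam:.3g}: ||w||^2={squared_norm(w):.3g}")
```

A larger $\lambda$ can only shrink $\|\mathbf{w}\|$, which matches the behaviour described for $\lambda = 1$.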
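  11. 1.1 Example: Polynomial Curve Fitting
    ▶ The figure above shows $M = 9$ fits obtained with the error function (1.4) for $\lambda = e^{-18}$ and $\lambda = 1$
    ▶ With $\lambda = e^{-18}$, overfitting is clearly suppressed compared with $\lambda = 0$
    ▶ With $\lambda = 1$, the second term on the right-hand side of (1.4) carries too much weight, and the components of $\mathbf{w}^\star$ are pushed too close to 0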
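  12. 1.1 Example: Polynomial Curve Fitting
    ▶ So far the discussion of fitting has been purely intuitive
    ▶ From here on, we introduce probability theory to treat pattern recognition problems in a more principled way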

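  13. 1.2 Probability Theory
    ▶ In pattern recognition, we introduce probability theory to quantify uncertainty, which is a key concept
    ▶ Consider random variables $X$ and $Y$ that take values $X = x_i$ ($i = 1, 2, \dots, M$) and $Y = y_j$ ($j = 1, 2, \dots, L$); the probability that $X = x_i$ and $Y = y_j$ (the joint probability) is written $p(X = x_i, Y = y_j)$
    ▶ The probability $p(X = x_i)$ can be written in terms of $p(X = x_i, Y = y_j)$ as follows (sum rule)
      $$ p(X = x_i) = \sum_{j=1}^{L} p(X = x_i, Y = y_j) \tag{1.7} $$
    ▶ Writing the probability that $Y = y_j$ given $X = x_i$ (the conditional probability) as $p(Y = y_j \mid X = x_i)$, the following relation holds (product rule)
      $$ p(X = x_i, Y = y_j) = p(Y = y_j \mid X = x_i)\, p(X = x_i) \tag{1.9} $$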
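The sum and product rules can be checked on a small discrete table. In the sketch below, the joint table is made up and the function names are hypothetical. Exact `Fraction` arithmetic is used so that the product rule reproduces the joint table with no rounding.

```python
from fractions import Fraction as F

# Hypothetical joint table p(X = x_i, Y = y_j): rows i = 1..M (M = 2), columns j = 1..L (L = 3).
joint = [[F(1, 10), F(2, 10), F(1, 10)],
         [F(3, 10), F(1, 10), F(2, 10)]]


def marginal_x(joint):
    """Sum rule (1.7): p(X = x_i) = sum_j p(X = x_i, Y = y_j)."""
    return [sum(row) for row in joint]


def conditional_y_given_x(joint):
    """p(Y = y_j | X = x_i) = p(X = x_i, Y = y_j) / p(X = x_i)."""
    px = marginal_x(joint)
    return [[pij / px[i] for pij in row] for i, row in enumerate(joint)]


px = marginal_x(joint)
py_given_x = conditional_y_given_x(joint)
# Product rule (1.9): p(X, Y) = p(Y | X) p(X) rebuilds the joint table exactly.
rebuilt = [[py_given_x[i][j] * px[i] for j in range(len(row))] for i, row in enumerate(joint)]
print(px)                # [Fraction(2, 5), Fraction(3, 5)]
print(rebuilt == joint)  # True
```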
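  14. 1.2 Probability Theory
    ▶ From the product rule and the symmetry $p(X, Y) = p(Y, X)$, we obtain Bayes' theorem
      $$ p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)} \tag{1.12} $$
    ▶ Here $p(Y)$ is called the prior probability (the probability before $X$ is observed) and $p(Y \mid X)$ the posterior probability (the probability after $X$ is observed)
    ▶ Bayes' theorem states that multiplying the prior $p(Y)$ by the likelihood $p(X \mid Y)$ gives the posterior $p(Y \mid X)$ ($p(X)$ is the normalization constant that ensures $p(Y \mid X)$ sums to 1 over $Y$)
    ▶ Furthermore, $X$ and $Y$ are said to be independent when the joint distribution factorizes into the product of the marginals
      $$ p(X, Y) = p(X)\, p(Y) $$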
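Bayes' theorem (1.12) can be applied directly to a discrete prior. In this sketch the labels and numbers are illustrative, and `posterior` is a hypothetical helper. The evidence $p(X)$ is computed with the sum and product rules and acts only as the normalizer.

```python
from fractions import Fraction as F


def posterior(prior, likelihood):
    """Bayes' theorem (1.12) for one observed value of X.

    prior[y] = p(Y = y); likelihood[y] = p(X = observed | Y = y).
    """
    unnormalised = {y: likelihood[y] * prior[y] for y in prior}  # p(X | Y) p(Y)
    evidence = sum(unnormalised.values())                        # p(X) via sum and product rules
    return {y: v / evidence for y, v in unnormalised.items()}


prior = {"red": F(4, 10), "blue": F(6, 10)}   # illustrative p(Y)
likelihood = {"red": F(3, 4), "blue": F(1, 4)}  # illustrative p(X = observed | Y)
post = posterior(prior, likelihood)
print(post["red"])  # 2/3: the observation raises p(red) from 2/5 to 2/3
```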
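  15. 1.2.1 Probability Densities
    ▶ Next we consider distributions over continuous random variables
    ▶ If the probability that a real variable $x$ falls in the interval $(x, x + \delta x)$ is given by $p(x)\,\delta x$ as $\delta x \to 0$, then $p(x)$ is called the probability density of $x$
    ▶ The probability that $x$ lies in an interval $(a, b)$ is then
      $$ p(x \in (a, b)) = \int_a^b p(x)\, \mathrm{d}x \tag{1.24} $$
    ▶ Since probabilities are nonnegative and must sum to one, $p(x)$ satisfies
      $$ p(x) \geq 0 \tag{1.25} $$
      $$ \int_{-\infty}^{\infty} p(x)\, \mathrm{d}x = 1 \tag{1.26} $$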
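  16. 1.2.1 Probability Densities
    ▶ The probability that $x$ lies in $(-\infty, z)$ is given by the cumulative distribution function
      $$ P(z) = \int_{-\infty}^{z} p(x)\, \mathrm{d}x \tag{1.28} $$
    ▶ The sum and product rules for continuous variables take the form
      $$ p(x) = \int p(x, y)\, \mathrm{d}y \tag{1.31} $$
      $$ p(x, y) = p(y \mid x)\, p(x) \tag{1.32} $$
    ▶ A rigorous proof of the sum and product rules for continuous variables requires measure theory, which we do not go into here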
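  17. 1.2.2 Expectations and Variances
    ▶ An important operation in probability theory is taking weighted averages
    ▶ For a continuous random variable $x$, the average value of a function $f(x)$ under a probability distribution $p(x)$ is
      $$ \mathbb{E}[f] = \int p(x) f(x)\, \mathrm{d}x \tag{1.34} $$
    ▶ As notation, a subscript indicates which variable is summed (or integrated) over; for example, the following quantity is a sum (or integral) over $x$
      $$ \mathbb{E}_x[f(x, y)] \tag{1.36} $$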
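  18. 1.2.2 Expectations and Variances
    ▶ The variance of $f(x)$ under $p(x)$, which measures how much $f(x)$ varies around its mean $\mathbb{E}[f(x)]$, is
      $$ \operatorname{var}[f] = \mathbb{E}\left[ (f(x) - \mathbb{E}[f(x)])^2 \right] \tag{1.38} $$
    ▶ The covariance of two random variables $x$ and $y$, which expresses the extent to which they vary together, is defined as
      $$ \operatorname{cov}[x, y] = \mathbb{E}_{x,y}\left[ \{x - \mathbb{E}[x]\}\{y - \mathbb{E}[y]\} \right] = \mathbb{E}_{x,y}[xy] - \mathbb{E}[x]\,\mathbb{E}[y] \tag{1.41} $$
    ▶ If $x$ and $y$ are independent, $\operatorname{cov}[x, y] = 0$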
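Here is a discrete version of (1.34), (1.38) and (1.41). It is a minimal sketch with made-up joint distributions and hypothetical function names. It checks that both forms of the covariance agree, and that a joint built as $p(x)p(y)$ has zero covariance.

```python
from fractions import Fraction as F


def expect(joint, f):
    """E_{x,y}[f(x, y)] = sum over (x, y) of p(x, y) f(x, y), the discrete form of (1.34)."""
    return sum(p * f(x, y) for (x, y), p in joint.items())


def variance_x(joint):
    """var[x] = E[(x - E[x])^2], i.e. (1.38) with f(x) = x."""
    mean = expect(joint, lambda x, y: x)
    return expect(joint, lambda x, y: (x - mean) ** 2)


def covariance(joint):
    """cov[x, y] = E[{x - E[x]}{y - E[y]}], the first form of (1.41)."""
    ex = expect(joint, lambda x, y: x)
    ey = expect(joint, lambda x, y: y)
    return expect(joint, lambda x, y: (x - ex) * (y - ey))


# Hypothetical dependent pair: y tends to equal x.
dependent = {(0, 0): F(4, 10), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(4, 10)}
# Independent pair built as p(x) p(y).
px = {0: F(1, 4), 1: F(3, 4)}
py = {0: F(1, 2), 1: F(1, 2)}
independent = {(x, y): px[x] * py[y] for x in px for y in py}

print(covariance(dependent))    # 3/20
print(covariance(independent))  # 0
```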
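  19. 1.2.3 Bayesian Probabilities
    ▶ We explain Bayesian probability using polynomial curve fitting as the example
    ▶ In the Bayesian interpretation, before observing any data we encode our assumptions about the parameters $\mathbf{w}$ as a prior distribution $p(\mathbf{w})$
    ▶ We then use the observed data $\mathcal{D} = \{t_1, t_2, \dots, t_N\}$ to evaluate the likelihood function $p(\mathcal{D} \mid \mathbf{w})$
    ▶ Bayes' theorem then gives the posterior distribution $p(\mathbf{w} \mid \mathcal{D})$
      $$ p(\mathbf{w} \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathcal{D})} \tag{1.43} $$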
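  20. 1.2.3 Bayesian Probabilities
    ▶ The likelihood function $p(\mathcal{D} \mid \mathbf{w})$ plays a different role in the frequentist and Bayesian interpretations of probability
    ▶ In the frequentist interpretation, $\mathbf{w}$ is regarded as a fixed parameter, and its estimate is the value that maximizes the likelihood $p(\mathcal{D} \mid \mathbf{w})$ (so $\mathbf{w}$ is determined as a single value)
    ▶ In the Bayesian interpretation, the likelihood is used to update the prior into the posterior in light of the observed data $\mathcal{D}$ (the posterior $p(\mathbf{w} \mid \mathcal{D})$ is a probability distribution over $\mathbf{w}$, so $\mathbf{w}$ carries uncertainty)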
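  21. 1.2.4 The Gaussian Distribution
    ▶ We now discuss the Gaussian distribution, the most important probability distribution for continuous variables
    ▶ The Gaussian distribution is defined as follows (it has two parameters, the mean $\mu$ and the variance $\sigma^2$)
      $$ \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\} \tag{1.46} $$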
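  22. 1.2.4 The Gaussian Distribution
    ▶ The mean and variance of $x$ are
      $$ \mathbb{E}[x] = \int_{-\infty}^{\infty} \mathcal{N}(x \mid \mu, \sigma^2)\, x\, \mathrm{d}x = \mu \tag{1.49} $$
      $$ \operatorname{var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2 = \sigma^2 \tag{1.51} $$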
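Normalization (1.26), the mean (1.49) and the variance (1.51) of the Gaussian (1.46) can be confirmed numerically. The sketch makes two choices: a midpoint rule over $\mu \pm 12\sigma$ stands in for the infinite integrals, and $\mu = 1.5$, $\sigma^2 = 0.8$ are arbitrary. The helper names are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """N(x | mu, sigma^2) from (1.46)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b], used as a stand-in for the integral over the real line."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


mu, sigma2 = 1.5, 0.8
sigma = math.sqrt(sigma2)
lo, hi = mu - 12 * sigma, mu + 12 * sigma  # tails beyond 12 sigma are negligible
total = integrate(lambda x: gaussian_pdf(x, mu, sigma2), lo, hi)
mean = integrate(lambda x: x * gaussian_pdf(x, mu, sigma2), lo, hi)
second = integrate(lambda x: x * x * gaussian_pdf(x, mu, sigma2), lo, hi)
var = second - mean ** 2  # var[x] = E[x^2] - E[x]^2
print(total, mean, var)
```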
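  23. 1.2.4 The Gaussian Distribution
    ▶ Given a data set $\mathbf{x} = (x_1, x_2, \dots, x_N)^{\mathrm T}$ drawn independently from a Gaussian distribution, we determine the Gaussian's parameters $\mu$ and $\sigma^2$
    ▶ Because the data points are independent, the likelihood function is
      $$ p(\mathbf{x} \mid \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2) \tag{1.53} $$
    ▶ Rather than working with (1.53) directly, we determine $\mu$ and $\sigma^2$ from the log-likelihood
      $$ \ln p(\mathbf{x} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi) \tag{1.54} $$
    ▶ The values of $\mu$ and $\sigma^2$ that maximize this log-likelihood are
      $$ \mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n \tag{1.55} $$
      $$ \sigma^2_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})^2 \tag{1.56} $$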
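The following sketch computes the estimates (1.55) and (1.56) and evaluates the log-likelihood (1.54). The observations are hypothetical. Nudging either parameter away from the ML values lowers (1.54), which is a quick sanity check of the closed-form maximizer.

```python
import math


def log_likelihood(xs, mu, sigma2):
    """ln p(x | mu, sigma^2) from (1.54)."""
    N = len(xs)
    return (-sum((x - mu) ** 2 for x in xs) / (2 * sigma2)
            - N / 2 * math.log(sigma2) - N / 2 * math.log(2 * math.pi))


def ml_estimates(xs):
    """Return (mu_ML, sigma^2_ML) from (1.55) and (1.56)."""
    N = len(xs)
    mu_ml = sum(xs) / N
    sigma2_ml = sum((x - mu_ml) ** 2 for x in xs) / N
    return mu_ml, sigma2_ml


xs = [2.1, 1.7, 2.9, 2.4, 1.5, 2.2]  # hypothetical observations
mu_ml, sigma2_ml = ml_estimates(xs)
best = log_likelihood(xs, mu_ml, sigma2_ml)
for d in (-0.1, 0.1):
    print(log_likelihood(xs, mu_ml + d, sigma2_ml) < best,
          log_likelihood(xs, mu_ml, sigma2_ml * (1 + d)) < best)
```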
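  24. 1.2.4 The Gaussian Distribution
    ▶ We now point out a limitation of the maximum likelihood approach
    ▶ $\mu_{\mathrm{ML}}$ and $\sigma^2_{\mathrm{ML}}$ are functions of the data $x_1, x_2, \dots, x_N$; taking their expectations with respect to the Gaussian with parameters $\mu, \sigma^2$ (the distribution that generated the data) gives
      $$ \mathbb{E}[\mu_{\mathrm{ML}}] = \mu \tag{1.57} $$
      $$ \mathbb{E}[\sigma^2_{\mathrm{ML}}] = \left( \frac{N - 1}{N} \right) \sigma^2 \tag{1.58} $$
    ▶ The expectation of $\mu_{\mathrm{ML}}$ equals the true mean, but the variance is systematically underestimated by a factor of $(N - 1)/N$
    ▶ From (1.58), the following $\tilde{\sigma}^2$ is an unbiased estimator ($\mathbb{E}[\tilde{\sigma}^2] = \sigma^2$)
      $$ \tilde{\sigma}^2 = \left( \frac{N}{N - 1} \right) \sigma^2_{\mathrm{ML}} = \frac{1}{N - 1} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})^2 \tag{1.59} $$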
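The bias in (1.58) is easy to observe by simulation. The sketch makes several arbitrary choices: $N = 2$ points are drawn from $\mathcal{N}(0, 1)$ over 20 000 trials with a fixed seed, and the estimates are averaged. With $N = 2$, the average of $\sigma^2_{\mathrm{ML}}$ should be close to $(N-1)/N = 0.5$, while the average of $\tilde{\sigma}^2$ from (1.59) should be close to 1.

```python
import random


def ml_and_unbiased_variance(xs):
    """Return (sigma^2_ML from (1.56), sigma~^2 from (1.59)) for one data set."""
    N = len(xs)
    mu_ml = sum(xs) / N
    ss = sum((x - mu_ml) ** 2 for x in xs)
    return ss / N, ss / (N - 1)


def average_estimates(N, sigma2=1.0, trials=20000, seed=0):
    """Monte Carlo average of both estimators over many data sets of size N."""
    rng = random.Random(seed)
    sum_ml = sum_unbiased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(N)]
        v_ml, v_unbiased = ml_and_unbiased_variance(xs)
        sum_ml += v_ml
        sum_unbiased += v_unbiased
    return sum_ml / trials, sum_unbiased / trials


print(average_estimates(2))
```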
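  25. 1.2.5 Curve Fitting Re-visited
    ▶ We revisit the curve fitting of Section 1.1 from a probabilistic point of view
    ▶ Assume that the target $t$ for input $x$ follows a Gaussian distribution whose mean is $y(x, \mathbf{w})$ (with precision $\beta$, i.e., $\beta^{-1} = \sigma^2$)
      $$ p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(x, \mathbf{w}), \beta^{-1}) \tag{1.60} $$
    ▶ Using the training data $(\mathbf{x}, \mathbf{t})$, we find the parameters $\mathbf{w}$ and $\beta$ that maximize the likelihood
      $$ p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid y(x_n, \mathbf{w}), \beta^{-1}) \tag{1.61} $$
    ▶ In practice, we find the $\mathbf{w}$ and $\beta$ that maximize the log-likelihood
      $$ \ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = -\frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) \tag{1.62} $$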
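  26. 1.2.5 Curve Fitting Re-visited
    ▶ First, consider $\mathbf{w}_{\mathrm{ML}}$, the value that maximizes $\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)$
    ▶ The second and third terms on the right-hand side of (1.62) do not depend on $\mathbf{w}$ and can be dropped; and since $\beta$ is positive, setting $\beta = 1$ does not change $\mathbf{w}_{\mathrm{ML}}$
    ▶ Maximizing $\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)$ is equivalent to minimizing $-\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)$, so finding $\mathbf{w}_{\mathrm{ML}}$ is equivalent to finding the $\mathbf{w}$ that minimizes (1.2)
      $$ E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 \tag{1.2} $$
    ▶ The sum-of-squares error function can thus be viewed as the consequence of maximizing the likelihood under the assumption of Gaussian noise
    ▶ Furthermore, $\beta_{\mathrm{ML}}$ is obtained as
      $$ \frac{1}{\beta_{\mathrm{ML}}} = \frac{1}{N} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}_{\mathrm{ML}}) - t_n \}^2 \tag{1.63} $$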
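The two-step recipe above can be sketched for the simplest case. The sketch assumes a straight line ($M = 1$), so that $\mathbf{w}_{\mathrm{ML}}$ has the familiar closed form without a linear solver, and it uses made-up data and hypothetical helper names. It first fits $\mathbf{w}_{\mathrm{ML}}$, then plugs the residuals into (1.63). Perturbing $\beta$ in either direction lowers (1.62).

```python
import math


def fit_line(xs, ts):
    """w_ML for M = 1 (minimises (1.2)), via the closed-form simple-regression formulas."""
    N = len(xs)
    mx, mt = sum(xs) / N, sum(ts) / N
    w1 = sum((x - mx) * (t - mt) for x, t in zip(xs, ts)) / sum((x - mx) ** 2 for x in xs)
    return [mt - w1 * mx, w1]


def squared_residuals(xs, ts, w):
    return sum((w[0] + w[1] * x - t) ** 2 for x, t in zip(xs, ts))


def log_likelihood(xs, ts, w, beta):
    """(1.62) with y(x, w) = w0 + w1 x."""
    N = len(xs)
    return (-beta / 2 * squared_residuals(xs, ts, w)
            + N / 2 * math.log(beta) - N / 2 * math.log(2 * math.pi))


def beta_ml(xs, ts, w):
    """(1.63): 1 / beta_ML is the mean squared residual at w_ML."""
    return len(xs) / squared_residuals(xs, ts, w)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical data
ts = [0.1, 1.2, 1.9, 3.2, 3.9]
w_ml = fit_line(xs, ts)
b = beta_ml(xs, ts, w_ml)
print(w_ml, b)
```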
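  27. 1.2.5 Curve Fitting Re-visited
    ▶ Next, to take a Bayesian approach, we introduce the following prior distribution over $\mathbf{w}$
      $$ p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I}) = \left( \frac{\alpha}{2\pi} \right)^{(M+1)/2} \exp\left\{ -\frac{\alpha}{2} \mathbf{w}^{\mathrm T}\mathbf{w} \right\} \tag{1.65} $$
    ▶ By Bayes' theorem, the posterior $p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta)$ is proportional to the product of the likelihood and the prior
      $$ p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta) \propto p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha) \tag{1.66} $$
    ▶ Finding the $\mathbf{w}$ that maximizes this posterior is equivalent to finding the $\mathbf{w}$ that minimizes
      $$ \frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{\alpha}{2} \mathbf{w}^{\mathrm T}\mathbf{w} \tag{1.67} $$
    ▶ Dividing (1.67) by $\beta$ shows that it is equivalent to the regularized sum-of-squares error (1.4) with $\lambda = \alpha/\beta$
    ▶ In the Bayesian approach, overfitting can therefore be suppressed (because a prior can be specified in advance)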
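The equivalence $\lambda = \alpha/\beta$ can be checked numerically. To keep the linear algebra to a 2×2 system solved with Cramer's rule, this sketch assumes a straight line ($M = 1$). The data, the hyperparameter values and the names `map_line` and `ridge_line` are hypothetical. Minimizing (1.67) means solving $(\beta\Phi^{\mathrm T}\Phi + \alpha\mathbf{I})\mathbf{w} = \beta\Phi^{\mathrm T}\mathbf{t}$, and this should give the same weights as ridge regression (1.4) with $\lambda = \alpha/\beta$.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Cramer's rule for a 2x2 linear system."""
    det = a11 * a22 - a12 * a21
    return [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det]


def sufficient_stats(xs, ts):
    """Entries of Phi^T Phi and Phi^T t for the basis (1, x)."""
    return (len(xs), sum(xs), sum(x * x for x in xs),
            sum(ts), sum(x * t for x, t in zip(xs, ts)))


def map_line(xs, ts, alpha, beta):
    """Minimiser of (1.67) for M = 1: (beta Phi^T Phi + alpha I) w = beta Phi^T t."""
    n, sx, sxx, st, sxt = sufficient_stats(xs, ts)
    return solve_2x2(beta * n + alpha, beta * sx, beta * sx, beta * sxx + alpha,
                     beta * st, beta * sxt)


def ridge_line(xs, ts, lam):
    """Minimiser of (1.4) for M = 1: (Phi^T Phi + lam I) w = Phi^T t."""
    n, sx, sxx, st, sxt = sufficient_stats(xs, ts)
    return solve_2x2(n + lam, sx, sx, sxx + lam, st, sxt)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical data
ts = [0.1, 1.2, 1.9, 3.2, 3.9]
alpha, beta = 2.0, 25.0         # hypothetical hyperparameters
print(map_line(xs, ts, alpha, beta))
print(ridge_line(xs, ts, alpha / beta))  # same weights, since lambda = alpha / beta
```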
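  28. 1.2.6 Bayesian Curve Fitting
    ▶ Section 1.2.5 only makes a point estimate from the posterior, so it is not yet a fully Bayesian approach
    ▶ Using the sum and product rules, the predictive distribution of the output $t$ for a new input $x$ is as follows ($\alpha$ and $\beta$ are treated as fixed and omitted from the notation)
      $$ p(t \mid x, \mathbf{x}, \mathbf{t}) = \int p(t \mid x, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})\, \mathrm{d}\mathbf{w} \tag{1.68} $$
    ▶ Using the normalized forms of the right-hand sides of (1.60) and (1.66), the integral in (1.68) can be evaluated analytically, giving
      $$ p(t \mid x, \mathbf{x}, \mathbf{t}) = \mathcal{N}(t \mid m(x), s^2(x)) \tag{1.69} $$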
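  29. 1.2.6 Bayesian Curve Fitting
    ▶ Here the mean and variance are
      $$ m(x) = \beta\, \boldsymbol{\phi}(x)^{\mathrm T} \mathbf{S} \sum_{n=1}^{N} \boldsymbol{\phi}(x_n)\, t_n \tag{1.70} $$
      $$ s^2(x) = \beta^{-1} + \boldsymbol{\phi}(x)^{\mathrm T} \mathbf{S}\, \boldsymbol{\phi}(x) \tag{1.71} $$
    ▶ The matrix $\mathbf{S}$ is defined by
      $$ \mathbf{S}^{-1} = \alpha \mathbf{I} + \beta \sum_{n=1}^{N} \boldsymbol{\phi}(x_n)\, \boldsymbol{\phi}(x_n)^{\mathrm T} \tag{1.72} $$
      where $\boldsymbol{\phi}(x)$ is the vector with elements $\phi_i(x) = x^i$ for $i = 0, \dots, M$
    ▶ The predictive distribution $p(t \mid x, \mathbf{x}, \mathbf{t})$ is therefore a Gaussian whose mean and variance both depend on $x$ (the detailed derivation is given in Section 3.3)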
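Equations (1.70) to (1.72) translate directly into code. This is a standard-library sketch, not the author's code. It assumes synthetic $\sin(2\pi x)$ data, $M = 9$, and illustrative values $\alpha = 5\times10^{-3}$ and $\beta = 11.1$, and its helper names are hypothetical. It builds $\mathbf{S}^{-1}$, inverts it by Gauss-Jordan elimination, and returns $m(x)$ and $s^2(x)$. Note that $s^2(x)$ never drops below the noise term $\beta^{-1}$ and grows as $x$ moves away from the training inputs.

```python
import math
import random


def phi(x, M):
    """phi(x) with elements phi_i(x) = x^i, i = 0..M."""
    return [x ** i for i in range(M + 1)]


def inverse(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def predictive(x, xs, ts, M, alpha, beta):
    """Return (m(x), s^2(x)) from (1.70) and (1.71)."""
    # S^{-1} = alpha I + beta sum_n phi(x_n) phi(x_n)^T, (1.72)
    S_inv = [[alpha if i == j else 0.0 for j in range(M + 1)] for i in range(M + 1)]
    for xn in xs:
        p = phi(xn, M)
        for i in range(M + 1):
            for j in range(M + 1):
                S_inv[i][j] += beta * p[i] * p[j]
    S = inverse(S_inv)
    v = [0.0] * (M + 1)  # v = sum_n phi(x_n) t_n
    for xn, tn in zip(xs, ts):
        for i, pi in enumerate(phi(xn, M)):
            v[i] += pi * tn
    px = phi(x, M)
    S_v = [sum(S[i][j] * v[j] for j in range(M + 1)) for i in range(M + 1)]
    S_px = [sum(S[i][j] * px[j] for j in range(M + 1)) for i in range(M + 1)]
    m = beta * sum(px[i] * S_v[i] for i in range(M + 1))
    s2 = 1.0 / beta + sum(px[i] * S_px[i] for i in range(M + 1))
    return m, s2


random.seed(1)
xs = [n / 9 for n in range(10)]
ts = [math.sin(2 * math.pi * x) + random.gauss(0.0, 0.3) for x in xs]
M, alpha, beta = 9, 5e-3, 11.1  # illustrative hyperparameter values
for x in (0.25, 0.5, 1.5):
    m, s2 = predictive(x, xs, ts, M, alpha, beta)
    print(f"x={x}: m={m:.3f}, s={math.sqrt(s2):.3f}")
```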
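  30. 1.3 Model Selection
    ▶ We consider how to choose a model with good generalization performance
    ▶ In the polynomial curve fitting example, choosing the order $M$ and the regularization parameter $\lambda$ amounts to model selection
    ▶ The data set used to decide which values of these model-selection parameters (hyperparameters) work well is called the validation set
    ▶ However, if we try many hyperparameter values on the validation set and one of them gives good accuracy, it may simply be overfitting the validation data
    ▶ The accuracy of the model (hyperparameters) chosen with the validation set should therefore be measured on a separate test set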
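  31. 1.3 Model Selection
    ▶ The approach above requires splitting the available data into three parts (training, validation and test), yet we would like to use as much data as possible for training
    ▶ One remedy is to use $(S - 1)/S$ of the available data for training and the remaining $1/S$ for validation, repeating this for all $S$ choices of held-out part and averaging the validation scores (cross-validation)
    ▶ This lets most of the data be used for training while still providing validation
    ▶ The figure below illustrates cross-validation with $S = 4$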
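The $S$-fold procedure only needs an index-splitting helper and a loop. The sketch below is one way to write it. It assumes contiguous folds, and `s_fold_splits` and `cross_validation_score` are hypothetical names. `fit` and `error` are caller-supplied functions, so any model from Section 1.1 could be plugged in. The usage lines plug in a trivial "predict the training mean" model.

```python
def s_fold_splits(n, S):
    """Yield (train_indices, validation_indices) for S-fold cross-validation on n points."""
    bounds = [round(k * n / S) for k in range(S + 1)]
    for k in range(S):
        val = list(range(bounds[k], bounds[k + 1]))
        train = list(range(0, bounds[k])) + list(range(bounds[k + 1], n))
        yield train, val


def cross_validation_score(xs, ts, S, fit, error):
    """Average validation error over the S runs; each point is held out exactly once."""
    scores = []
    for train, val in s_fold_splits(len(xs), S):
        model = fit([xs[i] for i in train], [ts[i] for i in train])
        scores.append(error(model, [xs[i] for i in val], [ts[i] for i in val]))
    return sum(scores) / S


def fit_mean(xs, ts):
    return sum(ts) / len(ts)


def mean_squared_error(model, xs, ts):
    return sum((model - t) ** 2 for t in ts) / len(ts)


xs = [n / 9 for n in range(10)]
ts = [0.2, 0.9, 1.1, 0.4, -0.3, -0.8, -1.0, -0.6, 0.1, 0.3]  # hypothetical targets
print(cross_validation_score(xs, ts, 4, fit_mean, mean_squared_error))
```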
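  32. 1.4 The Curse of Dimensionality
    ▶ Next we look at the difficulties of handling high-dimensional data
    ▶ In Section 1.1 the input variable $x$ was one-dimensional; here we let the input be a $D$-dimensional vector $\mathbf{x}$
    ▶ A general polynomial of order up to 3 can then be written as
      $$ y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{i=1}^{D} w_i x_i + \sum_{i=1}^{D} \sum_{j=1}^{D} w_{ij} x_i x_j + \sum_{i=1}^{D} \sum_{j=1}^{D} \sum_{k=1}^{D} w_{ijk} x_i x_j x_k \tag{1.74} $$
    ▶ The number of independent coefficients grows in proportion to $D^3$; for a polynomial of order $M$ it grows like $D^M$
    ▶ As $D$ increases, the number of independent coefficients becomes very large (the growth is a power law rather than exponential), and estimating them becomes difficult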
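  33. 1.4 The Curse of Dimensionality
    ▶ We also give an example showing that geometric intuition from three-dimensional space does not carry over directly to higher dimensions
    ▶ Consider a sphere of radius $r = 1$ in $D$ dimensions, and compute the fraction of its volume that lies in the shell between $r = 1 - \epsilon$ and $r = 1$
    ▶ The volume $V_D(r)$ of a $D$-dimensional hypersphere of radius $r$ is proportional to $r^D$, so the fraction is
      $$ \frac{V_D(1) - V_D(1 - \epsilon)}{V_D(1)} = 1 - (1 - \epsilon)^D \tag{1.76} $$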
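Formula (1.76) can be evaluated directly, and a Monte Carlo estimate gives an independent check. This sketch uses hypothetical names, $\epsilon = 0.1$, and a fixed seed. The Monte Carlo estimate rejection-samples uniform points in the unit $D$-ball and counts those that land in the outer shell.

```python
import random


def shell_fraction(D, eps):
    """(1.76): fraction of a unit D-ball's volume lying in the shell 1 - eps <= r <= 1."""
    return 1 - (1 - eps) ** D


def monte_carlo_shell_fraction(D, eps, samples=100_000, seed=0):
    """Rejection-sample uniform points in the unit D-ball and count those in the shell."""
    rng = random.Random(seed)
    inside = in_shell = 0
    for _ in range(samples):
        r2 = sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(D))
        if r2 <= 1.0:
            inside += 1
            if r2 >= (1 - eps) ** 2:
                in_shell += 1
    return in_shell / inside


for D in (1, 2, 5, 20):
    print(D, round(shell_fraction(D, 0.1), 3))
print(monte_carlo_shell_fraction(3, 0.1), shell_fraction(3, 0.1))
```

Even with a shell only 10% thick, the fraction for $D = 20$ is already about 0.88.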
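  34. 1.4 The Curse of Dimensionality
    ▶ The figure below plots this fraction for several values of $D$
    ▶ For large $D$, most of the volume is concentrated in a thin shell near the surface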
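  35. 1.5 Decision Theory
    ▶ We have seen that probability theory lets us quantify uncertainty
    ▶ The joint distribution $p(\mathbf{x}, \mathbf{t})$ of an (unseen) input $\mathbf{x}$ and its target vector $\mathbf{t}$ can be inferred from the training data
    ▶ Decision theory tells us how to use $p(\mathbf{x}, \mathbf{t})$ to decide on the target $\mathbf{t}$, i.e., it provides the decision rule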
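  36. 1.5.1 Minimizing the Misclassification Rate
    ▶ To begin, we simply aim to make as few misclassifications as possible
    ▶ Let the output $t$ be binary, with $t = 0$ denoting class $\mathcal{C}_1$ and $t = 1$ denoting class $\mathcal{C}_2$
    ▶ We divide the input space into regions $\mathcal{R}_1$ and $\mathcal{R}_2$ and assign every point in $\mathcal{R}_k$ to class $\mathcal{C}_k$ (the $\mathcal{R}_k$ are called decision regions)
    ▶ A mistake occurs when an input vector belonging to $\mathcal{C}_2$ falls in $\mathcal{R}_1$ (and is assigned to $\mathcal{C}_1$), or one belonging to $\mathcal{C}_1$ falls in $\mathcal{R}_2$; its probability is
      $$ p(\text{mistake}) = p(\mathbf{x} \in \mathcal{R}_1, \mathcal{C}_2) + p(\mathbf{x} \in \mathcal{R}_2, \mathcal{C}_1) = \int_{\mathcal{R}_1} p(\mathbf{x}, \mathcal{C}_2)\, \mathrm{d}\mathbf{x} + \int_{\mathcal{R}_2} p(\mathbf{x}, \mathcal{C}_1)\, \mathrm{d}\mathbf{x} \tag{1.78} $$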
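  37. 1.5.1 Minimizing the Misclassification Rate
    ▶ To minimize $p(\text{mistake})$, we should make the integrals on the right-hand side of (1.78) as small as possible
    ▶ That is, for a given $\mathbf{x}$, if $p(\mathbf{x}, \mathcal{C}_1) > p(\mathbf{x}, \mathcal{C}_2)$, then $\mathbf{x}$ should be assigned to class $\mathcal{C}_1$
    ▶ By the product rule, $p(\mathbf{x}, \mathcal{C}_k) = p(\mathcal{C}_k \mid \mathbf{x})\, p(\mathbf{x})$, so $p(\text{mistake})$ is minimized by assigning each $\mathbf{x}$ to the class with the largest posterior probability $p(\mathcal{C}_k \mid \mathbf{x})$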
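The claim that the largest-posterior rule minimizes (1.78) can be checked on a one-dimensional toy problem. The priors and Gaussian class-conditionals below are made up, and `decide`, `error_rate` and `threshold_rule` are hypothetical names. The sketch integrates (1.78) with a midpoint rule and compares the optimal rule with a few hand-picked thresholds. With these numbers the optimal boundary is at $x = 1 + \tfrac{1}{2}\ln 1.5 \approx 1.20$.

```python
import math

# Hypothetical 1-D two-class problem: priors p(C_k) and Gaussian class-conditionals p(x | C_k).
PRIORS = {"C1": 0.6, "C2": 0.4}
PARAMS = {"C1": (0.0, 1.0), "C2": (2.0, 1.0)}  # (mean, variance)


def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def joint(x, k):
    """p(x, C_k) = p(x | C_k) p(C_k)."""
    mu, var = PARAMS[k]
    return gauss(x, mu, var) * PRIORS[k]


def decide(x):
    """Assign x to the class with the largest p(x, C_k), i.e. the largest p(C_k | x)."""
    return max(PRIORS, key=lambda k: joint(x, k))


def error_rate(rule, lo=-10.0, hi=12.0, n=20000):
    """p(mistake) in (1.78): integrate p(x, C_k) wherever rule(x) != C_k (midpoint rule)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        chosen = rule(x)
        total += h * sum(joint(x, k) for k in PRIORS if k != chosen)
    return total


def threshold_rule(c):
    """A hypothetical alternative rule: C1 if x < c, else C2."""
    return lambda x: "C1" if x < c else "C2"


print("max posterior:", error_rate(decide))
for c in (0.5, 1.0, 1.5):
    print(f"threshold {c}:", error_rate(threshold_rule(c)))
```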
  38. 1.5.1 Minimizing the Misclassification Rate
    ▶ The figure below shows graphically how the decision regions are chosen

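  39. 1.5.2 Minimizing the Expected Loss
    ▶ In many applications, simply minimizing the number of misclassifications is not enough
    ▶ For example, when deciding from examination results whether a patient has cancer, it is more important to avoid judging a patient who really has cancer as cancer-free than to avoid judging a healthy patient as having cancer
    ▶ To take this into account, we choose the decision regions $\mathcal{R}_j$ so as to minimize the expected loss
      $$ \mathbb{E}[L] = \sum_k \sum_j \int_{\mathcal{R}_j} L_{kj}\, p(\mathbf{x}, \mathcal{C}_k)\, \mathrm{d}\mathbf{x} \tag{1.80} $$
    ▶ Here $L_{kj}$ is an element of the loss matrix: the loss incurred when the true class is $\mathcal{C}_k$ and $\mathbf{x}$ is assigned to $\mathcal{C}_j$. Normally $L_{ii} = 0$; when deciding class $\mathcal{C}_l$ is very risky (in the cancer example, the "no cancer" class), making $L_{kl}$ ($k \neq l$) large reduces the number of these risky misclassifications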
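  40. 1.5.2 Minimizing the Expected Loss
    ▶ Minimizing (1.80) means that, for each $\mathbf{x}$, we compute the per-class quantities $\sum_k L_{kj}\, p(\mathbf{x}, \mathcal{C}_k)$ and assign $\mathbf{x}$ to the class $j$ for which it is smallest
    ▶ By the product rule $p(\mathbf{x}, \mathcal{C}_k) = p(\mathcal{C}_k \mid \mathbf{x})\, p(\mathbf{x})$, and since $p(\mathbf{x})$ is common to all classes, $\mathbf{x}$ is assigned to the class $j$ that minimizes
      $$ \sum_k L_{kj}\, p(\mathcal{C}_k \mid \mathbf{x}) \tag{1.81} $$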
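Rule (1.81) is a short minimization over classes. The sketch below uses an illustrative loss matrix for the cancer example, indexed as `LOSS[true][decided]` to match $L_{kj}$. Its posteriors and function names are hypothetical. It shows that a patient with only a 1% cancer posterior is still classed as "cancer" once a missed cancer costs 1000 times more than a false alarm.

```python
# Illustrative loss matrix L[k][j]: loss when the true class is k and we decide j.
LOSS = {
    "cancer": {"cancer": 0, "normal": 1000},
    "normal": {"cancer": 1, "normal": 0},
}
CLASSES = list(LOSS)


def expected_loss(j, posterior):
    """sum_k L_kj p(C_k | x), the quantity minimised in (1.81), for decision j."""
    return sum(LOSS[k][j] * posterior[k] for k in CLASSES)


def decide_min_expected_loss(posterior):
    return min(CLASSES, key=lambda j: expected_loss(j, posterior))


def decide_max_posterior(posterior):
    return max(CLASSES, key=lambda k: posterior[k])


posterior = {"cancer": 0.01, "normal": 0.99}  # hypothetical p(C_k | x) for one patient
print(decide_max_posterior(posterior))        # normal
print(decide_min_expected_loss(posterior))    # cancer: 0.99 < 1000 * 0.01
```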
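  41. 1.5.3 The Reject Option
    ▶ When the largest posterior probability $p(\mathcal{C}_k \mid \mathbf{x})$ is significantly less than 1, it may be better not to decide which class $\mathbf{x}$ belongs to
    ▶ One therefore introduces a threshold $\theta$ ($0 \leq \theta \leq 1$) and rejects (makes no class decision for) any input $\mathbf{x}$ for which the largest posterior, $\max_k p(\mathcal{C}_k \mid \mathbf{x})$, is at most $\theta$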
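The reject option adds one comparison to the largest-posterior rule. In this sketch `classify_with_reject` is a hypothetical name, and `None` stands for "reject". With $\theta = 1$ every input is rejected, consistent with the definition above.

```python
def classify_with_reject(posterior, theta):
    """Return the most probable class, or None (reject) if max_k p(C_k | x) <= theta."""
    best = max(posterior, key=posterior.get)
    return None if posterior[best] <= theta else best


print(classify_with_reject({"C1": 0.95, "C2": 0.05}, theta=0.8))  # C1
print(classify_with_reject({"C1": 0.55, "C2": 0.45}, theta=0.8))  # None
```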
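  42. 1.5.4 Inference and Decision
    ▶ We describe three distinct approaches to solving decision problems
    (a) Determine the class-conditional densities $p(\mathbf{x} \mid \mathcal{C}_k)$ and the class priors $p(\mathcal{C}_k)$, compute $p(\mathcal{C}_k \mid \mathbf{x})$ with Bayes' theorem, and then make the decision using decision theory
      $$ p(\mathcal{C}_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \mathcal{C}_k)\, p(\mathcal{C}_k)}{p(\mathbf{x})} \tag{1.82} $$
    (b) Determine $p(\mathcal{C}_k \mid \mathbf{x})$ directly, and then make the decision using decision theory
    (c) Find a function $f(\mathbf{x})$ that maps each input vector $\mathbf{x}$ directly to a class label (e.g., $f = 0$ corresponds to $\mathcal{C}_1$ and $f = 1$ to $\mathcal{C}_2$)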
  43. 1.5.4 Inference and Decision
    ▶ Approach (c) is simple, but it cannot make use of a loss matrix or the reject option, so we prefer approaches that compute the posterior

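  44. 1.5.5 Loss Functions for Regression
    ▶ So far we have considered decision theory for classification; we now turn to decision theory for regression
    ▶ For this we return to the curve fitting example of Section 1.1
    ▶ Using the loss $L(t, y(\mathbf{x})) = \{y(\mathbf{x}) - t\}^2$, we look for the $y(\mathbf{x})$ that minimizes the expected loss
      $$ \mathbb{E}[L] = \iint \{y(\mathbf{x}) - t\}^2\, p(\mathbf{x}, t)\, \mathrm{d}\mathbf{x}\, \mathrm{d}t \tag{1.87} $$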
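  45. 1.5.5 Loss Functions for Regression
    ▶ Taking the functional derivative of $\mathbb{E}[L]$ with respect to $y(\mathbf{x})$ and setting it to zero gives
      $$ \frac{\delta \mathbb{E}[L]}{\delta y(\mathbf{x})} = 2 \int \{y(\mathbf{x}) - t\}\, p(\mathbf{x}, t)\, \mathrm{d}t = 0 \tag{1.88} $$
    ▶ Solving (1.88) for $y(\mathbf{x})$ and dividing by $p(\mathbf{x}) = \int p(\mathbf{x}, t)\, \mathrm{d}t$ gives
      $$ y(\mathbf{x}) = \frac{\int t\, p(\mathbf{x}, t)\, \mathrm{d}t}{p(\mathbf{x})} = \int t\, p(t \mid \mathbf{x})\, \mathrm{d}t = \mathbb{E}_t[t \mid \mathbf{x}] \tag{1.89} $$
    ▶ As in the figure below, for a given $x_0$ we set $y(x_0) = \mathbb{E}_t[t \mid x_0]$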
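Result (1.89) can be seen at a single input $x_0$. The sketch assumes a made-up discrete $p(t \mid x_0)$ and searches a grid of candidate predictions. The one minimizing the expected squared loss is the conditional mean, because $\mathbb{E}_t[\{y_0 - t\}^2 \mid x_0] = \operatorname{var}[t \mid x_0] + (y_0 - \mathbb{E}_t[t \mid x_0])^2$.

```python
# Hypothetical discrete p(t | x0) at one fixed input x0.
ts = [0.0, 1.0, 3.0]
probs = [0.2, 0.5, 0.3]


def expected_squared_loss(y0):
    """E_t[{y0 - t}^2 | x0], the inner integral of (1.87) at x = x0."""
    return sum(p * (y0 - t) ** 2 for t, p in zip(ts, probs))


cond_mean = sum(p * t for t, p in zip(ts, probs))  # E_t[t | x0], as in (1.89)
candidates = [i / 100 for i in range(301)]          # grid of candidate predictions y(x0)
best = min(candidates, key=expected_squared_loss)
print(cond_mean, best)
```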
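  46. 1.6 Information Theory
    ▶ We introduce several concepts from information theory that are useful in pattern recognition and machine learning
    ▶ First, consider how much information we receive when we observe a random variable $x$: a highly probable event carries little information, whereas an improbable event carries a lot
    ▶ The information content $h(x)$ therefore depends on the probability $p(x)$. Requiring that the information from two independent events add, $h(x, y) = h(x) + h(y)$, while $p(x, y) = p(x)\, p(y)$, leads to
      $$ h(x) = -\log_2 p(x) \tag{1.92} $$
    ▶ The entropy is defined as the expected information content
      $$ \mathrm{H}[x] = -\sum_x p(x) \log_2 p(x) \tag{1.93} $$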
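Definitions (1.92) and (1.93) in code, as a short sketch with hypothetical function names. A uniform distribution over 8 states has an entropy of 3 bits. A skewed distribution over the same 8 states has a lower entropy: 2 bits for the probabilities used here. The last line checks that information from independent events adds.

```python
import math


def information_bits(p):
    """h(x) = -log2 p(x), (1.92)."""
    return -math.log2(p)


def entropy_bits(probs):
    """H[x] = -sum_x p(x) log2 p(x), (1.93); states with p(x) = 0 contribute nothing."""
    return sum(p * information_bits(p) for p in probs if p > 0)


uniform = [1 / 8] * 8
skewed = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 64, 1 / 64, 1 / 64, 1 / 64]
print(entropy_bits(uniform), entropy_bits(skewed))  # 3.0 2.0
# Additivity for independent events: h(x, y) = h(x) + h(y) because p(x, y) = p(x) p(y).
print(information_bits(0.5 * 0.25), information_bits(0.5) + information_bits(0.25))
```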
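  47. 1.6 Information Theory
    ▶ The choice of base for the logarithm in (1.92) is arbitrary; from here on we use natural logarithms (base $e$)
    ▶ Writing the states of a discrete variable as $x_i$ with probabilities $p(x_i)$, the entropy becomes
      $$ \mathrm{H}[p] = -\sum_i p(x_i) \ln p(x_i) \tag{1.98} $$
    ▶ The figure below illustrates this: a distribution sharply peaked around a few values (left) has low entropy, whereas one spread evenly across many values (right) has high entropy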
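  48. 1.6 Information Theory
    ▶ For a continuous random variable, the entropy is defined as
      $$ \mathrm{H}[p] = -\int p(x) \ln p(x)\, \mathrm{d}x \tag{1.103} $$
    ▶ Consider finding the distribution $p(x)$ that maximizes the entropy (1.103) subject to the constraints
      $$ \int_{-\infty}^{\infty} p(x)\, \mathrm{d}x = 1 \tag{1.105} $$
      $$ \int_{-\infty}^{\infty} x\, p(x)\, \mathrm{d}x = \mu \tag{1.106} $$
      $$ \int_{-\infty}^{\infty} (x - \mu)^2\, p(x)\, \mathrm{d}x = \sigma^2 \tag{1.107} $$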
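  49. 1.6 Information Theory
    ▶ Using Lagrange multipliers, we set to zero the functional derivative of the following quantity with respect to $p(x)$, together with its partial derivatives with respect to each $\lambda_i$, and solve
      $$ -\int p(x) \ln p(x)\, \mathrm{d}x + \lambda_1 \left( \int_{-\infty}^{\infty} p(x)\, \mathrm{d}x - 1 \right) + \lambda_2 \left( \int_{-\infty}^{\infty} x\, p(x)\, \mathrm{d}x - \mu \right) + \lambda_3 \left( \int_{-\infty}^{\infty} (x - \mu)^2\, p(x)\, \mathrm{d}x - \sigma^2 \right) $$
    ▶ The solution is the Gaussian distribution
      $$ p(x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\} \tag{1.109} $$
    ▶ The entropy of the Gaussian is as follows, so a larger variance (a broader distribution) gives a larger entropy
      $$ \mathrm{H}[x] = \frac{1}{2} \left\{ 1 + \ln(2\pi\sigma^2) \right\} \tag{1.110} $$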
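The closed form (1.110) can be compared with a numerical evaluation of (1.103). This sketch integrates with a midpoint rule over $\pm 12\sigma$ and uses hypothetical names. As a spot check of the maximum-entropy property, it also compares against a uniform density with the same variance: a uniform of width $w$ has variance $w^2/12$ and entropy $\ln w$, which comes out smaller.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def differential_entropy(pdf, lo, hi, n=20000):
    """Midpoint approximation of H[p] = -int p(x) ln p(x) dx, (1.103)."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        p = pdf(lo + (k + 0.5) * h)
        if p > 0:
            total -= p * math.log(p) * h
    return total


def gaussian_entropy(sigma2):
    """Closed form (1.110)."""
    return 0.5 * (1 + math.log(2 * math.pi * sigma2))


for sigma2 in (0.5, 2.0):
    s = math.sqrt(sigma2)
    numeric = differential_entropy(lambda x: gaussian_pdf(x, 0.0, sigma2), -12 * s, 12 * s)
    uniform_same_variance = math.log(math.sqrt(12 * sigma2))
    print(sigma2, numeric, gaussian_entropy(sigma2), uniform_same_variance)
```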
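  50. 1.6 Information Theory
    ▶ We define the conditional entropy
      $$ \mathrm{H}[y \mid x] = -\iint p(y, x) \ln p(y \mid x)\, \mathrm{d}y\, \mathrm{d}x \tag{1.111} $$
    ▶ Using the product rule, the conditional entropy satisfies
      $$ \mathrm{H}[x, y] = \mathrm{H}[y \mid x] + \mathrm{H}[x] \tag{1.112} $$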
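  51. 1.6.1 Relative Entropy and Mutual Information
    ▶ Suppose an unknown distribution $p(x)$ is modeled by an approximating distribution $q(x)$; to measure how similar the two distributions are, we introduce the Kullback-Leibler divergence
      $$ \mathrm{KL}(p \,\|\, q) = -\int p(x) \ln q(x)\, \mathrm{d}x - \left( -\int p(x) \ln p(x)\, \mathrm{d}x \right) = -\int p(x) \ln \left\{ \frac{q(x)}{p(x)} \right\} \mathrm{d}x \tag{1.113} $$
    ▶ We now show that $\mathrm{KL}(p \,\|\, q) \geq 0$, with equality if and only if $p(x) = q(x)$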
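Here is a discrete version of (1.113). It is written in the equivalent form $\sum_i p_i \ln(p_i/q_i)$ and uses arbitrary example distributions. The sketch confirms two properties: the divergence is zero when $q = p$ and positive otherwise, and it is not symmetric in its arguments, so it is not a distance in the usual sense.

```python
import math


def kl_divergence(p, q):
    """Discrete (1.113): KL(p || q) = -sum_i p_i ln(q_i / p_i) = sum_i p_i ln(p_i / q_i).

    Terms with p_i = 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.3, 0.2]
q = [0.1, 0.6, 0.3]
print(kl_divergence(p, q), kl_divergence(q, p))  # both positive, but not equal
print(kl_divergence(p, p))                       # 0.0
```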
  52. 1.6.1 Relative Entropy and Mutual Information
    ▶ For this we introduce the concept of convex functions

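  53. 1.6.1 Relative Entropy and Mutual Information
    ▶ For a convex function $f(x)$, Jensen's inequality holds for any set of points $\{x_i\}$
      $$ f\left( \sum_{i=1}^{M} \lambda_i x_i \right) \leq \sum_{i=1}^{M} \lambda_i f(x_i) \tag{1.115} $$
    ▶ Here $\lambda_i \geq 0$ and $\sum_i \lambda_i = 1$
    ▶ Interpreting $\lambda_i$ as the probability that $x$ takes the value $x_i$, (1.115) becomes
      $$ f(\mathbb{E}[x]) \leq \mathbb{E}[f(x)] \tag{1.116} $$
    ▶ For continuous variables, (1.116) takes the form
      $$ f\left( \int x\, p(x)\, \mathrm{d}x \right) \leq \int f(x)\, p(x)\, \mathrm{d}x \tag{1.117} $$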
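  54. 1.6.1 Relative Entropy and Mutual Information
    ▶ To apply Jensen's inequality to the KL divergence, go back to (1.115), interpret $\lambda_i$ as the probability of a value $z_i$, and set $x_i = \xi(z_i)$; the continuous version of (1.115) is then
      $$ f\left( \int \xi(x)\, p(x)\, \mathrm{d}x \right) \leq \int f(\xi(x))\, p(x)\, \mathrm{d}x $$
    ▶ Setting $f(x) = -\ln x$ (a convex function) and $\xi(x) = q(x)/p(x)$ in this inequality shows that $\mathrm{KL}(p \,\|\, q) \geq 0$
      $$ \mathrm{KL}(p \,\|\, q) = -\int p(x) \ln \left\{ \frac{q(x)}{p(x)} \right\} \mathrm{d}x \geq -\ln \int q(x)\, \mathrm{d}x = 0 \tag{1.118} $$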
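  55. 1.6.1 Relative Entropy and Mutual Information
    ▶ Finally, we use the KL divergence to evaluate how close two random variables are to being independent
    ▶ That is, we consider the KL divergence between the joint distribution $p(x, y)$ and the product of the marginals $p(x)\, p(y)$, called the mutual information $\mathrm{I}[x, y]$
      $$ \mathrm{I}[x, y] = -\iint p(x, y) \ln \left\{ \frac{p(x)\, p(y)}{p(x, y)} \right\} \mathrm{d}x\, \mathrm{d}y \tag{1.120} $$
    ▶ Expressed in terms of entropies, $\mathrm{I}[x, y]$ is
      $$ \mathrm{I}[x, y] = \mathrm{H}[x] - \mathrm{H}[x \mid y] = \mathrm{H}[y] - \mathrm{H}[y \mid x] \tag{1.121} $$
    ▶ Thus $\mathrm{I}[x, y]$ measures how much the uncertainty about $x$ is reduced by learning $y$ (and vice versa)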
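For discrete variables, (1.120) and (1.121) can be checked against each other. The sketch uses hypothetical joint tables and helper names, with rows indexed by $x$ and columns by $y$. The KL form and the entropy form of the mutual information agree, and a table built as $p(x)p(y)$ gives zero.

```python
import math


def entropy(probs):
    """H = -sum p ln p in nats, (1.98)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def marginals(joint):
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py


def mutual_information(joint):
    """Discrete (1.120): I[x, y] = -sum p(x, y) ln{p(x) p(y) / p(x, y)}."""
    px, py = marginals(joint)
    return -sum(pxy * math.log(px[i] * py[j] / pxy)
                for i, row in enumerate(joint) for j, pxy in enumerate(row) if pxy > 0)


def conditional_entropy_x_given_y(joint):
    """H[x | y] = -sum p(x, y) ln p(x | y), with p(x | y) = p(x, y) / p(y)."""
    _, py = marginals(joint)
    return -sum(pxy * math.log(pxy / py[j])
                for row in joint for j, pxy in enumerate(row) if pxy > 0)


joint = [[0.3, 0.1],
         [0.2, 0.4]]          # hypothetical dependent pair
independent = [[0.2, 0.2],
               [0.3, 0.3]]    # equals p(x) p(y)
px, _ = marginals(joint)
print(mutual_information(joint), entropy(px) - conditional_entropy_x_given_y(joint))
print(mutual_information(independent))
```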