
PRML Seminar

gucchi
March 20, 2019


Transcript

  1. 2nd PRML Seminar, 2019/03/25, 坂口 諒輔
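  2. 0. About this seminar

    This seminar covers the kernel methods of PRML Chapter 6 and the sparse kernel machines of PRML Chapter 7. It also reviews the background needed to explain these topics: the linear regression model of PRML Chapter 3 and the distance between a point and a hyperplane from PRML 4.1.1. Note that the equation numbers in these slides differ from those in PRML.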
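  3. Contents

    1. Preliminaries: 1-1. Linear regression model; 1-2. Distance between a point and a hyperplane
    2. Kernel methods: 2-1. Dual representation; 2-2. Constructing kernel functions; 2-3. Gaussian processes for regression; 2-4. Gaussian processes for classification
    3. Sparse kernel machines: 3-1. Maximum margin classifier; 3-2. Overlapping class distributions; 3-3. SVM for regression; 3-4. RVM for regression; 3-5. RVM for classification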
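  4. 1-1. Linear regression model

    This section briefly reviews the linear regression model, which underlies regression problems (PRML 3.1.1). Let $\mathbf{x}$ be a $D$-dimensional input vector and $\mathbf{w} = (w_0, w_1, \dots, w_{M-1})^{\mathrm T}$ the parameters to be learned, and expand the function $y(\mathbf{x}, \mathbf{w})$ in nonlinear basis functions $\phi_j(\mathbf{x})$ ($j = 1, \dots, M-1$):

    $$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x}) \tag{1.1}$$

    To shorten the notation, set $\phi_0(\mathbf{x}) = 1$ and define $\boldsymbol{\phi}(\mathbf{x}) = (\phi_0(\mathbf{x}), \phi_1(\mathbf{x}), \dots, \phi_{M-1}(\mathbf{x}))^{\mathrm T}$. Then (1.1) can be written as

    $$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}) \tag{1.2}$$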
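  5. 1-1. Linear regression model

    As training data, prepare a set of inputs $X = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ and the corresponding set of target variables $\{t_1, t_2, \dots, t_N\}$, and consider the loss function

    $$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl(t_n - y(\mathbf{x}_n, \mathbf{w})\bigr)^2 \tag{1.3}$$

    Learning means finding the parameters $\mathbf{w}$ that minimize this loss. Since $y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x})$, the gradient of $E_D(\mathbf{w})$ with respect to $\mathbf{w}$ is

    $$\frac{\partial}{\partial \mathbf{w}} E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \frac{\partial}{\partial \mathbf{w}} \bigl(t_n - \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n)\bigr)^2 = -\sum_{n=1}^{N} \bigl(t_n - \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n)\bigr) \boldsymbol{\phi}(\mathbf{x}_n) = -\Bigl\{ \sum_{n=1}^{N} t_n \boldsymbol{\phi}(\mathbf{x}_n) - \sum_{n=1}^{N} \boldsymbol{\phi}(\mathbf{x}_n) \boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm T} \mathbf{w} \Bigr\} \tag{1.4}$$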
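  6. 1-1. Linear regression model

    Hence the maximum likelihood solution $\mathbf{w}_{\mathrm{ML}}$ satisfies

    $$\sum_{n=1}^{N} t_n \boldsymbol{\phi}(\mathbf{x}_n) - \sum_{n=1}^{N} \boldsymbol{\phi}(\mathbf{x}_n) \boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm T} \mathbf{w}_{\mathrm{ML}} = 0 \tag{1.5}$$

    Now define the design matrix $\boldsymbol{\Phi}$ (it appears again in later chapters):

    $$\boldsymbol{\Phi} = \begin{pmatrix} \phi_0(\mathbf{x}_1) & \phi_1(\mathbf{x}_1) & \cdots & \phi_{M-1}(\mathbf{x}_1) \\ \phi_0(\mathbf{x}_2) & \phi_1(\mathbf{x}_2) & \cdots & \phi_{M-1}(\mathbf{x}_2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_0(\mathbf{x}_N) & \phi_1(\mathbf{x}_N) & \cdots & \phi_{M-1}(\mathbf{x}_N) \end{pmatrix} = \begin{pmatrix} \boldsymbol{\phi}(\mathbf{x}_1)^{\mathrm T} \\ \boldsymbol{\phi}(\mathbf{x}_2)^{\mathrm T} \\ \vdots \\ \boldsymbol{\phi}(\mathbf{x}_N)^{\mathrm T} \end{pmatrix} \tag{1.6}$$

    The following identities then hold:

    $$\boldsymbol{\Phi}^{\mathrm T} \boldsymbol{\Phi} = \sum_{n=1}^{N} \boldsymbol{\phi}(\mathbf{x}_n) \boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm T} \tag{1.7}$$

    $$\boldsymbol{\Phi}^{\mathrm T} \mathbf{t} = \sum_{n=1}^{N} t_n \boldsymbol{\phi}(\mathbf{x}_n) \tag{1.8}$$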
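  7. 1-1. Linear regression model

    With these, (1.5) becomes

    $$\boldsymbol{\Phi}^{\mathrm T} \mathbf{t} - \boldsymbol{\Phi}^{\mathrm T} \boldsymbol{\Phi} \mathbf{w}_{\mathrm{ML}} = 0 \tag{1.9}$$

    so the maximum likelihood solution is

    $$\mathbf{w}_{\mathrm{ML}} = (\boldsymbol{\Phi}^{\mathrm T} \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^{\mathrm T} \mathbf{t} \tag{1.10}$$

    Using this solution, the prediction for an unseen input $\tilde{\mathbf{x}}$ is given by $y(\tilde{\mathbf{x}}, \mathbf{w}_{\mathrm{ML}})$.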
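    The closed-form solution (1.10) can be checked numerically. The following is a minimal sketch, assuming a polynomial basis $\phi_j(x) = x^j$ and synthetic data; both are illustrative choices, not taken from the slides. It solves the normal equations (1.9) and confirms that the gradient (1.4) vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(0)

def design_matrix(x, M):
    """Rows are phi(x_n)^T with the assumed basis phi_j(x) = x**j, j = 0..M-1."""
    return np.vander(x, M, increasing=True)

# Synthetic 1-D training data (illustrative only).
x = rng.uniform(-1.0, 1.0, size=20)
t = np.sin(np.pi * x) + 0.1 * rng.normal(size=x.shape)

Phi = design_matrix(x, M=4)

# Solve (Phi^T Phi) w = Phi^T t, i.e. (1.9), without forming the inverse explicitly.
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Gradient (1.4) evaluated at w_ml; it should be numerically zero.
grad = -(Phi.T @ t - Phi.T @ Phi @ w_ml)

# Prediction y(x_new, w_ml) for a new input.
x_new = np.array([0.3])
y_new = design_matrix(x_new, M=4) @ w_ml
```

    Solving the linear system is usually preferred over computing $(\boldsymbol{\Phi}^{\mathrm T} \boldsymbol{\Phi})^{-1}$ directly, because it is cheaper and numerically more stable.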
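  8. 1-1. Linear regression model

    Training with the error function (1.3) often leads to overfitting: the model is accurate on the training data but inaccurate on test data. When overfitting occurs, the components of $\mathbf{w}_{\mathrm{ML}}$ tend to have large absolute values, so consider the following error function instead:

    $$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl(t_n - y(\mathbf{x}_n, \mathbf{w})\bigr)^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2 \tag{1.11}$$

    Here $\lVert \mathbf{w} \rVert^2 = \mathbf{w}^{\mathrm T} \mathbf{w} = w_0^2 + w_1^2 + \dots + w_{M-1}^2$, and $\lambda$ is a positive parameter that controls the relative importance of the regularization term and the sum-of-squares term. Using this error function can suppress overfitting. The corresponding solution $\mathbf{w}_{\mathrm{ML}}$ (it appears again later) is

    $$\mathbf{w}_{\mathrm{ML}} = (\lambda \mathbf{I}_M + \boldsymbol{\Phi}^{\mathrm T} \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^{\mathrm T} \mathbf{t} \tag{1.12}$$

    (see PRML 3.1.4), where $\mathbf{I}_M$ is the $M \times M$ identity matrix.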
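  9. 1-2. Distance between a point and a hyperplane

    The SVM discussed later is formulated in terms of the distance between data points and the class boundary (in general a hyperplane), so this distance is derived here (PRML 4.1.1). Consider the linear function

    $$y(\mathbf{x}) = \mathbf{w}^{\mathrm T} \mathbf{x} + w_0 \tag{1.13}$$

    where $\mathbf{w}$ and $\mathbf{x}$ are both $D$-dimensional vectors. In classification, a typical usage is to assign an input $\mathbf{x}$ with $y(\mathbf{x}) \ge 0$ to class $C_1$ and every other input to class $C_2$. Therefore $y(\mathbf{x}) = 0$, a $(D-1)$-dimensional hyperplane, represents the class boundary (decision surface).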
  10. 1-2. Distance between a point and a hyperplane

    [Figure: the decision surface (a decision line) for $D = 2$.]
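  11. 1-2. Distance between a point and a hyperplane

    First take two distinct points $\mathbf{x}_A$ and $\mathbf{x}_B$ on the decision surface. Because both lie on the surface,

    $$y(\mathbf{x}_A) = \mathbf{w}^{\mathrm T} \mathbf{x}_A + w_0 = 0 \tag{1.14}$$

    $$y(\mathbf{x}_B) = \mathbf{w}^{\mathrm T} \mathbf{x}_B + w_0 = 0 \tag{1.15}$$

    Subtracting these gives $\mathbf{w}^{\mathrm T} (\mathbf{x}_A - \mathbf{x}_B) = 0$. Since $\mathbf{x}_A - \mathbf{x}_B$ is parallel to the decision surface, $\mathbf{w}$ is orthogonal to the decision surface.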
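  12. 1-2. Distance between a point and a hyperplane

    Next, find the perpendicular distance $|r|$ between the class boundary $y(\mathbf{x}) = 0$ and a point $\mathbf{x}$. Decompose $\mathbf{x}$ into a component perpendicular to the decision surface and the remainder $\mathbf{x}_\perp$:

    $$\mathbf{x} = \mathbf{x}_\perp + r \frac{\mathbf{w}}{\lVert \mathbf{w} \rVert} \tag{1.16}$$

    Here $\mathbf{w} / \lVert \mathbf{w} \rVert$ is the unit vector perpendicular to the decision surface, and $\mathbf{x}_\perp$ is chosen to be a point on the decision surface.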
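  13. 1-2. Distance between a point and a hyperplane

    Multiplying both sides of (1.16) by $\mathbf{w}^{\mathrm T}$ and adding $w_0$ gives, using (1.13),

    $$\mathbf{w}^{\mathrm T} \mathbf{x} + w_0 = \mathbf{w}^{\mathrm T} \mathbf{x}_\perp + w_0 + r \lVert \mathbf{w} \rVert \;\Rightarrow\; y(\mathbf{x}) = y(\mathbf{x}_\perp) + r \lVert \mathbf{w} \rVert \tag{1.17}$$

    Because $\mathbf{x}_\perp$ lies on the decision surface, $y(\mathbf{x}_\perp) = 0$, and the perpendicular distance is

    $$|r| = \frac{|y(\mathbf{x})|}{\lVert \mathbf{w} \rVert} \tag{1.18}$$

    The result (1.18) is used in a later chapter.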
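    As a quick numerical illustration of (1.16) to (1.18), the sketch below uses arbitrarily chosen values of $\mathbf{w}$, $w_0$ and $\mathbf{x}$ (assumptions made only for this example), computes the signed distance $r$, and verifies that the projected point $\mathbf{x}_\perp$ lies on the decision surface.

```python
import numpy as np

# Illustrative hyperplane and point (values chosen only for this example).
w = np.array([3.0, 4.0])
w0 = -5.0
x = np.array([4.0, 3.0])

def y(x):
    """Linear function (1.13)."""
    return w @ x + w0

norm_w = np.linalg.norm(w)      # ||w|| = 5
r = y(x) / norm_w               # signed distance; |r| is (1.18)
x_perp = x - r * w / norm_w     # rearranged decomposition (1.16)
```

    Here $y(\mathbf{x}) = 19$ and $\lVert \mathbf{w} \rVert = 5$, so $|r| = 3.8$, and `y(x_perp)` evaluates to zero up to rounding.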
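  14. 2. Kernel methods

    Section 1-1 considered a parametric model whose output $y(\mathbf{x}, \mathbf{w})$ is

    $$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}) \tag{2.1}$$

    Here $\mathbf{x} = (x_1, x_2, \dots, x_D)^{\mathrm T}$ is a $D$-dimensional input vector, $\boldsymbol{\phi}(\mathbf{x}) = (\phi_0(\mathbf{x}), \phi_1(\mathbf{x}), \dots, \phi_{M-1}(\mathbf{x}))^{\mathrm T}$ is a vector function that maps the input $\mathbf{x}$ into an $M$-dimensional feature space, and $\mathbf{w} = (w_0, w_1, \dots, w_{M-1})^{\mathrm T}$ is an $M$-dimensional parameter vector. In 1-1, given inputs $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ and the corresponding targets $\{t_1, t_2, \dots, t_N\}$, least squares was used to find the maximum likelihood solution $\mathbf{w}_{\mathrm{ML}}$ such that the output $y(\mathbf{x}_n, \mathbf{w})$ for input $\mathbf{x}_n$ reproduces $t_n$.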
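  15. 2. Kernel methods

    In other words, in 1-1 the starting point of model building was to construct the output $y(\mathbf{x}, \mathbf{w})$ from the vector function $\boldsymbol{\phi}(\mathbf{x})$. (By contrast, the neural networks of PRML Chapter 5 start by making $\boldsymbol{\phi}(\mathbf{x})$ itself depend on learned parameters.)

    Many linear parametric models can be rewritten in a dual representation, in which $\boldsymbol{\phi}(\mathbf{x})$ enters the solution $\mathbf{w}_{\mathrm{ML}}$ and the resulting output $y(\mathbf{x}, \mathbf{w}_{\mathrm{ML}})$ only through the kernel function

    $$k(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}') \tag{2.2}$$

    (explained in detail in 2-1). In addition, by treating linear parametric models for regression and classification probabilistically (1-1 dealt with the regression case), these models turn out to be examples of Gaussian processes (explained in detail in 2-3 and 2-4).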
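  16. 2-1. Dual representation

    Consider a parametric model whose output is

    $$y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}) \tag{2.3}$$

    and minimize the regularized sum-of-squares error ($\lambda$ is a positive parameter)

    $$J(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl\{ \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) - t_n \bigr\}^2 + \frac{\lambda}{2} \mathbf{w}^{\mathrm T} \mathbf{w} \tag{2.4}$$

    where the inputs are $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ and the targets are $\{t_1, t_2, \dots, t_N\}$.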
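  17. 2-1. Dual representation

    The stationarity condition $\partial J(\mathbf{w}) / \partial \mathbf{w} = 0$ can be rearranged (see (1.4)) as

    $$\mathbf{w} = \sum_{n=1}^{N} a_n \boldsymbol{\phi}(\mathbf{x}_n) = \boldsymbol{\Phi}^{\mathrm T} \mathbf{a} \tag{2.5}$$

    where

    $$a_n = -\frac{1}{\lambda} \bigl\{ \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) - t_n \bigr\} \tag{2.6}$$

    with $\mathbf{a} = (a_1, \dots, a_N)^{\mathrm T}$, and $\boldsymbol{\Phi} = (\boldsymbol{\phi}(\mathbf{x}_1), \dots, \boldsymbol{\phi}(\mathbf{x}_N))^{\mathrm T}$ is the design matrix (1.6). Substituting $\mathbf{w} = \boldsymbol{\Phi}^{\mathrm T} \mathbf{a}$ rewrites $J(\mathbf{w})$ as a function of the parameter vector $\mathbf{a}$ (a change of variables):

    $$J(\mathbf{a}) = \frac{1}{2} \mathbf{a}^{\mathrm T} \boldsymbol{\Phi} \boldsymbol{\Phi}^{\mathrm T} \boldsymbol{\Phi} \boldsymbol{\Phi}^{\mathrm T} \mathbf{a} - \mathbf{a}^{\mathrm T} \boldsymbol{\Phi} \boldsymbol{\Phi}^{\mathrm T} \mathbf{t} + \frac{1}{2} \mathbf{t}^{\mathrm T} \mathbf{t} + \frac{\lambda}{2} \mathbf{a}^{\mathrm T} \boldsymbol{\Phi} \boldsymbol{\Phi}^{\mathrm T} \mathbf{a} \tag{2.7}$$

    where $\mathbf{t} = (t_1, \dots, t_N)^{\mathrm T}$.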
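  18. 2-1. Dual representation

    Define the Gram matrix $\mathbf{K} = \boldsymbol{\Phi} \boldsymbol{\Phi}^{\mathrm T}$. By definition its elements can be written with the kernel:

    $$K_{nm} = \boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_m) = k(\mathbf{x}_n, \mathbf{x}_m) \tag{2.8}$$

    With the Gram matrix, $J(\mathbf{a})$ in (2.7) becomes

    $$J(\mathbf{a}) = \frac{1}{2} \mathbf{a}^{\mathrm T} \mathbf{K} \mathbf{K} \mathbf{a} - \mathbf{a}^{\mathrm T} \mathbf{K} \mathbf{t} + \frac{1}{2} \mathbf{t}^{\mathrm T} \mathbf{t} + \frac{\lambda}{2} \mathbf{a}^{\mathrm T} \mathbf{K} \mathbf{a} \tag{2.9}$$

    The least-squares algorithm can thus be expressed in terms of the parameters $\mathbf{a}$ instead of $\mathbf{w}$; this is called the dual representation. In the dual representation, $J(\mathbf{a})$ depends on $\boldsymbol{\phi}(\mathbf{x})$ only through the kernel (2.8); there is no explicit dependence on $\boldsymbol{\phi}(\mathbf{x})$.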
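  19. 2-1. Dual representation

    Finding the $\mathbf{a}$ that minimizes $J(\mathbf{a})$ (the $\mathbf{a}$ at which the gradient vanishes) gives

    $$\mathbf{a} = (\mathbf{K} + \lambda \mathbf{I}_N)^{-1} \mathbf{t} \tag{2.10}$$

    where $\mathbf{I}_N$ is the $N \times N$ identity matrix. Combining this solution with $\mathbf{w} = \boldsymbol{\Phi}^{\mathrm T} \mathbf{a}$ and $y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x})$, the prediction for a new input $\mathbf{x}$ is

    $$y(\mathbf{x}) = \mathbf{a}^{\mathrm T} \boldsymbol{\Phi} \boldsymbol{\phi}(\mathbf{x}) = \mathbf{k}(\mathbf{x})^{\mathrm T} (\mathbf{K} + \lambda \mathbf{I}_N)^{-1} \mathbf{t} \tag{2.11}$$

    where $\mathbf{k}(\mathbf{x}) = (k(\mathbf{x}_1, \mathbf{x}), k(\mathbf{x}_2, \mathbf{x}), \dots, k(\mathbf{x}_N, \mathbf{x}))^{\mathrm T}$. The prediction $y(\mathbf{x})$ is therefore also expressed through the kernel function alone.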
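    To see that the dual solution (2.10) and (2.11) agrees with the primal solution (1.12), the following minimal sketch compares the two on random data. The feature matrix is drawn at random purely for illustration; any explicit finite-dimensional feature map would do.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, lam = 8, 3, 0.5

# Random design matrix and targets (illustrative assumption, not from the slides).
Phi = rng.normal(size=(N, M))
t = rng.normal(size=N)

# Primal solution (1.12): requires solving an M x M system.
w = np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

# Dual solution (2.10): requires solving an N x N system with the Gram matrix.
K = Phi @ Phi.T
a = np.linalg.solve(K + lam * np.eye(N), t)

# Predictions for a new feature vector phi(x).
phi_new = rng.normal(size=M)
y_primal = w @ phi_new
k_new = Phi @ phi_new            # k(x) = (k(x_1, x), ..., k(x_N, x))^T
y_dual = k_new @ a               # (2.11)
```

    Both routes give the same prediction, and `Phi.T @ a` reproduces `w` as stated in (2.5). The dual route only ever touches inner products of feature vectors, which is what allows the kernel substitution.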
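  20. 2-1. Dual representation

    By (2.10), finding $\mathbf{a}$ in the dual representation requires inverting an $N \times N$ matrix, where $N$ is the number of training points. In the primal representation (the one in terms of $\mathbf{w}$), the solution is, from (1.12),

    $$\mathbf{w} = (\lambda \mathbf{I}_M + \boldsymbol{\Phi}^{\mathrm T} \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^{\mathrm T} \mathbf{t} \tag{2.12}$$

    which requires inverting an $M \times M$ matrix, where $M$ is the dimension of the feature space. When $N \gg M$ (the most common situation), the primal representation is cheaper. On the other hand, the dual representation can also handle feature spaces with infinite $M$ (an example is given in 2-2).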
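  21. 2-2. Constructing kernel functions

    This section uses the definition of a kernel function to introduce a variety of kernels. A function $k(\mathbf{x}, \mathbf{x}')$ is a kernel function if there exists a suitable map $\boldsymbol{\phi}$ from the input $\mathbf{x}$ to an $M$-dimensional feature space such that

    $$k(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}') \tag{2.13}$$

    A simple example of a kernel function is

    $$k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^{\mathrm T} \mathbf{z})^2 \tag{2.14}$$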
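  22. 2-2. Constructing kernel functions

    For example, with $\mathbf{x} = (x_1, x_2)^{\mathrm T}$ and the map $\boldsymbol{\phi}(\mathbf{x}) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)^{\mathrm T}$,

    $$k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^{\mathrm T} \mathbf{z})^2 = \boldsymbol{\phi}(\mathbf{x})^{\mathrm T} \boldsymbol{\phi}(\mathbf{z}) \tag{2.15}$$

    so $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^{\mathrm T} \mathbf{z})^2$ is a kernel function. There is in fact a second definition besides (2.13): the Gram matrix $\mathbf{K}$ with elements $K_{nm} = k(\mathbf{x}_n, \mathbf{x}_m)$ is positive semidefinite for every choice of the set $\{\mathbf{x}_n\}$.

    (I tried proving that these two definitions are equivalent in the article below. If you like, please have a look and give it a thumbs-up.) https://qiita.com/gucchi0403/items/544065345f91144524c4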
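    The identity (2.15) is easy to confirm numerically. The sketch below evaluates both sides for two arbitrarily chosen vectors (illustrative values only).

```python
import numpy as np

def phi(x):
    """Explicit feature map for the kernel (x^T z)^2 in two dimensions."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def k(x, z):
    """Kernel (2.14), evaluated without constructing any features."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

lhs = k(x, z)            # (1*3 + 2*(-1))^2 = 1
rhs = phi(x) @ phi(z)    # 9 - 12 + 4 = 1
```

    The kernel side needs one inner product in the 2-dimensional input space, while the feature side works in 3 dimensions; for higher polynomial degrees the gap grows quickly.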
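  23. 2-2. Constructing kernel functions

    Next, methods for generating a new kernel function $k(\mathbf{x}, \mathbf{x}')$ from functions already known to be kernels are listed.

    [Table of construction rules: shown as an image on the slide, not included in the transcript.]

    Here $k_1(\cdot, \cdot)$ and $k_2(\cdot, \cdot)$ are kernel functions, $c > 0$ is a constant, $f(\cdot)$ is an arbitrary function, $q(\cdot)$ is a polynomial with nonnegative coefficients, $\boldsymbol{\phi}(\cdot)$ is an $M$-dimensional vector function, $k_3(\cdot, \cdot)$ is a kernel function defined on the $M$-dimensional vector space, $\mathbf{A}$ is a symmetric positive semidefinite matrix, $\mathbf{x} = (\mathbf{x}_a, \mathbf{x}_b)$, and $k_a(\cdot, \cdot)$, $k_b(\cdot, \cdot)$ are kernel functions.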
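  24. 2-2. Constructing kernel functions

    Using these construction rules one can show, for example, that the following function is a kernel:

    $$k(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^{\mathrm T} \mathbf{x}' + c)^M \tag{2.16}$$

    where $c \ge 0$ is a constant and $M$ is any natural number. One can also construct the very important Gaussian kernel:

    $$k(\mathbf{x}, \mathbf{x}') = \exp\bigl( -\lVert \mathbf{x} - \mathbf{x}' \rVert^2 / 2\sigma^2 \bigr) \tag{2.17}$$

    where $\sigma^2$ is any positive constant. Note that the feature vector corresponding to the Gaussian kernel is infinite-dimensional (see PRML Exercise 6.11).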
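    The second definition of a kernel (a positive semidefinite Gram matrix) can be checked empirically for the Gaussian kernel (2.17). The sketch below builds the Gram matrix on random inputs, which are an assumption made for illustration, and inspects its eigenvalues. A single numerical check like this is only a sanity test, not a proof.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma2):
    """Gaussian kernel (2.17) between all rows of X and all rows of Z."""
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma2))

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 2))          # six random 2-D inputs (illustrative)

K = gaussian_kernel(X, X, sigma2=0.5)

# K is symmetric, so eigvalsh applies; all eigenvalues should be nonnegative.
eigvals = np.linalg.eigvalsh(K)
```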
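  25. 2-3. Gaussian processes for regression

    Section 2-1 showed that, for the non-probabilistic linear regression model (where the output $y(\mathbf{x}, \mathbf{w})$ is used directly as the prediction), a kernel appears once the model is rewritten in its dual representation. This section treats the probabilistic linear regression model (which derives a probability distribution for the prediction $t$) and confirms that a kernel arises naturally here as well.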
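  26. 2-3. Gaussian processes for regression

    As in 2-1, start from the output for a given input $\mathbf{x}$:

    $$y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}) \tag{2.18}$$

    To take a Bayesian approach, assume a prior over the parameter vector $\mathbf{w}$:

    $$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I}) \tag{2.19}$$

    If $\mathbf{w}$ is drawn from $p(\mathbf{w})$ and a data point $\mathbf{x}$ is given, (2.18) determines the value of $y(\mathbf{x})$. That is, the distribution over $\mathbf{w}$ induces a distribution over $y(\mathbf{x})$ for a given $\mathbf{x}$. In practice, for a given set of inputs $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$, the joint distribution of the outputs $p(y(\mathbf{x}_1), y(\mathbf{x}_2), \dots, y(\mathbf{x}_N))$ follows from the distribution of $\mathbf{w}$ and (2.18). (More precisely, with $X = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$, this is $p(y(\mathbf{x}_1), y(\mathbf{x}_2), \dots, y(\mathbf{x}_N) \mid X)$.)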
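  27. 2-3. Gaussian processes for regression

    Define the random vector $\mathbf{y} = (y(\mathbf{x}_1), y(\mathbf{x}_2), \dots, y(\mathbf{x}_N))^{\mathrm T}$. From (2.18),

    $$\mathbf{y} = \boldsymbol{\Phi} \mathbf{w} \tag{2.20}$$

    where $\boldsymbol{\Phi}$ is the design matrix (1.6). Since $\mathbf{y}$ is a linear transformation of $\mathbf{w}$, which follows the Gaussian (2.19), $\mathbf{y}$ is also Gaussian. Its distribution is therefore fully determined by its mean and covariance:

    $$\mathbb{E}[\mathbf{y}] = \boldsymbol{\Phi} \mathbb{E}[\mathbf{w}] = \mathbf{0} \tag{2.21}$$

    $$\operatorname{cov}[\mathbf{y}] = \mathbb{E}[\mathbf{y} \mathbf{y}^{\mathrm T}] = \boldsymbol{\Phi} \mathbb{E}[\mathbf{w} \mathbf{w}^{\mathrm T}] \boldsymbol{\Phi}^{\mathrm T} = \frac{1}{\alpha} \boldsymbol{\Phi} \boldsymbol{\Phi}^{\mathrm T} = \mathbf{K} \tag{2.22}$$

    Here $\mathbf{K}$ is the Gram matrix whose elements are the kernel function (this differs from the definition (2.13) by a constant factor):

    $$K_{nm} = k(\mathbf{x}_n, \mathbf{x}_m) = \frac{1}{\alpha} \boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_m) \tag{2.23}$$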
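  28. 2-3. Gaussian processes for regression

    The linear regression model described above is one example of a Gaussian process. A Gaussian process assumes that, for any given set of inputs $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$, the joint distribution of the outputs $p(\mathbf{y}) = p(y(\mathbf{x}_1), y(\mathbf{x}_2), \dots, y(\mathbf{x}_N))$ is Gaussian. The mean is usually assumed to be zero, and the covariance is given by a kernel:

    $$\mathbb{E}[y(\mathbf{x}_n) y(\mathbf{x}_m)] = k(\mathbf{x}_n, \mathbf{x}_m) \tag{2.24}$$

    By (2.21) to (2.23), the linear regression model above is indeed an example of a Gaussian process.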
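  29. 2-3. Gaussian processes for regression

    Now apply Gaussian processes to regression. Assume that the target variable $t_n$ follows a Gaussian whose mean is the output $y_n = y(\mathbf{x}_n)$:

    $$p(t_n \mid y_n) = \mathcal{N}(t_n \mid y_n, \beta^{-1}) \tag{2.25}$$

    where $\beta$ is a precision hyperparameter. By independence, the distribution of $\mathbf{t} = (t_1, \dots, t_N)^{\mathrm T}$ given $\mathbf{y} = (y(\mathbf{x}_1), y(\mathbf{x}_2), \dots, y(\mathbf{x}_N))^{\mathrm T}$ is

    $$p(\mathbf{t} \mid \mathbf{y}) = \mathcal{N}(\mathbf{t} \mid \mathbf{y}, \beta^{-1} \mathbf{I}_N) \tag{2.26}$$

    By the Gaussian process assumption, the marginal $p(\mathbf{y})$ is a Gaussian with zero mean and covariance given by the Gram matrix $\mathbf{K}$:

    $$p(\mathbf{y}) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \mathbf{K}) \tag{2.27}$$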
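  30. 2-3. Gaussian processes for regression

    Using $p(\mathbf{t} \mid \mathbf{y})$ from (2.26) and $p(\mathbf{y})$ from (2.27), the distribution $p(\mathbf{t})$ of the targets for given $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ is

    $$p(\mathbf{t}) = \int p(\mathbf{t} \mid \mathbf{y}) \, p(\mathbf{y}) \, d\mathbf{y} = \mathcal{N}(\mathbf{t} \mid \mathbf{0}, \mathbf{C}) \tag{2.28}$$

    where the covariance $\mathbf{C}$ has elements

    $$C_{nm} = k(\mathbf{x}_n, \mathbf{x}_m) + \beta^{-1} \delta_{nm} \tag{2.29}$$

    (using PRML equations (2.113) to (2.115)). A kernel commonly used in the covariance $\mathbf{C}$ is

    $$k(\mathbf{x}_n, \mathbf{x}_m) = \theta_0 \exp\Bigl\{ -\frac{\theta_1}{2} \lVert \mathbf{x}_n - \mathbf{x}_m \rVert^2 \Bigr\} + \theta_2 + \theta_3 \mathbf{x}_n^{\mathrm T} \mathbf{x}_m \tag{2.30}$$

    where $\theta_0, \dots, \theta_3$ are hyperparameters.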
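  31. 2-3. Gaussian processes for regression

    What we want is the distribution of the target $t_{N+1}$ for an unseen input $\mathbf{x}_{N+1}$, given the training data $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ and $\{t_1, t_2, \dots, t_N\}$. With $\mathbf{t}_N = (t_1, \dots, t_N)^{\mathrm T}$, this is $p(t_{N+1} \mid \mathbf{t}_N)$ (the dependence on the inputs is omitted). To obtain $p(t_{N+1} \mid \mathbf{t}_N)$, first find the joint distribution $p(\mathbf{t}_{N+1})$, where $\mathbf{t}_{N+1} = (t_1, \dots, t_{N+1})^{\mathrm T}$. Using the result (2.28),

    $$p(\mathbf{t}_{N+1}) = \mathcal{N}(\mathbf{t}_{N+1} \mid \mathbf{0}, \mathbf{C}_{N+1}) \tag{2.31}$$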
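  32. 2-3. Gaussian processes for regression

    Here the covariance matrix $\mathbf{C}_{N+1}$ is

    $$\mathbf{C}_{N+1} = \begin{pmatrix} \mathbf{C}_N & \mathbf{k} \\ \mathbf{k}^{\mathrm T} & c \end{pmatrix} \tag{2.32}$$

    where $\mathbf{C}_N$ is the $N \times N$ matrix with elements (2.29), $\mathbf{k} = (k(\mathbf{x}_1, \mathbf{x}_{N+1}), k(\mathbf{x}_2, \mathbf{x}_{N+1}), \dots, k(\mathbf{x}_N, \mathbf{x}_{N+1}))^{\mathrm T}$, and $c = k(\mathbf{x}_{N+1}, \mathbf{x}_{N+1}) + \beta^{-1}$. Combining this with PRML equations (2.81) and (2.82), $p(t_{N+1} \mid \mathbf{t}_N)$ is Gaussian with mean $m(\mathbf{x}_{N+1})$ and variance $\sigma^2(\mathbf{x}_{N+1})$ given by

    $$m(\mathbf{x}_{N+1}) = \mathbf{k}^{\mathrm T} \mathbf{C}_N^{-1} \mathbf{t}_N \tag{2.33}$$

    $$\sigma^2(\mathbf{x}_{N+1}) = c - \mathbf{k}^{\mathrm T} \mathbf{C}_N^{-1} \mathbf{k} \tag{2.34}$$

    In other words, the distribution of the target $t_{N+1}$ for an unseen input $\mathbf{x}_{N+1}$ is a Gaussian whose mean and variance depend on $\mathbf{x}_{N+1}$.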
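    The predictive equations (2.33) and (2.34) translate almost line by line into code. The sketch below is a minimal illustration, assuming 1-D inputs, the exponential-quadratic part of (2.30) with hand-picked hyperparameters, and three toy training points; none of these values come from the slides.

```python
import numpy as np

def kernel(xa, xb, theta0=1.0, theta1=4.0):
    """First term of (2.30) for 1-D inputs; hyperparameters are illustrative."""
    return theta0 * np.exp(-0.5 * theta1 * (xa[:, None] - xb[None, :]) ** 2)

# Toy training set (assumed for the example).
x_train = np.array([-1.0, 0.0, 1.0])
t_train = np.sin(x_train)
beta = 100.0                                   # noise precision

# Covariance (2.29) of the training targets.
C_N = kernel(x_train, x_train) + np.eye(len(x_train)) / beta

def predict(x_new):
    """Predictive mean (2.33) and variance (2.34) at each point of x_new."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    k = kernel(x_train, x_new)                 # shape (N, n_new)
    c = kernel(x_new, x_new).diagonal() + 1.0 / beta
    mean = k.T @ np.linalg.solve(C_N, t_train)
    var = c - np.sum(k * np.linalg.solve(C_N, k), axis=0)
    return mean, var

mean_near, var_near = predict(0.0)    # at a training input
mean_far, var_far = predict(10.0)     # far from all training inputs
```

    Far from the data the kernel vector $\mathbf{k}$ vanishes, so the mean returns to the prior mean 0 and the variance to $c = \theta_0 + \beta^{-1}$; near the data the variance shrinks.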
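  33. 2-4. Gaussian processes for classification

    Now use Gaussian processes for classification. In regression, the joint distribution $p(\mathbf{y})$ of the outputs was assumed Gaussian, as in (2.27), so $y_n$ ranges over all real numbers. In classification, the output should satisfy $0 \le y_n \le 1$. Therefore, consider the joint distribution of the activations $a_n = a(\mathbf{x}_n)$ rather than the outputs, and set $y_n = \sigma(a_n)$. Taking the probability of $t_n = 1$ to be $p(t_n = 1 \mid a_n) = \sigma(a_n)$, we have $p(t_n = 0 \mid a_n) = 1 - \sigma(a_n)$, and so

    $$p(t_n \mid a_n) = \sigma(a_n)^{t_n} \bigl(1 - \sigma(a_n)\bigr)^{1 - t_n} \tag{2.35}$$

    As in regression, the goal is the distribution $p(t_{N+1} \mid \mathbf{t}_N)$ of the target $t_{N+1}$ for an unseen input $\mathbf{x}_{N+1}$, given the training data $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ and $\mathbf{t}_N = (t_1, \dots, t_N)^{\mathrm T}$ (again the dependence on the inputs is omitted).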
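  34. 2-4. Gaussian processes for classification

    First, with $\mathbf{a}_{N+1} = (a(\mathbf{x}_1), a(\mathbf{x}_2), \dots, a(\mathbf{x}_{N+1}))^{\mathrm T}$, assume a Gaussian process prior over the activations (the counterpart of (2.27) in regression):

    $$p(\mathbf{a}_{N+1}) = \mathcal{N}(\mathbf{a}_{N+1} \mid \mathbf{0}, \mathbf{C}_{N+1}) \tag{2.36}$$

    where the covariance matrix $\mathbf{C}_{N+1}$ has elements

    $$(\mathbf{C}_{N+1})_{nm} = k(\mathbf{x}_n, \mathbf{x}_m) + \nu \delta_{nm} \tag{2.37}$$

    and $\nu$ is a noise term. What we want is the distribution $p(t_{N+1} \mid \mathbf{t}_N)$; for binary classification $p(t_{N+1} = 0 \mid \mathbf{t}_N) = 1 - p(t_{N+1} = 1 \mid \mathbf{t}_N)$, so it suffices to find $p(t_{N+1} = 1 \mid \mathbf{t}_N)$.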
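  35. 2-4. Gaussian processes for classification

    Since

    $$p(t_{N+1} = 1, \mathbf{t}_N) = \int p(t_{N+1} = 1, \mathbf{t}_N, a_{N+1}) \, da_{N+1} = \int p(t_{N+1} = 1 \mid \mathbf{t}_N, a_{N+1}) \, p(a_{N+1} \mid \mathbf{t}_N) \, p(\mathbf{t}_N) \, da_{N+1} = \int p(t_{N+1} = 1 \mid a_{N+1}) \, p(a_{N+1} \mid \mathbf{t}_N) \, p(\mathbf{t}_N) \, da_{N+1} \tag{2.38}$$

    the desired probability is

    $$p(t_{N+1} = 1 \mid \mathbf{t}_N) = \int p(t_{N+1} = 1 \mid a_{N+1}) \, p(a_{N+1} \mid \mathbf{t}_N) \, da_{N+1} \tag{2.39}$$

    where $p(t_{N+1} = 1 \mid a_{N+1}) = \sigma(a_{N+1})$. This integral cannot be evaluated analytically, and various approximation schemes are used. Here the Laplace approximation (PRML 4.5.1) is used.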
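  36. 2-4. Gaussian processes for classification

    This section evaluates the integral (2.39) with the Laplace approximation. First rewrite $p(a_{N+1} \mid \mathbf{t}_N)$ as

    $$p(a_{N+1} \mid \mathbf{t}_N) = \int p(a_{N+1} \mid \mathbf{a}_N) \, p(\mathbf{a}_N \mid \mathbf{t}_N) \, d\mathbf{a}_N \tag{2.40}$$

    where $p(\mathbf{a}_N \mid \mathbf{t}_N)$ is the posterior. By analogy with the regression results (2.33) and (2.34) for $p(t_{N+1} \mid \mathbf{t}_N)$, the conditional distribution $p(a_{N+1} \mid \mathbf{a}_N)$ is

    $$p(a_{N+1} \mid \mathbf{a}_N) = \mathcal{N}\bigl(a_{N+1} \mid \mathbf{k}^{\mathrm T} \mathbf{C}_N^{-1} \mathbf{a}_N, \; c - \mathbf{k}^{\mathrm T} \mathbf{C}_N^{-1} \mathbf{k}\bigr) \tag{2.41}$$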
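  37. 2-4. Gaussian processes for classification

    Approximate $p(\mathbf{a}_N \mid \mathbf{t}_N)$ with the Laplace approximation. This requires the point $\mathbf{a}_N = \mathbf{a}_N^\star$ satisfying

    $$\frac{\partial p(\mathbf{a}_N \mid \mathbf{t}_N)}{\partial \mathbf{a}_N} = \nabla p(\mathbf{a}_N \mid \mathbf{t}_N) = 0 \tag{2.42}$$

    and the Hessian $-\nabla \nabla \ln p(\mathbf{a}_N \mid \mathbf{t}_N)$ at $\mathbf{a}_N = \mathbf{a}_N^\star$ (PRML 4.5.1). The prior $p(\mathbf{a}_N)$ is

    $$p(\mathbf{a}_N) = \mathcal{N}(\mathbf{a}_N \mid \mathbf{0}, \mathbf{C}_N) \tag{2.43}$$

    which is (2.36) with $N + 1 \to N$. By independence of the data points, the likelihood $p(\mathbf{t}_N \mid \mathbf{a}_N)$ is

    $$p(\mathbf{t}_N \mid \mathbf{a}_N) = \prod_{n=1}^{N} \sigma(a_n)^{t_n} \bigl(1 - \sigma(a_n)\bigr)^{1 - t_n} = \prod_{n=1}^{N} e^{a_n t_n} \sigma(-a_n) \tag{2.44}$$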
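  38. 2-4. Gaussian processes for classification

    By Bayes' theorem, $p(\mathbf{a}_N \mid \mathbf{t}_N) \propto p(\mathbf{t}_N \mid \mathbf{a}_N) \, p(\mathbf{a}_N)$, and carrying out the calculation gives

    $$\mathbf{a}_N^\star = \mathbf{C}_N (\mathbf{t}_N - \boldsymbol{\sigma}_N) \tag{2.45}$$

    where $\boldsymbol{\sigma}_N = (\sigma(a_1), \sigma(a_2), \dots, \sigma(a_N))^{\mathrm T}$ is evaluated at $\mathbf{a}_N^\star$, so (2.45) is an implicit equation that has to be solved iteratively. The Hessian $\mathbf{H}$ at $\mathbf{a}_N = \mathbf{a}_N^\star$ is

    $$\mathbf{H} = \mathbf{W}^\star + \mathbf{C}_N^{-1} \tag{2.46}$$

    where $\mathbf{W}$ is the diagonal matrix with diagonal elements $\sigma(a_n)(1 - \sigma(a_n))$ and $\mathbf{W}^\star$ is $\mathbf{W}$ at $\mathbf{a}_N = \mathbf{a}_N^\star$. The posterior $p(\mathbf{a}_N \mid \mathbf{t}_N)$ is therefore approximated (Laplace approximation) as

    $$p(\mathbf{a}_N \mid \mathbf{t}_N) \simeq \mathcal{N}(\mathbf{a}_N \mid \mathbf{a}_N^\star, \mathbf{H}^{-1}) \tag{2.47}$$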
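    Because (2.45) is implicit in $\mathbf{a}_N^\star$, the mode is found iteratively in practice. The sketch below uses the Newton-Raphson update $\mathbf{a}_N^{\text{new}} = \mathbf{C}_N (\mathbf{I} + \mathbf{W}_N \mathbf{C}_N)^{-1} \{ \mathbf{t}_N - \boldsymbol{\sigma}_N + \mathbf{W}_N \mathbf{a}_N \}$ from PRML 6.4.6, which the slides do not spell out, on a toy 1-D data set with an assumed Gaussian kernel. All numerical values are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def rbf(xa, xb, length=1.0):
    """Gaussian kernel for 1-D inputs (an assumed choice of k)."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / length**2)

# Toy binary data with targets in {0, 1} (illustrative only).
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
t_train = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
nu = 1e-2
C_N = rbf(x_train, x_train) + nu * np.eye(len(x_train))   # (2.37)

# Newton-Raphson iteration for the posterior mode a*.
a = np.zeros_like(t_train)
for _ in range(100):
    sig = sigmoid(a)
    W = np.diag(sig * (1.0 - sig))
    a = C_N @ np.linalg.solve(np.eye(len(a)) + W @ C_N, t_train - sig + W @ a)

a_star = a
sigma_star = sigmoid(a_star)
W_star = np.diag(sigma_star * (1.0 - sigma_star))
H = W_star + np.linalg.inv(C_N)                            # (2.46)
```

    At convergence `a_star` satisfies the fixed-point condition (2.45), and `H` is positive definite, so the Gaussian (2.47) is well defined.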
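  39. 2-4. Gaussian processes for classification

    From (2.41) and (2.47), the integral (2.40) can be approximated as

    $$p(a_{N+1} \mid \mathbf{t}_N) \simeq \int \mathcal{N}\bigl(a_{N+1} \mid \mathbf{k}^{\mathrm T} \mathbf{C}_N^{-1} \mathbf{a}_N, \; c - \mathbf{k}^{\mathrm T} \mathbf{C}_N^{-1} \mathbf{k}\bigr) \, \mathcal{N}(\mathbf{a}_N \mid \mathbf{a}_N^\star, \mathbf{H}^{-1}) \, d\mathbf{a}_N \tag{2.48}$$

    By PRML equation (2.115), $p(a_{N+1} \mid \mathbf{t}_N)$ is a Gaussian with mean and variance

    $$\mathbb{E}[a_{N+1} \mid \mathbf{t}_N] = \mathbf{k}^{\mathrm T} (\mathbf{t}_N - \boldsymbol{\sigma}_N) \tag{2.49}$$

    $$\operatorname{var}[a_{N+1} \mid \mathbf{t}_N] = c - \mathbf{k}^{\mathrm T} (\mathbf{W}_N^{-1} + \mathbf{C}_N)^{-1} \mathbf{k} \tag{2.50}$$

    where $\mathbf{W}_N$ is $\mathbf{W}^\star$ from (2.46).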
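  40. 2-4. Gaussian processes for classification

    Write $\mu_a = \mathbf{k}^{\mathrm T} (\mathbf{t}_N - \boldsymbol{\sigma}_N)$ for the mean (2.49) and $\sigma_a^2 = c - \mathbf{k}^{\mathrm T} (\mathbf{W}_N^{-1} + \mathbf{C}_N)^{-1} \mathbf{k}$ for the variance (2.50). Then $p(t_{N+1} = 1 \mid \mathbf{t}_N)$ in (2.39) can be approximated (using PRML equation (4.153)) as

    $$p(t_{N+1} = 1 \mid \mathbf{t}_N) \simeq \int \sigma(a_{N+1}) \, \mathcal{N}(a_{N+1} \mid \mu_a, \sigma_a^2) \, da_{N+1} \simeq \sigma\bigl( \kappa(\sigma_a^2) \, \mu_a \bigr) \tag{2.51}$$

    where

    $$\kappa(\sigma^2) = (1 + \pi \sigma^2 / 8)^{-1/2} \tag{2.52}$$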
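    The quality of the approximation in (2.51) and (2.52) can be gauged by comparing it with brute-force numerical integration. The sketch below does this for one arbitrarily chosen mean and variance (illustrative values); the grid-based integral is a simple Riemann sum, not a production-grade quadrature.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def kappa(s2):
    """Equation (2.52)."""
    return 1.0 / np.sqrt(1.0 + np.pi * s2 / 8.0)

def approx(mu, s2):
    """Right-hand side of (2.51): sigma(kappa(s2) * mu)."""
    return sigmoid(kappa(s2) * mu)

def numeric(mu, s2, n=20001):
    """Riemann-sum estimate of the integral of sigma(a) N(a | mu, s2)."""
    s = np.sqrt(s2)
    a = np.linspace(mu - 10.0 * s, mu + 10.0 * s, n)
    pdf = np.exp(-0.5 * (a - mu) ** 2 / s2) / np.sqrt(2.0 * np.pi * s2)
    return np.sum(sigmoid(a) * pdf) * (a[1] - a[0])

p_approx = approx(1.0, 2.0)
p_numeric = numeric(1.0, 2.0)
```

    For these values the two numbers agree to within a few hundredths, which is typical of this probit-based approximation. For zero mean both give exactly one half by symmetry.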
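  41. 3. Sparse kernel machines

    The previous chapter introduced several algorithms that take the kernel as their starting point. These algorithms require the kernel function $k(\mathbf{x}_n, \mathbf{x}_m)$ to be evaluated for every pair of training points (for example, (2.11)), which makes both training and prediction very slow. This chapter introduces models that can predict for a new input by evaluating the kernel function $k(\mathbf{x}_n, \mathbf{x}_m)$ on only a subset of the training data. The support vector machine (SVM) is treated in particular detail. An SVM provides only a discriminant function and no predictive distribution. The relevance vector machine (RVM), by contrast, uses probability theory to provide a predictive distribution for $t$ based on Bayesian inference.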
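  42. 3-1. Maximum margin classifier

    Begin by solving binary classification with the linear model

    $$y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}) + b \tag{3.1}$$

    where $\mathbf{w} = (w_1, w_2, \dots, w_{M-1})^{\mathrm T}$ is the parameter vector, $\boldsymbol{\phi}(\mathbf{x}) = (\phi_1(\mathbf{x}), \phi_2(\mathbf{x}), \dots, \phi_{M-1}(\mathbf{x}))^{\mathrm T}$ is a vector function that maps the input $\mathbf{x}$ into feature space, and $b$ is the bias parameter. A kernel function is introduced shortly, after which the feature space no longer needs to be handled explicitly. The training data are inputs $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ and the corresponding targets $\{t_1, t_2, \dots, t_N\}$. Because this is binary classification, the targets are discrete, with $t_n \in \{-1, 1\}$.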
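  43. 3-1. Maximum margin classifier

    For the time being, assume that the training data are linearly separable in feature space ($\boldsymbol{\phi}$ space). They need not be linearly separable in input space ($\mathbf{x}$ space). That is, there exists at least one pair $\mathbf{w}$, $b$ for which $y(\mathbf{x}, \mathbf{w}) = 0$ linearly separates the training data in feature space. For such $\mathbf{w}$ and $b$, take $y(\mathbf{x}_n) > 0$ when $t_n = +1$ and $y(\mathbf{x}_n) < 0$ when $t_n = -1$. Note that these conditions can be summarized as $t_n y(\mathbf{x}_n) > 0$.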
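  44. 3-1. Maximum margin classifier

    There may be many pairs $\mathbf{w}$, $b$ that linearly separate the training data. To select one of these solutions, the SVM introduces the concept of the margin. The margin is the smallest distance between the decision boundary in feature space (the line $y(\mathbf{x}) = 0$) and the training data. The SVM chooses the $\mathbf{w}$ and $b$ that maximize this margin.

    [Figure: the margin between the decision boundary and the training data; the right panel shows the maximum-margin solution.]

    For the reason why the margin-maximizing solution is chosen among the candidates, see PRML 7.1.5.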
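  45. 3-1. Maximum margin classifier

    Now derive the equations for finding such $\mathbf{w}$ and $b$. By (1.18) in section 1-2 of these slides, the distance between the hyperplane $y(\mathbf{x}) = 0$ and a point $\boldsymbol{\phi}(\mathbf{x})$ in feature space is $|y(\mathbf{x})| / \lVert \mathbf{w} \rVert$, so the distance between each training point and the hyperplane is $|y(\mathbf{x}_n)| / \lVert \mathbf{w} \rVert$. Because the training data are linearly separable, $|y(\mathbf{x}_n)| = t_n y(\mathbf{x}_n)$. Hence, by (3.1), the distance between each training point and the hyperplane $y(\mathbf{x}) = 0$ is

    $$\frac{t_n y(\mathbf{x}_n)}{\lVert \mathbf{w} \rVert} = \frac{t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b)}{\lVert \mathbf{w} \rVert} \tag{3.2}$$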
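  46. 3-1. Maximum margin classifier

    The margin is the smallest distance between the hyperplane $y(\mathbf{x}) = 0$ and the training data,

    $$\min_n \Bigl\{ \frac{t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b)}{\lVert \mathbf{w} \rVert} \Bigr\} = \frac{1}{\lVert \mathbf{w} \rVert} \min_n \bigl[ t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) \bigr] \tag{3.3}$$

    and we want the $\mathbf{w}$ and $b$ that maximize it, so the problem to solve is

    $$\operatorname*{arg\,max}_{\mathbf{w}, b} \Bigl\{ \frac{1}{\lVert \mathbf{w} \rVert} \min_n \bigl[ t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) \bigr] \Bigr\} \tag{3.4}$$

    Solving this directly is difficult. Instead, note that under the rescaling $\mathbf{w} \to \kappa \mathbf{w}$, $b \to \kappa b$ (with $\kappa > 0$), the distance $|y(\mathbf{x}_n)| / \lVert \mathbf{w} \rVert = t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) / \lVert \mathbf{w} \rVert$ between the hyperplane and a point $\boldsymbol{\phi}(\mathbf{x}_n)$ in feature space does not change.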
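  47. 3-1. Maximum margin classifier

    Let $n$ be the data point closest to the hyperplane $y(\mathbf{x}) = 0$, and write

    $$t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) = a_n \tag{3.5}$$

    with $a_n \le a_j$ for $j \ne n$. Applying the rescaling $\mathbf{w} \to \mathbf{w} / a_n$, $b \to b / a_n$ gives

    $$t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) = 1 \tag{3.6}$$

    For every $j \ne n$, the same rescaling gives

    $$t_j (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_j) + b) = \frac{a_j}{a_n} \ge 1 \tag{3.7}$$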
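  48. 3-1. Maximum margin classifier

    Together these read

    $$t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) \ge 1 \quad (n = 1, \dots, N) \tag{3.8}$$

    After this rescaling, the problem (3.4) becomes

    $$\operatorname*{arg\,max}_{\mathbf{w}, b} \Bigl[ \frac{1}{\lVert \mathbf{w} \rVert} \Bigr] = \operatorname*{arg\,min}_{\mathbf{w}, b} \Bigl[ \frac{1}{2} \lVert \mathbf{w} \rVert^2 \Bigr] \tag{3.9}$$

    That is, the problem reduces to solving

    $$\operatorname*{arg\,min}_{\mathbf{w}, b} \Bigl[ \frac{1}{2} \lVert \mathbf{w} \rVert^2 \Bigr] \tag{3.10}$$

    subject to the constraints (3.8).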
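  49. 3-1. Maximum margin classifier

    To solve this minimization problem, it suffices to find the stationary points of the following Lagrangian with respect to $\mathbf{w}$, $b$ and $\mathbf{a}$ (see PRML Appendix E):

    $$L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2} \lVert \mathbf{w} \rVert^2 - \sum_{n=1}^{N} a_n \bigl\{ t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) - 1 \bigr\} \tag{3.11}$$

    subject to the Karush-Kuhn-Tucker (KKT) conditions

    $$a_n \ge 0 \tag{3.12}$$

    $$t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) - 1 \ge 0 \tag{3.13}$$

    $$a_n \bigl\{ t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) - 1 \bigr\} = 0 \tag{3.14}$$

    I have written up this minimization problem in a Qiita article: https://qiita.com/gucchi0403/items/3d5f27f8d3b2ff0e766d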
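  50. 3-1. Maximum margin classifier

    Setting the derivatives of $L(\mathbf{w}, b, \mathbf{a})$ with respect to $\mathbf{w}$ and $b$ to zero gives

    $$\mathbf{w} = \sum_{n=1}^{N} a_n t_n \boldsymbol{\phi}(\mathbf{x}_n) \tag{3.15}$$

    $$0 = \sum_{n=1}^{N} a_n t_n \tag{3.16}$$

    Using these, $\mathbf{w}$ and $b$ can be eliminated from the right-hand side of (3.11), and the Lagrangian becomes

    $$\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m) \tag{3.17}$$

    Here $k(\mathbf{x}_n, \mathbf{x}_m) = \boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_m)$ is the kernel function, so the Lagrangian $\tilde{L}(\mathbf{a})$ depends on $\boldsymbol{\phi}(\mathbf{x})$ only through the kernel.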
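  51. 3-1. Maximum margin classifier

    Now examine the output $y(\mathbf{x})$ for a new input $\mathbf{x}$. Substituting (3.15) into (3.1) gives

    $$y(\mathbf{x}) = \sum_{n=1}^{N} a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b \tag{3.18}$$

    and the conditions that $a_n$ must satisfy are

    $$a_n \ge 0 \tag{3.19}$$

    $$t_n y(\mathbf{x}_n) - 1 \ge 0 \tag{3.20}$$

    $$a_n \bigl\{ t_n y(\mathbf{x}_n) - 1 \bigr\} = 0 \tag{3.21}$$

    These show that $a_n = 0$ for every data point other than those closest to the hyperplane $y(\mathbf{x}) = 0$ (those with $t_n y(\mathbf{x}_n) - 1 = 0$), so points that are not at the minimum distance are not needed for the prediction (3.18).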
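    A two-point problem is small enough to solve the dual (3.17) by inspection, which makes the roles of (3.15), (3.16) and (3.18) concrete. The sketch below assumes a linear kernel and the toy data $x_1 = +1$, $x_2 = -1$ (both assumptions for illustration). The constraint (3.16) forces $a_1 = a_2 = s$, so the dual reduces to a one-dimensional maximization, done here with a simple grid scan rather than a real QP solver.

```python
import numpy as np

# Toy data: one point per class, linear kernel k(x, x') = x^T x'.
X = np.array([[1.0], [-1.0]])
t = np.array([1.0, -1.0])
K = X @ X.T

def dual(a):
    """Dual Lagrangian (3.17)."""
    return a.sum() - 0.5 * (a * t) @ K @ (a * t)

# (3.16) implies a_1 = a_2 = s; scan s >= 0 and keep the maximizer.
s_grid = np.linspace(0.0, 2.0, 2001)
values = [dual(np.array([s, s])) for s in s_grid]
s_best = s_grid[int(np.argmax(values))]
a = np.array([s_best, s_best])

w = (a * t) @ X                        # (3.15) with phi(x) = x
b = np.mean(t - K @ (a * t))           # from t_n y(x_n) = 1 on the support vectors

def y(x):
    """Prediction (3.18)."""
    return (a * t) @ (X @ x) + b
```

    The maximizer is $s = 0.5$, giving $w = 1$ and $b = 0$: the boundary sits midway between the two points and both are support vectors with $t_n y(x_n) = 1$.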
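  52. 3-1. Maximum margin classifier

    Maximum-margin learning can also be expressed as minimizing the error function

    $$\sum_{n=1}^{N} E_\infty\bigl( y(\mathbf{x}_n) t_n - 1 \bigr) + \lambda \lVert \mathbf{w} \rVert^2 \tag{3.22}$$

    where $E_\infty(z)$ is $0$ for $z \ge 0$ and $\infty$ otherwise. If even one data point violates (3.8), this error function diverges, so minimizing it requires every data point to satisfy (3.8).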
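  53. 3-2. Overlapping class distributions

    The previous section assumed that the data points are linearly separable in feature space (they are not necessarily linearly separable in input space). Now consider the case where the data are not linearly separable in feature space. To this end, introduce a slack variable $\xi_n \ge 0$ for each data point and relax the constraint (3.8) to

    $$t_n y(\mathbf{x}_n) \ge 1 - \xi_n \quad (n = 1, \dots, N) \tag{3.23}$$

    where $y(\mathbf{x})$ is (3.1). The function to minimize is $\lVert \mathbf{w} \rVert^2 / 2$ from (3.10) plus a penalty term:

    $$C \sum_{n=1}^{N} \xi_n + \frac{1}{2} \lVert \mathbf{w} \rVert^2 \tag{3.24}$$

    where $C > 0$.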
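  54. 3-2. Overlapping class distributions

    Consider the meaning of the constraint (3.23) and the loss function (3.24). A data point with $\xi_n = 0$ satisfies $t_n y(\mathbf{x}_n) \ge 1$ by (3.23), so it is correctly classified and lies either on the margin ($t_n y(\mathbf{x}_n) = 1$) or beyond the margin on the correct side ($t_n y(\mathbf{x}_n) > 1$). The previous section assumed that every data point could be classified correctly in this way. A point with $0 < \xi_n \le 1$ lies inside the margin but is correctly classified, and a point with $\xi_n > 1$ is misclassified.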
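    The three cases above can be made concrete with a small helper. For fixed $\mathbf{w}$ and $b$, the smallest slack satisfying (3.23) and $\xi_n \ge 0$ is $\xi_n = \max(0, 1 - t_n y(\mathbf{x}_n))$, which is the value the minimization of (3.24) selects. The function names and the sample outputs below are hypothetical, chosen only for illustration.

```python
import numpy as np

def slack(t, y):
    """Smallest xi_n >= 0 satisfying t_n * y_n >= 1 - xi_n, i.e. (3.23)."""
    return np.maximum(0.0, 1.0 - t * y)

def describe(xi):
    """Interpretation of a slack value as described in the text."""
    if xi == 0.0:
        return "on or beyond the margin, correctly classified"
    if xi <= 1.0:
        return "inside the margin, not misclassified"
    return "misclassified"

# Illustrative targets and model outputs y(x_n).
t = np.array([+1.0, +1.0, -1.0, -1.0])
y = np.array([+2.0, +0.4, -1.0, +0.5])

xi = slack(t, y)                        # [0.0, 0.6, 0.0, 1.5]
labels = [describe(v) for v in xi]
```

    A point with $\xi_n = 1$ sits exactly on the decision boundary $y(\mathbf{x}_n) = 0$, which is why the middle case is labelled "not misclassified" rather than "correctly classified".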
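  55. 3-2. Overlapping class distributions

    Unlike (3.8), the constraint (3.23) thus allows correctly classified points inside the margin as well as misclassified points. However, the first term of the loss function (3.24) shows that such points increase the loss (a penalty). The problem to solve is to minimize (3.24) under two kinds of inequality constraints, (3.23) and $\xi_n \ge 0$, so the Lagrangian $L(\mathbf{w}, b, \boldsymbol{\xi}, \mathbf{a}, \boldsymbol{\mu})$ is (see PRML Appendix E)

    $$L(\mathbf{w}, b, \boldsymbol{\xi}, \mathbf{a}, \boldsymbol{\mu}) = \frac{1}{2} \lVert \mathbf{w} \rVert^2 + C \sum_{n=1}^{N} \xi_n - \sum_{n=1}^{N} a_n \bigl\{ t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) - 1 + \xi_n \bigr\} - \sum_{n=1}^{N} \mu_n \xi_n \tag{3.25}$$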
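  56. 3-2. Overlapping class distributions

    The KKT conditions are

    $$a_n \ge 0 \tag{3.26}$$

    $$t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) - 1 + \xi_n \ge 0 \tag{3.27}$$

    $$a_n \bigl( t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) - 1 + \xi_n \bigr) = 0 \tag{3.28}$$

    $$\mu_n \ge 0 \tag{3.29}$$

    $$\xi_n \ge 0 \tag{3.30}$$

    $$\mu_n \xi_n = 0 \tag{3.31}$$

    Setting the derivatives of $L(\mathbf{w}, b, \boldsymbol{\xi}, \mathbf{a}, \boldsymbol{\mu})$ with respect to $\mathbf{w}$, $b$ and $\xi_n$ to zero gives

    $$\mathbf{w} = \sum_{n=1}^{N} a_n t_n \boldsymbol{\phi}(\mathbf{x}_n) \tag{3.32}$$

    $$0 = \sum_{n=1}^{N} a_n t_n \tag{3.33}$$

    $$a_n = C - \mu_n \tag{3.34}$$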
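  57. 3-2. Overlapping class distributions

    Using (3.32) to (3.34), the Lagrangian (3.25) becomes

    $$\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m) \tag{3.35}$$

    This has the same form as the dual Lagrangian (3.17) obtained under the assumption of linear separability. In addition, the new condition (3.34) together with (3.26) and (3.29) imposes the following constraint on $a_n$:

    $$0 \le a_n \le C \tag{3.36}$$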
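  58. 3-2. Overlapping class distributions

    Substituting (3.32) into (3.1) gives

    $$y(\mathbf{x}) = \sum_{n=1}^{N} a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b \tag{3.37}$$

    which is the same expression as (3.18) for the linearly separable case. By (3.37), points with $a_n = 0$ do not contribute to predictions for new inputs. Points with $a_n \ne 0$ do contribute, and by (3.28) they satisfy

    $$t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) - 1 + \xi_n = 0 \tag{3.38}$$

    These are the support vectors. Unlike the linearly separable case, depending on the value of $\xi_n$ a point that does not lie on the margin can also be a support vector.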
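  59. 3-2. Overlapping class distributions

    Among the points with $a_n \ne 0$, those with $0 < a_n < C$ have $\mu_n > 0$ by (3.34), and hence $\xi_n = 0$ by (3.31). A point with $\xi_n = 0$ satisfies $t_n (\mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b) = 1$ by (3.38), so it lies on the margin boundary. For $a_n = C$, (3.34) gives $\mu_n = 0$, so (3.30) and (3.31) only require $\xi_n \ge 0$. By the earlier discussion, such a point lies on the margin boundary if $\xi_n = 0$, lies inside the margin but is correctly classified if $0 < \xi_n \le 1$, and is misclassified if $\xi_n > 1$. Since every point with $a_n \ne 0$ is a support vector, points that are not on the margin can indeed be support vectors.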
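  60. 3-3. SVM for regression

    So far the SVM has been applied to classification; now apply it to regression. As usual, consider minimizing the regularized error function

    $$\frac{1}{2} \sum_{n=1}^{N} \{ y_n - t_n \}^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2 \tag{3.39}$$

    where $y_n = y(\mathbf{x}_n) = \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b$. Here the sum-of-squares part of this error function is replaced by the $\epsilon$-insensitive error function $E_\epsilon(y_n - t_n)$, defined as

    $$E_\epsilon(y_n - t_n) = \begin{cases} 0 & (|y_n - t_n| \le \epsilon) \\ |y_n - t_n| - \epsilon & (\text{otherwise}) \end{cases} \tag{3.40}$$

    It is zero (tolerated) when $|y_n - t_n|$ is at most $\epsilon > 0$ (a hyperparameter), and it assigns a linear cost beyond $\epsilon$.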
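    The $\epsilon$-insensitive error (3.40) is a one-liner in code. The sketch below, with illustrative values, also evaluates the squared error of (3.39) on the same residuals to show how differently the two treat small and large deviations.

```python
import numpy as np

def eps_insensitive(y, t, eps):
    """Epsilon-insensitive error (3.40), applied elementwise."""
    return np.maximum(0.0, np.abs(y - t) - eps)

def squared(y, t):
    """Per-point squared error used in (3.39)."""
    return 0.5 * (y - t) ** 2

# Illustrative predictions and targets; residuals are 0.05, -0.1, 0.5, -2.0.
t = np.zeros(4)
y = np.array([0.05, -0.1, 0.5, -2.0])

e_eps = eps_insensitive(y, t, eps=0.1)   # [0.0, 0.0, 0.4, 1.9]
e_sq = squared(y, t)                     # [0.00125, 0.005, 0.125, 2.0]
```

    Residuals inside the tube cost nothing, which is what later produces sparse solutions, and large residuals grow only linearly, which makes the fit less sensitive to outliers than the squared error.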
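  61. 3-3. SVM for regression

    Using the $\epsilon$-insensitive error function, the error function (3.39) is modified to

    $$C \sum_{n=1}^{N} E_\epsilon(y_n - t_n) + \frac{1}{2} \lVert \mathbf{w} \rVert^2 \tag{3.41}$$

    where $C$ is the inverse of the regularization parameter $\lambda$. The $\epsilon$-insensitive error $E_\epsilon(y_n - t_n)$ is zero when $y_n - \epsilon \le t_n \le y_n + \epsilon$; this region is called the $\epsilon$-tube.

    [Figure: the regression function $y(\mathbf{x})$, the $\epsilon$-tube, and the data points.]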
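  62. 3-3. SVM for regression

    As with the SVM for classification, the optimization problem for regression can be expressed with slack variables. Introduce two nonnegative slack variables $\xi_n$ and $\hat{\xi}_n$. Data points inside the $\epsilon$-tube satisfy $y_n - \epsilon \le t_n \le y_n + \epsilon$, so the constraints that allow points to fall above or below the tube are

    $$t_n \le y(\mathbf{x}_n) + \epsilon + \xi_n \tag{3.42}$$

    $$t_n \ge y(\mathbf{x}_n) - \epsilon - \hat{\xi}_n \tag{3.43}$$

    With these slack variables, the error function becomes

    $$C \sum_{n=1}^{N} (\xi_n + \hat{\xi}_n) + \frac{1}{2} \lVert \mathbf{w} \rVert^2 \tag{3.44}$$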
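  63. 3-3. SVM for regression

    Consider the meaning of the constraints (3.42), (3.43) and the loss function (3.44). A data point inside the $\epsilon$-tube satisfies $y(\mathbf{x}_n) - \epsilon \le t_n \le y(\mathbf{x}_n) + \epsilon$, so by (3.42) and (3.43) one can set $\xi_n = \hat{\xi}_n = 0$, which gives the smallest possible contribution ($= 0$) to the first term of (3.44). A data point above the $\epsilon$-tube satisfies $y(\mathbf{x}_n) - \epsilon \le t_n$, so $\hat{\xi}_n = 0$ is possible, but it violates $t_n \le y(\mathbf{x}_n) + \epsilon$, so $\xi_n > 0$ is unavoidable. A value $\xi_n > 0$ contributes a positive amount (a penalty) to the first term of (3.44). By the same argument, a data point below the $\epsilon$-tube can have $\xi_n = 0$ but must have $\hat{\xi}_n > 0$.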
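  64. 3-3. SVM for regression

    Next, minimize (3.44) under the inequality constraints (3.42), (3.43) and $\xi_n \ge 0$, $\hat{\xi}_n \ge 0$. The Lagrangian is therefore

    $$L = C \sum_{n=1}^{N} (\xi_n + \hat{\xi}_n) + \frac{1}{2} \lVert \mathbf{w} \rVert^2 - \sum_{n=1}^{N} (\mu_n \xi_n + \hat{\mu}_n \hat{\xi}_n) - \sum_{n=1}^{N} a_n (\epsilon + \xi_n + y_n - t_n) - \sum_{n=1}^{N} \hat{a}_n (\epsilon + \hat{\xi}_n - y_n + t_n) \tag{3.45}$$

    where $y_n = y(\mathbf{x}_n) = \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}_n) + b$.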
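  65. 3-3. SVM for regression

    The KKT conditions are

    $$a_n \ge 0, \quad \hat{a}_n \ge 0 \tag{3.46}$$

    $$\epsilon + \xi_n + y_n - t_n \ge 0, \quad \epsilon + \hat{\xi}_n - y_n + t_n \ge 0 \tag{3.47}$$

    $$a_n (\epsilon + \xi_n + y_n - t_n) = 0, \quad \hat{a}_n (\epsilon + \hat{\xi}_n - y_n + t_n) = 0 \tag{3.48}$$

    $$\mu_n \ge 0, \quad \hat{\mu}_n \ge 0 \tag{3.49}$$

    $$\xi_n \ge 0, \quad \hat{\xi}_n \ge 0 \tag{3.50}$$

    $$\mu_n \xi_n = 0, \quad \hat{\mu}_n \hat{\xi}_n = 0 \tag{3.51}$$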
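  66. 3-3. SVM for regression

    Setting the derivatives of the Lagrangian (3.45) with respect to $\mathbf{w}$, $b$, $\xi_n$ and $\hat{\xi}_n$ to zero gives

    $$\mathbf{w} = \sum_{n=1}^{N} (a_n - \hat{a}_n) \boldsymbol{\phi}(\mathbf{x}_n) \tag{3.52}$$

    $$0 = \sum_{n=1}^{N} (a_n - \hat{a}_n) \tag{3.53}$$

    $$a_n = C - \mu_n, \quad \hat{a}_n = C - \hat{\mu}_n \tag{3.54}$$

    With these, the Lagrangian (3.45) can be written as a function of $a_n$ and $\hat{a}_n$ only:

    $$\tilde{L} = -\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} (a_n - \hat{a}_n)(a_m - \hat{a}_m) k(\mathbf{x}_n, \mathbf{x}_m) - \epsilon \sum_{n=1}^{N} (a_n + \hat{a}_n) + \sum_{n=1}^{N} (a_n - \hat{a}_n) t_n \tag{3.55}$$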
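  67. 3-3. SVM for regression

    The condition (3.54) together with (3.46) and (3.49) imposes the following constraints on $a_n$ and $\hat{a}_n$:

    $$0 \le a_n \le C \tag{3.56}$$

    $$0 \le \hat{a}_n \le C \tag{3.57}$$

    Substituting (3.52) into $y(\mathbf{x}) = \mathbf{w}^{\mathrm T} \boldsymbol{\phi}(\mathbf{x}) + b$ gives

    $$y(\mathbf{x}) = \sum_{n=1}^{N} (a_n - \hat{a}_n) k(\mathbf{x}, \mathbf{x}_n) + b \tag{3.58}$$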
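  68. 3-3. SVM for regression

    Now characterize the support vectors that contribute to the prediction. By (3.48), a point with nonzero $a_n$ satisfies $\epsilon + \xi_n + y_n - t_n = 0$. Such a point lies on the boundary of the $\epsilon$-tube ($\xi_n = 0$) or above the tube ($\xi_n > 0$). A point with nonzero $\hat{a}_n$ satisfies $\epsilon + \hat{\xi}_n - y_n + t_n = 0$. Such a point lies on the boundary of the $\epsilon$-tube ($\hat{\xi}_n = 0$) or below the tube ($\hat{\xi}_n > 0$). Suppose further that $\epsilon + \xi_n + y_n - t_n = 0$ and $\epsilon + \hat{\xi}_n - y_n + t_n = 0$ hold simultaneously. Adding them gives

    $$2\epsilon + \xi_n + \hat{\xi}_n = 0 \tag{3.59}$$

    but since $\epsilon > 0$, $\xi_n \ge 0$ and $\hat{\xi}_n \ge 0$, (3.59) cannot hold (a contradiction). Hence the two equalities cannot hold simultaneously, which means that at least one of $a_n$ and $\hat{a}_n$ is zero.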
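  69. 3-3. SVM for regression

    By (3.58), the points that contribute to the prediction are those for which either $a_n$ or $\hat{a}_n$ is nonzero (by the argument above, at least one of the two is always zero). Points with $a_n = \hat{a}_n = 0$ do not contribute to the prediction; these are the points inside the $\epsilon$-tube. Points for which either $a_n$ or $\hat{a}_n$ is nonzero lie on the boundary of the $\epsilon$-tube or outside it, and these are the support vectors. The solution is therefore sparse: only the support vectors need to be considered for prediction.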
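  70. 3-4. RVM for regression

    For a newly given unseen input $\mathbf{x}$, the SVM only assigns a class according to the sign of the discriminant function $y(\mathbf{x})$; it does not provide a predictive distribution for the target label $t$. The RVM, by contrast, uses probability theory to provide a predictive distribution for $t$ based on Bayesian inference. Like the SVM, the RVM can be used for both regression and classification; regression is described first. Expand the output $y(\mathbf{x})$ in terms of a function $k(\mathbf{x}, \mathbf{x}_n)$:

    $$y(\mathbf{x}) = \sum_{n=1}^{N} w_n k(\mathbf{x}, \mathbf{x}_n) + b \tag{3.60}$$

    Here $\{w_n\}$ and $b$ are parameters, and (3.60) has the same form as the output (3.58) of the SVM for regression. The difference from (3.58) is that the function $k(\mathbf{x}, \mathbf{x}')$ may be arbitrary.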
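  71. 3-4. RVM for regression

    Given inputs $X = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ and the corresponding targets $\{t_1, t_2, \dots, t_N\}$, with $\mathbf{t} = (t_1, t_2, \dots, t_N)^{\mathrm T}$, the likelihood function is

    $$p(\mathbf{t} \mid X, \mathbf{w}, \beta) = \prod_{n=1}^{N} p(t_n \mid \mathbf{x}_n, \mathbf{w}, \beta) \tag{3.61}$$

    The prior is assumed to be

    $$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=1}^{N} \mathcal{N}(w_i \mid 0, \alpha_i^{-1}) \tag{3.62}$$

    Note that a separate hyperparameter $\alpha_i$ is defined for each parameter $w_i$.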
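  72. 3-4. RVM for regression

    From the likelihood function and the prior, the type-2 likelihood function follows:

    $$p(\mathbf{t} \mid X, \boldsymbol{\alpha}, \beta) = \int p(\mathbf{t} \mid X, \mathbf{w}, \beta) \, p(\mathbf{w} \mid \boldsymbol{\alpha}) \, d\mathbf{w} \tag{3.63}$$

    Using this type-2 likelihood, the hyperparameters $\boldsymbol{\alpha}$ and $\beta$ can be determined by the evidence approximation (PRML 3.5). When $\boldsymbol{\alpha}$ is actually computed in this setting, some of the $\{\alpha_i\}$ diverge to infinity. This means that the corresponding $w_i$ follows a Gaussian whose mean and variance are both zero (PRML 3.5). By (3.60), the corresponding training points $\mathbf{x}_i$ then do not contribute to the prediction, and a sparse solution is obtained.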
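  73. 3-5. RVM for classification

    Now apply the RVM to (binary) classification. In binary classification the output should satisfy $0 \le y(\mathbf{x}) \le 1$, so transform (3.60) with the logistic sigmoid:

    $$y(\mathbf{x}) = \sigma\Bigl( \sum_{n=1}^{N} w_n k(\mathbf{x}, \mathbf{x}_n) + b \Bigr) \tag{3.64}$$

    Given inputs $X = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ and the corresponding targets $\{t_1, t_2, \dots, t_N\}$, with $\mathbf{t} = (t_1, t_2, \dots, t_N)^{\mathrm T}$, the likelihood function is a product of Bernoulli distributions, as in 2-4:

    $$p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} (1 - y_n)^{1 - t_n} \tag{3.65}$$

    where $y_n = y(\mathbf{x}_n)$.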
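  74. 3-5. RVM for classification

    The prior is again assumed to be

    $$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=1}^{N} \mathcal{N}(w_i \mid 0, \alpha_i^{-1}) \tag{3.66}$$

    Here too, a separate hyperparameter $\alpha_i$ is defined for each parameter $w_i$. From the likelihood function (3.65) and the prior, the type-2 likelihood function follows:

    $$p(\mathbf{t} \mid X, \boldsymbol{\alpha}) = \int p(\mathbf{t} \mid X, \mathbf{w}) \, p(\mathbf{w} \mid \boldsymbol{\alpha}) \, d\mathbf{w} \tag{3.67}$$

    However, because of the logistic sigmoid function this integral cannot be evaluated analytically. PRML evaluates it approximately with the Laplace approximation (PRML 4.5.1). As in the regression case of 3-4, when $\boldsymbol{\alpha}$ is computed some of the $\{\alpha_i\}$ diverge to infinity, so a sparse solution is obtained.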