PRMLセミナー(第9章)

370e1dde1ef2391bdebe02e4a777890e?s=47 gucchi
May 25, 2019

 PRMLセミナー(第9章)

370e1dde1ef2391bdebe02e4a777890e?s=128

gucchi

May 25, 2019
Tweet

Transcript

  1. ୈ 4 ճ PRML ηϛφʔ 2019/05/27 ࡔޱ ྒี 1 /

    67
  2. 0. ࠓճͷηϛφʔʹ͍ͭͯ ࠓճͷηϛφʔͰ͸ɺPRML ͷୈ 9 ষͷ EM ΞϧΰϦζϜΛ͓࿩͠͠ ͍ͨͱࢥ͍·͢ɻ ·ͨɺ͜ΕΒͷ࿩୊Λઆ໌͢ΔͨΊʹඞཁͳ༧උ஌ࣝΛղઆ͠·͢ɻ

    (PRML 1.2 ֬཰࿦ͱ PRML 2.3 Ψ΢ε෼෍) ͳ͓஫ҙ఺ͱͯ͠ɺຊεϥΠυͷࣜ൪߸ͱ PRML ͷࣜ൪߸͸ҟͳΓ· ͢ͷͰɺ͝஫ҙ͍ͩ͘͞ɻ 2 / 67
  3. ໨࣍ 1. ༧උ஌ࣝ 1-1. ֬཰࿦ (PRML 1.2) 1-2. Ψ΢ε෼෍ (PRML

    2.3) 1-3. ࠞ߹Ψ΢ε෼෍ (PRML 2.3.9) 2. ࠞ߹Ϟσϧͱ EM 2-1. K-means ΫϥελϦϯά (PRML 9.1) 2-2. ࠞ߹Ψ΢ε෼෍ (PRML 9.2) 2-3. EM ΞϧΰϦζϜͷ΋͏Ұͭͷղऍ (PRML 9.3) 2-4. Ұൠͷ EM ΞϧΰϦζϜ (PRML 9.4) 3 / 67
  4. 1-1. ֬཰࿦ ύλʔϯೝࣝʹ͓͍ͯɺॏཁͳෆ࣮֬ੑΛఆྔతʹධՁ͢ΔͨΊʹ֬཰ ࿦Λಋೖ͢Δɻ ֬཰ม਺ X, Y Λߟ͑ɺ͜ΕΒ͸ X =

    xi (i = 1, 2, · · · , M)ɺ Y = yj (j = 1, 2, · · · , L) ΛͱΔͱ͠ɺX = xi , Y = yj ͱͳΔ֬཰ (ಉ࣌ ֬཰) Λ p(X = xi , Y = yj ) ͱ͔͘ɻ X = xi ͱͳΔ֬཰ p(X = xi ) ͸ɺp(X = xi , Y = yj ) Λ༻͍ͯҎԼͷ Α͏ʹ͔͚Δɻ(Ճ๏ఆཧ) p(X = xi ) = L ∑ j=1 p(X = xi , Y = yj ) (1.1) ·ͨɺX = xi ͕༩͑ΒΕ্ͨͰɺY = yj ͱͳΔ֬཰ (৚݅෇͖֬཰) Λ p(Y = yj |X = xi ) ͱ͢ΔͱɺҎԼͷΑ͏ͳؔ܎͕ࣜ੒ཱ͢Δɻ(৐๏ ఆཧ) p(X = xi , Y = yj ) = p(Y = yj |X = xi )p(X = xi ) (1.2) 4 / 67
  5. 1-1. ֬཰࿦ ৐๏ఆཧͱಉ࣌֬཰ͷରশੑ p(X, Y ) = p(Y, X) Λ༻͍ΔͱɺϕΠζͷ

    ఆཧ͕ಋ͚Δɻ p(Y |X) = p(X|Y )p(Y ) p(X) (1.3) ͞Βʹɺಉ࣌෼෍ p(X, Y ) ͕ҎԼͷΑ͏ʹपล෼෍ͷੵͰදͤΔ࣌ɺX ͱ Y ͸ಠཱͰ͋Δͱ͍͏ɻ p(X, Y ) = p(X) p(Y ) (1.4) 5 / 67
  6. 1-1. ֬཰࿦ ͜Ε·Ͱ͸཭ࢄతͳ֬཰ม਺ʹ͍ͭͯߟ͖͑ͯͨɻ࣍ʹ࿈ଓతͳ֬཰ ม਺ͷ෼෍ʹ͍ͭͯߟ͑Δɻ ֬཰ม਺ x ͕ (x, x +

    δx) ͷൣғʹೖΔ֬཰͕ δx → 0 ͷ࣌ʹ p(x) δx ͱ ༩͑ΒΕΔ࣌ɺp(x) Λ֬཰ີ౓ͱ͍͏ɻ ͜ͷ࣌ɺม਺ x ͕۠ؒ (a, b) ʹ͋Δ֬཰͸ҎԼͷࣜͰ༩͑ΒΕΔɻ p(x ∈ (a, b)) = ∫ b a p(x) dx (1.5) ·ͨɺ֬཰ͷඇෛੑͱن֨ԽΑΓɺp(x) ͸ҎԼͷੑ࣭Λ࣋ͭɻ p(x) ≥ 0 (1.6) ∫ ∞ −∞ p(x) dx = 1 (1.7) 6 / 67
  7. 1-2. Ψ΢ε෼෍ ࣍ʹɺ࿈ଓม਺ͷ֬཰෼෍Ͱ࠷΋ॏཁͳ෼෍Ͱ͋ΔΨ΢ε෼෍ʹ͍ͭͯ ड़΂Δɻ Ψ΢ε෼෍͸ҎԼͰఆٛ͞ΕΔɻ(ύϥϝʔλ͸ฏۉ µ ͱ෼ࢄ σ2 ͷ 2

    ͭ) N(x|µ, σ2) = 1 (2πσ2)1/2 exp { − 1 2σ2 (x − µ)2 } (1.8) 7 / 67
  8. 1-2. Ψ΢ε෼෍ Ψ΢ε෼෍͸֬཰ີ౓ؔ਺ͳͷͰɺҎԼͷن֨Խ৚݅Λຬͨ͢ɻ ∫ N(x|µ, σ2) dx = 1 (1.9)

    ·ͨɺΨ΢ε෼෍ͷॏཁͳੑ࣭ͱͯ͠ɺx ͷฏۉ஋Λ෼ࢄ͕ͦΕͧΕ µ ͱ σ2 Ͱ༩͑ΒΕΔ͜ͱͰ͋Δɻ E[x] = ∫ ∞ −∞ N(x|µ, σ2)x dx = µ (1.10) var[x] = E[x2] − E[x]2 = σ2 (1.11) 8 / 67
  9. 1-2. Ψ΢ε෼෍ ࣍ʹɺҎԼͷ D ࣍ݩͷϕΫτϧ x ʹର͢ΔଟมྔΨ΢ε෼෍Λಋೖ ͢Δɻ N(x|µ, Σ)

    = 1 (2π)D/2 1 |Σ|1/2 exp { − 1 2 (x − µ)TΣ−1(x − µ) } (1.12) ͜͜Ͱɺµ Λ D ࣍ݩͷฏۉϕΫτϧͱ͠ɺΣ Λ D × D ͷڞ෼ࢄߦྻͱ ͢Δɻ ͜ͷ৔߹Ͱ΋ن֨Խ৚݅ͱɺฏۉͱڞ෼ࢄ͸ҎԼͷੑ࣭Λຬͨ͢ɻ ∫ N(x|µ, Σ) dx = 1 (1.13) E[x] = ∫ N(x|µ, Σ)x dx = µ (1.14) cov[x] = E[(x − E[x])(x − E[x])T] = Σ (1.15) 9 / 67
  10. 1-3. ࠞ߹Ψ΢ε෼෍ ಋೖͨ͠ଟมྔΨ΢ε෼෍Λ࠷໬ਪఆΛ༻͍ͯɺ؍ଌ͞Εͨϓϩοτʹ ౰ͯ͸ΊͯݟΔ͜ͱΛߟ͑Δɻ Ψ΢ε෼෍ͷ୯ๆੑ (෼෍ͷࢁ͕Ұͭ) ͸෼෍Λσʔλʹ౰ͯ͸ΊΔͱ ͖ʹେ͖ͳ੍ݶΛ༩͑Δɻ ྫ͑͹ɺ2 ࣍ݩσʔλ

    (Old Faithful ؒܽઘσʔλ) ͷ྘৭ͷϓϩοτʹ ରͯ͠ɺD = 2 ͷଟมྔΨ΢ε෼෍ (1.12) Λ౰ͯ͸Ίͨͷ͕ҎԼͷά ϥϑͰ͋Δɻ Ψ΢ε෼෍͕୯ๆͰ͋ΔͨΊʹɺϓϩοτͷ 2 ͭͷմΛ͏·͘දݱͰ͖ ͓ͯΒͣɺϓϩοτ͕ଘࡏ͠ͳ͍఺͕؍ଌ͞ΕΔ֬཰͕ߴ͘ͳͬͯ ͍Δɻ 10 / 67
  11. 1-3. ࠞ߹Ψ΢ε෼෍ ͦ͜ͰɺҎԼͷࠞ߹Ψ΢ε෼෍Λಋೖ͢Δɻ p(x|π, µ, Σ) = K ∑ k=1

    πk N(x|µk , Σk ) (1.16) ͜͜Ͱɺπ, µ, Σ ͸ͦΕͧΕ πk , µk , Σk ͷू߹Λද͢ɻ µ, Σ ʹ͸୯ҰΨ΢ε෼෍ͷͱ͖ͱಉ͡Α͏ʹ߆ଋ৚݅͸ͳ͍͕ɺ(1.16) ͷن֨Խ৚݅ΑΓ ∫ p(x|π, µ, Σ) dx = K ∑ k=1 πk ∫ N(x|µk , Σk ) dx = K ∑ k=1 πk = 1 (1.17) ͱͳΓɺπk ʹ͸੍ݶ͕ͭ͘ɻ 11 / 67
  12. 1-3. ࠞ߹Ψ΢ε෼෍ 2 ࣍ݩσʔλ (Old Faithful ؒܽઘσʔλ) ରͯ͠ɺK = 2

    ͷࠞ߹Ψ΢ε ෼෍Λ౰ͯ͸Ίͨͷ͕ҎԼͷάϥϑͰ͋Δɻ ࠞ߹Ψ΢ε෼෍ͷཁૉͰ͋Δ 2 ͭͷΨ΢ε෼෍͕ϓϩοτͷ 2 ͭͷմ Λ͏·͘දݱ͍ͯ͠Δ͜ͱ͕Θ͔Δɻ 12 / 67
  13. 2. ࠞ߹Ϟσϧͱ EM ࣄલ४උͱͯ͠঺հͨࠞ͠߹Ϟσϧ (ࠞ߹Ψ΢ε෼෍) ͸σʔλͷΫϥ ελϦϯάʹ࢖༻Ͱ͖Δɻ ·ͣ 2-1 (PRML

    9.1) Ͱ͸ɺΫϥελϦϯάͱ͸Կ͔Λઆ໌͠ͳ͕Βɺ K-means ΞϧΰϦζϜΛ༻͍ͨඇ֬཰తͳΫϥελϦϯάͷํ๏ʹͭ ͍ͯઆ໌͢Δɻ ࣍ʹ 2-2 (PRML 9.2) Ͱ͸ɺજࡏม਺Λಋೖ͠ɺࠞ߹Ψ΢ε෼෍ͷ࠷໬ ਪఆΛߦ͏͜ͱͰɺൃݟ๏తʹ EM ΞϧΰϦζϜΛಋೖ͢Δɻ 2-3 (PRML 9.3) Ͱ͸ɺҰൠతͳ EM ΞϧΰϦζϜΛಋೖ͠ɺͦΕΛࠞ ߹Ψ΢ε෼෍ʹద༻͢Δ͜ͱͰ 2-2 (PRML 9.2) ͷ݁ՌΛ࠶ݱ͢Δ͜ͱ ΛΈΔɻ ࠷ޙʹ 2-4 (PRML 9.4) Ͱ͸ɺ2-3 (PRML 9.3) Ͱಋೖͨ͠Ұൠతͳ EM ΞϧΰϦζϜ͕໬౓ؔ਺Λ࠷େԽ͢Δ͜ͱΛࣔ͢ɻ 13 / 67
  14. 2-1. K-means ΫϥελϦϯά ·ͣ͸͡ΊʹɺD ࣍ݩϢʔΫϦου্ۭؒͷ֬཰ม਺ x ʹ͓͍ͯɺN ݸͷ؍ଌσʔλ {x1 ,

    · · · , xN } ͕͋Δͱ͢Δɻ ΫϥελϦϯάͷ໨త͸͜ΕΒͷ؍ଌσʔλ {x1 , · · · , xN } Λ K ݸͷ Ϋϥελʔ (σʔλू߹ͷάϧʔϓ) ʹׂΓৼΔ͜ͱͰ͋Δɻ ɹ K-means ΫϥελϦϯάͰ͸ɺ·ͣϓϩτλΠϓͱݺ͹ΕΔ K ݸͷ D ࣍ݩϕΫτϧ µk (k = 1, · · · , K) Λ༻ҙ͢Δɻ (࣮૷Ͱ͸ɺµk ͷॳظ஋ͱͯ͠ɺϥϯμϜͳϕΫτϧͱͨ͠Γ͢Δɻ) ޙͰΘ͔ΔΑ͏ʹɺK-means ΫϥελϦϯάద༻ޙɺϓϩτλΠϓ µk ͸Ϋϥελʔ k ͷத৺Λද͢ϕΫτϧʹͳΔ͜ͱ͕Θ͔Δɻ 14 / 67
  15. 2-1. K-means ΫϥελϦϯά ࣍ʹɺ֤σʔλ xn ʹରͯ͠ɺೋ஋ม਺ rnk ∈ {0, 1}

    Λಋೖ͢Δɻ ͜ͷม਺͸σʔλ xn ͕ K ݸ͋ΔΫϥελʔͷ಺ɺͲͷΫϥελʔʹ ׂΓ౰ͯΒΕΔ͔Λද͢ม਺Ͱ͋Δɻ ·ͨɺΫϥελʔ k ͷఴ͑ࣈʹରͯ͠ن֨Խ ∑ k rnk = 1 Λຬͨ͢ͱ ͢Δɻ ྫ͑͹ɺ͋Δσʔλ xn ʹରͯ͠ɺ͋Δ k ʹରͯ͠ rnk = 1 Ͱ͋Γɺ j ̸= k ͳΔ j ʹରͯ͠ rnj = 0 Ͱ͋Ε͹ɺσʔλ xn ͸Ϋϥελʔ k ʹ ଐ͢Δ͜ͱΛҙຯ͢Δɻ 15 / 67
  16. 2-1. K-means ΫϥελϦϯά ͜ΕΒͷ 2 छྨͷม਺Λ༻ҙͯ͠ɺҎԼͷ໨తม਺Λ࠷খʹ͢Δͷ͕ɺ K-means ΫϥελϦϯάͰ͋Δɻ J =

    N ∑ n=1 K ∑ k=1 rnk ∥xn − µk ∥2 = N ∑ n=1 jn (2.1) ͜͜Ͱɺ jn = K ∑ k=1 rnk ∥xn − µk ∥2 (2.2) ͱఆٛͨ͠ɻ 16 / 67
  17. 2-1. K-means ΫϥελϦϯά ͜ͷ໨తؔ਺ J Λ࿪Έई౓ͱݺͿɻ͜ͷ J ͷղऍΛߟ͑Δɻ ·ͣɺ͋Δσʔλ xn

    ͕Ϋϥελʔ k ʹଐ͍ͯ͠Δͱ͢Δɻ ͭ·Γɺ͋Δ k ʹରͯ͠ rnk = 1 Ͱ͋Γɺj ̸= k ͳΔ j ʹରͯ͠ rnj = 0 Ͱ͋Δͱ͖ jn ͸ (2.2) ͸ jn = ∥xn − µk ∥2 (2.3) ͱͳΔɻ (2.3) ͸σʔλ xn ͱΫϥελʔ k ͷϓϩτλΠϓ µk ͷڑ཭ͷೋ৐Ͱ ͋Δɻ (2.1) ΑΓɺJ ͸ jn ͷ࿨ͳͷͰɺJ ͸֤σʔλ xn ͕ଐ͢ΔΫϥελʔ ͷϓϩτλΠϓͱͷڑ཭ͷೋ৐ͷ࿨ͱղऍͰ͖Δɻ 17 / 67
  18. 2-1. K-means ΫϥελϦϯά ͦΕͰ͸ɺ(2.1) ͷӈลΛ {rnk } ͱ {µk }

    ʹ͍ͭͯ࠷খԽ͢Δɻ ͜͜Ͱ͸ɺrnk ͱ µk ͷͦΕͧΕʹରͯ͠࠷খԽΛߦ͏ 2 ͭͷεςοϓ Λަޓʹ܁Γฦ͢͜ͱͰ࣮ݱ͢Δɻ ͭ·Γɺ·ͣ µk ͷॳظ஋ͱͯ͠ϥϯμϜͳϕΫτϧΛઃఆ͠ɺҎԼͷ 2 εςοϓΛ܁Γฦ͢ɻ 1. µk Λݻఆͭͭ͠ɺrnk ʹ͍ͭͯ J Λ࠷খԽ͢Δɻ 2. rnk Λݻఆͭͭ͠ɺµk ʹ͍ͭͯ J Λ࠷খԽ͢Δɻ ࣮ࡍ͸ɺ͜ͷ 2 εςοϓΛ࠶ׂΓ౰͕ͯى͜Βͳ͘ͳΔ·Ͱߦ͏ɻ(܁ Γฦ͠ l ճ໨ͷ rnk ͷ஋ͱ܁Γฦ͠ l + 1 ճ໨ͷ rnk ͷ஋͕͢΂ͯͷ n ͱ k Ͱ౳͘͠ͳΔͱऴྃɻ) ·ͨɺ͋Β͔͡Ί܁Γฦ͠ճ਺ΛܾΊ͓ͯ͘৔߹΋͋Δɻ 18 / 67
  19. 2-1. K-means ΫϥελϦϯά ͦΕͰ͸ɺ࣮ࡍʹεςοϓ 1 ͔ΒߦͬͯΈΔɻ ·ͣɺJ Λ rnk ʹ͍ͭͯ࠷খԽ͢Δʹ͸ɺͦΕͧΕ

    n ʹର͢Δ jn Λ࠷ খʹ͢Δ rnk ΛٻΊΕ͹Α͘ɺ jn = K ∑ k=1 rnk ∥xn − µk ∥2 (2.4) Ͱ͋Δ͔Βɺ͢΂ͯͷ k ͷதͰ ∥xn − µk ∥2 ͕࠷΋খ͍͞ k ʹର͢Δ rnk Λ 1 ʹ͢Ε͹Α͍ɻ·ͨɺͦΕҎ֎ͷ j ̸= k ʹର͢Δ rjk ͸θϩʹ ͢Ε͹Α͍ɻ ͭ·ΓࣜͰද͢ͱɺ rnk = { 1 k = argminj ∥xn − µj ∥2 0 otherwise (2.5) ͱͳΔɻ 19 / 67
  20. 2-1. K-means ΫϥελϦϯά ࣍ʹεςοϓ 2 Λߦ͏ɻ εςοϓ 1 ͰܾΊͨ rnk

    Λݻఆͯ͠ɺ(2.1) Λ µk ʹରͯ͠࠷খԽ͢Δ ͨΊʹ J Λ µk Ͱඍ෼͢Δͱ ∂J ∂µk = N ∑ n=1 rnk ∂ ∂µk ∥xn − µk ∥2 = −2 N ∑ n=1 rnk (xn − µk ) = −2 ( N ∑ n=1 rnk xn − µk N ∑ n=1 rnk ) (2.6) ͱͳΔɻ 20 / 67
  21. 2-1. K-means ΫϥελϦϯά (2.6) ΑΓɺJ Λ࠷খԽ͢Δ µk ͸ µk =

    N ∑ n=1 rnk xn N ∑ n=1 rnk (2.7) ͱͳΔɻ (2.7) ͷӈลͷ෼฼͸Ϋϥελʔ k ʹଐ͍ͯ͠Δσʔλͷ૯਺Ͱ͋Γɺ (2.7) ͷӈลͷ෼ࢠ͸Ϋϥελʔ k ʹଐ͍ͯ͠Δσʔλ (ϕΫτϧ) ͷ࿨ Ͱ͋Δɻ ͭ·Γɺµk ͸Ϋϥελʔ k ͷத৺Λද͢ϕΫτϧͰ͋Δɻ 21 / 67
  22. 2-1. K-means ΫϥελϦϯά ࠷ޙʹɺҎԼͷ۩ମྫΛ঺հ͢Δɻ(Old Faithful ؒܽઘσʔλ) 22 / 67

  23. 2-1. K-means ΫϥελϦϯά ·ͣɺάϥϑ͸ 2 ࣍ݩͷೖྗۭؒͰɺ؍ଌ஋Ͱ͋Δϓϩοτ͕ඳ͔Εͯ ͍Δɻ άϥϑ (a) ͷ

    2 ͭͷʷ͕ϓϩτλΠϓ µ1 ͱ µ2 ͷॳظ஋Ͱ͋Δɻ (͜ͷྫͰ͸ɺϥϯμϜͰ͸ͳ͘ɺΘ͟ͱΫϥελʔͷத৺ʹͳΓ͏Δ ఺͔Βେ͖ͣ͘Βͨ͠৔ॴΛॳظ஋ͱ͍ͯ͠Δɻ) άϥϑ (b) ͸εςοϓ 1 ͷ࣮ࢪ݁ՌͰɺµk Λݻఆͯ͠ɺrnk ΛܾΊ͓ͯ Γɺσʔλ͸ µ1 ͱ µ2 ͷ͍ۙํͷΫϥελʔʹଐ͢Δɻ άϥϑ (c) ͸εςοϓ 2 ͷ࣮ࢪ݁ՌͰɺrnk Λݻఆͯ͠ɺµk Λ (2.7) Λ ༻͍ܾͯΊ͓ͯΓɺµk ͸ͦΕͧΕͷΫϥελʔʹଐ͢ΔϕΫτϧͷฏ ۉ஋ΛͱΔɻ άϥϑ (d)ʙ(f) ͸ 2 ͭͷεςοϓͷ 2 ճ໨ͷ܁Γฦ͠Ͱɺάϥϑ (g)ʙ (i) ͸ 2 ͭͷεςοϓͷ 3 ճ໨ͷ܁Γฦ͠Ͱ͋Δɻ άϥϑ (i) ΛΈΔͱɺάϥϑ (b) ʹൺ΂ͯɺద੾ͳΫϥελʔͷׂΓ౰ ͕ͯͰ͖͍ͯΔ͜ͱ͕ݟͯͱΕΔɻ 23 / 67
  24. 2-1. K-means ΫϥελϦϯά Ҏ্͕ K-means ΫϥελϦϯάͷղઆͰ͋Δɻ K-means ΫϥελϦϯάͷಛ௃͸ɺσʔλΛ͋Δ 1 ͭͷΫϥελʔʹඞ

    ׂͣΓ౰ͯΔͱ͜ΖͰ͋Δɻ(ϋʔυׂΓ౰ͯͱ͍͏ɻ) ࠷ऴతͳ µk ʹ͍ۙσʔλ఺Ͱ͋Ε͹ɺͦͷΑ͏ͳׂΓ౰ͯ͸ਖ਼ͦ͠͏ Ͱ͋Δ͕ɺΫϥελʔͷڥքઢ্ʹ͋ΔΑ͏ͳσʔλ఺ʹରͯ͠ɺ1 ͭ ͷΫϥελʔʹඞׂͣΓ౰ͯΔ͜ͱ͕ద੾Ͱ͋Δ͔Ͳ͏͔͸Θ͔Β ͳ͍ɻ ͦ͜ͰɺׂΓ౰ͯʹෆ࣮֬ੑΛ࣋ͨͤΔΫϥελϦϯάख๏Λߟ͑Δɻ (ιϑτׂΓ౰ͯͱ͍͏ɻ) ͦͷͨΊʹ͸֬཰తͳΫϥελϦϯάख๏Λߦ͏ඞཁ͕͋Δɻ 24 / 67
  25. 2-2. ࠞ߹Ψ΢ε෼෍ ͜͜Ͱ͸ɺࠞ߹Ψ΢ε෼෍ͷผͷදݱͱͯ͠ɺજࡏม਺ΛؚΉࠞ߹Ψ΢ ε෼෍ͷදݱΛಋग़͢Δɻ ͜ͷજࡏม਺ΛؚΉදݱͰ࠷໬ਪఆ๏Λߦ͏ࡍɺڧྗͳ EM ΞϧΰϦζ Ϝ͕࢖༻Ͱ͖Δ͜ͱΛΈΔɻ ɹ ·ͣɺK

    ࣍ݩͷ 2 ஋֬཰ม਺ z Λಋೖ͢Δɻ ͜ͷϕΫτϧͷ੒෼Λ zk (k = 1, · · · , K) ͱ͠ɺ͜ͷ੒෼͸ zk ∈ {0, 1} ͱ ∑ k zk = 1 Λຬͨ͢ͱ͢Δɻ ͭ·Γ͜ͷ৚݅͸ɺ͋Δ k ʹ͓͍ͯ zk = 1 Ͱ͋Γɺͦͷଞͷ੒෼͸ zj(̸=k) = 0 ͱͳΔ͜ͱΛද͍ͯ͠Δɻ 25 / 67
  26. 2-2. ࠞ߹Ψ΢ε෼෍ ࣍ʹɺಋೖͨ֬͠཰ม਺ z ͷपล֬཰ p(z) ΛҎԼͷΑ͏ʹఆٛ͢Δɻ p(zk = 1)

    = πk (k = 1, · · · , K) (2.8) ͜͜Ͱɺp(zk = 1) ͸֬཰ม਺ z ͷ k ੒෼͕ zk = 1 Ͱ͋Γɺͦͷଞͷ੒ ෼͸ zj(̸=k) = 0 Ͱ͋ΔΑ͏ͳ஋Ͱͷ֬཰ p(z) Λද͢ɻ ͜ΕΑΓɺن֨Խ৚͔݅Βɺπk ͸ ∑ z p(z) = K ∑ k=1 p(zk = 1) = K ∑ k=1 πk = 1 (2.9) ͱͳΓɺ·ͨ֬཰ͷඇෛੑ (πk ≥ 0) ͱ (2.9) ΑΓ 0 ≤ πk ≤ 1 (2.10) Λຬͨ͢ɻ 26 / 67
  27. 2-2. ࠞ߹Ψ΢ε෼෍ ෼෍ (2.8) ͸ҎԼͷΑ͏ʹ·ͱΊͯॻ͚Δɻ p(z) = K ∏ k=1

    πzk k (2.11) z ͷ k ੒෼͕ zk = 1 Ͱ͋Γɺͦͷଞͷ੒෼͸ zj(̸=k) = 0 Ͱ͋ΔΑ͏ͳ ͱ͖Λߟ͑Δͱɺ p(zk = 1) = K ∏ k=1 πzk k = π0 1 ×· · ·×π0 k−1 ×π1 k ×π0 k+1 ×· · ·×π0 K = πk (2.12) ͱͳΓɺ͔֬ʹ (2.12) ͸ (2.8) ͱҰக͢Δɻ 27 / 67
  28. 2-2. ࠞ߹Ψ΢ε෼෍ ࣍ʹ D ࣍ݩͷ֬཰ม਺ x ʹ͓͍ͯɺz ͕༩͑ΒΕͨͱ͖ͷ x ͷ৚݅෇

    ͖֬཰ΛҎԼͷΑ͏ʹΨ΢ε෼෍ͰԾఆ͢Δɻ p(x|zk = 1) = N(x|µk , Σk ) (k = 1, · · · , K) (2.13) ͜Ε΋ (2.11) ͱಉ͡Α͏ʹҎԼͷΑ͏ʹ·ͱΊͯॻ͚Δɻ p(x|z) = K ∏ k=1 N(x|µk , Σk )zk (2.14) (2.11) ͱ (2.14) ΑΓɺपล෼෍ p(z) ͱ৚݅෇͖֬཰෼෍ p(x|z) Λఆٛ ͨ͠ͷͰɺ֬཰ͷՃ๏ఆཧͱ৐๏ఆཧʹΑΓɺपล෼෍ p(x) ͕ҎԼͷ Α͏ʹܭࢉͰ͖Δɻ p(x) = ∑ z p(x, z) = ∑ z p(x|z)p(z) = ∑ z K ∏ k=1 ( πk N(x|µk , Σk ) )zk = K ∑ k=1 πk N(x|µk , Σk ) (2.15) 28 / 67
  29. 2-2. ࠞ߹Ψ΢ε෼෍ (2.15) ΑΓɺपล෼෍ p(x) ͸ࠞ߹Ψ΢ε෼෍Ͱ͋Δ͜ͱ͕Θ͔Δɻ ͜ΕΑΓɺજࡏม਺ z ΛؚΉࠞ߹Ψ΢ε෼෍ͷผͷදݱΛಋग़Ͱ͖ͨɻ ͜͜ͰɺॏཁͳྔͰ͋Δෛ୲཰Λಋೖ͢Δɻ

    ෛ୲཰ͱ͸ɺx ͕༩͑ΒΕͨͱ͖ͷ z ͷ৚݅෇͖֬཰Ͱ͋Γɺ γk (x) = p(zk = 1|x) ͱॻ͘ɻ γk (x) ͸ (2.8) ͱ (2.13) ͱϕΠζͷఆཧ (1.3) ΑΓɺ γk (x) = p(zk = 1|x) = p(zk = 1)p(x|zk = 1) K ∑ j=1 p(zj = 1)p(x|zj = 1) = πk N(x|µk , Σk ) K ∑ j=1 πj N(x|µj , Σj ) (2.16) ͱͳΔɻ 29 / 67
  30. 2-2. ࠞ߹Ψ΢ε෼෍ ͜ͷෛ୲཰ͷղऍΛྫΛ༻͍ͯઆ໌͢Δɻ(ҎԼͷਤ͸ 3 ͭͷΨ΢ε෼ ෍ͷࠞ߹ (K = 3) ͔Βੜ੒ͨ͠

    500 ఺Ͱ͋Δɻ) ·ͣɺಉ࣌෼෍ p(x, z) = p(x|z)p(z) ͔Βαϯϓϧ xn ͱ zn Λੜ੒ͨ͠ ͱ͢Δɻ άϥϑ (a) ͸αϯϓϧ xn Λϓϩοτ͠ɺରԠ͢Δ zn ͷ 3 ͭͷঢ়ଶɺ੺ (k = 1)ɺ྘ (k = 2)ɺ੨ (k = 3) Ͱ৭͕ృΒΕ͍ͯΔɻ άϥϑ (b) ͸ੜ੒͞Εͨαϯϓϧ xn ͱ zn ͷ಺ɺzn Λࣺͯͯɺxn Λϓ ϩοτͨ͠ਤͰ͋Δɻ 30 / 67
  31. 2-2. ࠞ߹Ψ΢ε෼෍ ·ͨɺάϥϑ (c) ͸αϯϓϧ xn ͕ಘΒΕͨΒɺ(2.16) ͔ΒରԠ͢Δෛ ୲཰ γk

    (xn ) (k = 1, 2, 3) Λܭࢉ͠ɺରԠ͢Δσʔλ఺ xn Λ γk (xn ) ʹ ൺྫ͢Δྔͷ੺ (k = 1)ɺ྘ (k = 2)ɺ੨ (k = 3) Ͱ৭Λ࢖ͬͯృͬͯ ͍Δɻ ྫ͑͹ɺγ1 (xn ) = 1.0 ͷ఺͸੺ͰృΒΕɺγ2 (xn ) = γ3 (xn ) = 0.5 ͷ఺ ͸྘ͱ੨Λಉྔࠞͥͨ৭ͰృΔɻ άϥϑ (a) ͱ (c) ΑΓɺෛ୲཰͸఺͕ 3 ͭͷΨ΢ε෼෍ͷ಺ͲΕ͔Βੜ ੒͞ΕΔͷ֬཰Λ༩͑Δɻ 31 / 67
  32. 2-2-1. ࠷໬ਪఆ ͦΕͰ͸ɺ؍ଌͨ͠σʔλͷू߹ {x1 , · · · , xN

    } Λࠞ߹Ψ΢ε෼෍ʹ౰ ͯ͸ΊΔͨΊʹɺ࠷໬ਪఆ๏Λ࣮ࢪ͠Α͏ɻ ·ͣɺσʔλͷू߹Λͻͱ·ͱΊʹ N × D ߦྻ X = (x1 , · · · , xN )T Λ ఆٛ͢Δͱɺσʔλ͕ (2.15) ͔Βಠཱʹੜ੒͞Εͨͱ͢Δͱɺ p(X|π, µ, Σ) = N ∏ n=1 [ K ∑ k=1 πk N(xn |µk , Σk ) ] (2.17) ͱͳΔɻ ͜ΕΑΓɺର਺໬౓͸ ln p(X|π, µ, Σ) = N ∑ n=1 ln { K ∑ k=1 πk N(xn |µk , Σk ) } (2.18) ͱͳΔɻ 32 / 67
  33. 2-2-1. ࠷໬ਪఆ ͜͜Ͱɺࠞ߹Ψ΢ε෼෍ͷ࠷໬ਪఆͰ͸ɺ୯ҰͷΨ΢ε෼෍ͷ࠷໬ਪఆ Ͱ͸ى͜Βͳ͍ಛ༗ͷ໰୊఺͕͋Δ͜ͱʹ஫ҙ͢΂͖Ͱ͋Δɻ ·ͣɺࠞ߹Ψ΢ε෼෍ͷ֤ཁૉͷڞ෼ࢄߦྻ͕ Σk = σ2 k I

    ͷͱ͖Λߟ ͑Δɻ ͜ͷͱ͖ɺର਺໬౓͸ ln p(X|π, µ, Σ) = N ∑ n=1 ln { K ∑ k=1 πk N(xn |µk , σ2 k I) } (2.19) ͱͳΔɻ ·ͨɺࠞ߹Ψ΢ε෼෍ͷ j ൪໨ͷཁૉͷฏۉύϥϝʔλ µj ͕ n ݸ໨ͷ σʔλͱ౳͍͠ͱԾఆ͢Δɻ(µj = xn ) 33 / 67
  34. 2-2-1. ࠷໬ਪఆ ͜ͷͱ͖ɺn ݸ໨σʔλ఺͸ର਺໬౓ͷࠞ߹Ψ΢ε෼෍ͷ j ൪໨ͷཁ ૉʹ N(xn |µj ,

    σ2 j I) = N(xn |xn , σ2 j I) = 1 (2π)D/2 1 σD j (2.20) ͱ͍͏ܗͰد༩͢Δɻ ͜Ε͸ σj → 0 Ͱൃࢄ͠ɺର਺໬౓ (2.20) ΋ൃࢄ͢Δɻ ͭ·Γ࠷໬ਪఆͰ͸ର਺໬౓Λ࠷େʹ͢ΔΑ͏ʹύϥϝʔλΛܾΊΔ ͷͰɺ͜ͷࣄ࣮͸ͲΜͳσʔλ఺ͷू߹ {x1 , · · · , xN } ͕༩͑ΒΕͯ΋ɺ ࠞ߹Ψ΢ε෼෍ͷ j ൪໨ͷཁૉͷฏۉύϥϝʔλ µj Λ n ݸ໨ͷσʔλ xn ͱ౳͘͠͠ɺj ൪໨ͷཁૉͷඪ४ภࠩΛ σj = 0 ͱ͢Ε͹ɺ͋ͱͷύ ϥϝʔλ͸ద౰Ͱ΋ର਺໬౓Λ࠷େʹ͢Δ͜ͱ͕ग़དྷΔɻ ͭ·ΓͲͷΑ͏ͳσʔλ఺ͷू߹͕༩͑ΒΕͯ΋ɺ࠷໬ਪఆͷ݁Ռɺxn ͸ࠞ߹Ψ΢ε෼෍ͷ j ൪໨ͷཁૉ͔Βੜ੒͞Ε (N(xn |µj , σ2 j I) → ∞ ΑΓ)ɺͦΕҎ֎ͷσʔλ఺͸ࠞ߹Ψ΢ε෼෍ͷͲͷཁૉ͔Βੜ੒͞Ε ͯ΋ྑ͍ (ର਺໬౓Λ࠷େʹ͢Δ) ͱ͍͏ಛҟతͳঢ়گʹ͓͍ͪΔɻ 34 / 67
  35. 2-2-1. ࠷໬ਪఆ ͜Ε͸୯ҰͷΨ΢ε෼෍ͷ࠷໬ਪఆͰ͸ى͜Βͳ͔ͬͨɻ ୯ҰͷΨ΢ε෼෍ͷ໬౓ؔ਺͸ p(X|µ, Σ) = N ∏ n=1

    N(xn |µ, σ2I) (2.21) Ͱ͋Γɺฏۉύϥϝʔλ µ ͕ n ݸ໨ͷσʔλͱ౳͍͠ͱ͢Δ (µ = xn ) ͱɺ p(X|µ, Σ) = N ∏ n=1 N(xn |xn , σ2I) (2.22) ͱͳΔɻ ͜ͷঢ়گͰ σ → 0 ͱ͢Δͱɺ͔֬ʹ (2.22) ͷ n ൪໨ͷҼࢠ͸ൃࢄ͢Δ ((2.20) ΑΓႈ৐ͷൃࢄ) ͕ɺn ൪໨Ҏ֎ͷҼࢠࢦ਺తʹݮਰ͠ 0 ʹऩଋ ͢Δɻ Αͬͯɺσ → 0 ͱͯ͠΋໬౓ؔ਺ (2.21) ൃࢄͤͣɺ࠷খͷ 0 ʹͳΔͷ Ͱɺઌ΄Ͳઆ໌ͨ͠ಛҟతͳঢ়گ͸ࠞ߹Ψ΢ε෼෍ಛ༗ͷ໰୊Ͱ͋Δɻ 35 / 67
  36. 2-2-1. ࠷໬ਪఆ ·ͨɺࠞ߹Ψ΢ε෼෍ͷ࠷໬ਪఆղ͸ղੳతʹٻΊΔ͜ͱ͕Ͱ͖ͳ͍ͷ Ͱɺ൓෮๏Λ༻͍ͯਪఆ͢Δ͜ͱΛߟ͑Δɻ ͦͷํ๏ͷҰͭͱͯ͠ɺχϡʔϥϧωοτͰ༻͍ΒΕΔΑ͏ͳޯ഑๏͕ ͋Δɻ ͨͩ͠ɺࠓճ͸ EM ΞϧΰϦζϜͱݺ͹ΕΔ൓෮๏Λ༻͍ͯਪఆ͢Δํ ๏Λߟ͑Δɻ

    ·ͨɺઌ΄Ͳઆ໌ͨࠞ͠߹Ψ΢ε෼෍ಛ༗ͳපతͳ໰୊͸ EM ΞϧΰϦ ζϜΛ༻͍ͯ΋ղܾͰ͖Δ໰୊Ͱ͸ͳ͍͜ͱʹ஫ҙɻ ͳͷͰ͜ͷষͰ͸ɺEM ΞϧΰϦζϜͳͲΛ༻͍ͯ൓෮తʹਪఆ͍ͯ͠ Δͱ͖ʹɺ΋͠ର਺໬౓͕ൃࢄͦ͠͏ͳΒɺͦͷฏۉύϥϝʔλ µj Λ ϥϯμϜʹઃఆ͠ɺΣj ͷ੒෼Λ͋Δఔ౓େ͖͍஋ʹͯ͠൓෮Λ࠶։͢ Δͱ͍͏ൃݟ๏తͳରॲ๏Λߦ͏͘Β͍ʹͱͲΊ͓ͯ͘ɻ 10 ষͰ͸ɺϕΠζతͳΞϓϩʔνΛߦ͏͜ͱʹΑΓɺ͜ͷපతͳ໰୊ ͸ى͜Βͳ͘ͳΔ͜ͱΛΈΔɻ 36 / 67
  37. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ͜͜Ͱ͸ɺࠞ߹Ψ΢ε෼෍ͷ࠷໬ਪఆΛ EM ΞϧΰϦζϜΛ༻͍ͯղ͘ ͜ͱΛߟ͑Δɻ ·ͣ͸ɺର਺໬౓ (2.18)

    Λฏۉ µk ͱ෼ࢄ Σk ͱࠞ߹܎਺ πk Ͱඍ෼͠ ͯ 0 ͱͳΔࣜΛॻ͖Լͦ͏ɻ ฏۉ µk Ͱඍ෼͢Δͱ ∂ ∂µk ln p(X|π, µ, Σ) = N ∑ n=1 ∂ ∂µk ln { K ∑ j=1 πj N(xn |µj , Σj ) } = N ∑ n=1 πk ∑ j πj N(xn |µj , Σj ) ∂ ∂µk N(xn |µk , Σk ) = N ∑ n=1 πk N(xn |µk , Σk ) ∑ j πj N(xn |µj , Σj ) Σ−1 k (xn − µk ) (2.23) ͱͳΔɻ(࠷ޙͷΠίʔϧ͸Ψ΢ε෼෍ͷදࣜ (1.12) Λ༻͍ͨ) 37 / 67
  38. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ͜͜Ͱɺෛ୲཰ (2.16) Λ༻͍Δͱɺର਺໬౓ͷ µk ඍ෼͸ ∂

    ∂µk ln p(X|π, µ, Σ) = N ∑ n=1 γk (xn )Σ−1 k (xn − µk ) =Σ−1 k [ N ∑ n=1 γk (xn )xn − µk N ∑ n=1 γk (xn ) ] (2.24) ͱͳΓɺର਺໬౓ͷ µk ඍ෼Λθϩʹͨࣜ͠͸ µk = 1 Nk N ∑ n=1 γk (xn )xn (2.25) ͱͳΔɻ͜͜Ͱɺ Nk = N ∑ n=1 γk (xn ) (2.26) ͱఆٛͨ͠ɻ 38 / 67
  39. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ͜͜Ͱɺલʹड़΂ͨෛ୲཰ͷղऍʹΑΓɺ(2.26) ͷ Nk ͸ k ൪໨ͷΫϥ

    ελʔʹׂΓ౰ͯΒΕΔ༗ޮσʔλ਺Λҙຯ͓ͯ͠Γɺ͞Βʹ (2.25) ΑΓ µk ͸σʔλ఺ͷॏΈ෇͖ฏۉͰ͋Γɺෛ୲཰ γk (xn ) ͕େ͖͍ σʔλ xn ΄Ͳ µk ʹେ͖͘د༩͢Δܗʹͳ͍ͬͯΔɻ ࣍ʹର਺໬౓ͷڞ෼ࢄ Σk ඍ෼Λθϩͱ͓͘ͱ Σk = 1 Nk N ∑ n=1 γk (xn )(xn − µk )(xn − µk )T (2.27) ͱͳΓɺฏۉ (2.25) ͱಉ͡ղऍ͕Ͱ͖Δɻ ࠷ޙʹର਺໬౓Λࠞ߹܎਺ πk Ͱඍ෼͢Δ͕ɺπk ͸ن֨Խ৚݅ (2.9) Λ ຬͨ͢΂͖ͳͷͰɺϥάϥϯδϡະఆ৐਺๏Λ༻͍ͯɺ ln p(X|π, µ, Σ) + λ ( K ∑ k=1 πk − 1 ) (2.28) Λ πk ͱະఆ৐਺ λ Ͱඍ෼ͯ͠ 0 ͱ͢Ε͹ྑ͍͜ͱ͕Θ͔Δɻ 39 / 67
  40. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ πk Ͱඍ෼ͨ͠ྔΛ 0 ͱ͓͘ͱ N ∑

    n=1 N(xn |µk , Σk ) ∑ j πj N(xn |µj , Σj ) + λ = 0 → 1 πk N ∑ n=1 γk (xn ) + λ = 0 →πk = − 1 λ N ∑ n=1 γk (xn ) (2.29) ͱͳΔɻ ·ͨɺλ Ͱඍ෼ͯ͠ 0 ͱͨࣜ͠͸ (2.28) ΑΓɺ੍໿৚݅ K ∑ k=1 πk = 1 (2.30) ͱͳΔɻ 40 / 67
  41. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ৚݅ (2.30) ʹ (2.29) Λ୅ೖ͢Δͱ −

    1 λ N ∑ n=1 K ∑ k=1 γk (xn ) = 1 (2.31) ͱͳΓɺ(2.16) ͷෛ୲཰ͷఆٛΑΓɺ K ∑ k=1 γk (xn ) = 1 (2.32) Λຬ͔ͨ͢Βɺ(2.31) ͸ − 1 λ N ∑ n=1 1 = 1 →λ = −N (2.33) ͱͳΔɻ 41 / 67
  42. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ (2.33) Λ༻͍Δͱɺ(2.29) ͸ πk = 1

    N N ∑ n=1 γk (xn ) = Nk N (2.34) ͱͳΓɺπk ͸શσʔλ਺ͷΫϥελʔ k ʹଐ͢Δ༗ޮσʔλ਺ͷׂ߹ ʹͳΔ͜ͱ͕Θ͔ͬͨɻ ͜͜ͰٻΊͨ (2.25) ͱ (2.27) ͱ (2.34) ͸ཅͳղΛ༩͑ͳ͍͜ͱΛ஫ҙ ͢΂͖Ͱ͋Δɻ ͭ·ΓɺͦΕͧΕͷࣜͷӈล͸ෛ୲཰ʹґଘ͍ͯͯ͠ɺ(2.16) ΑΓෛ୲ ཰͸ µk ͱ Σk ͱ πk ͷؔ਺Ͱ͋Δ͔ΒͰ͋Δɻ 42 / 67
  43. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ΑͬͯɺҎԼͷ൓෮తͳख๏ʹΑΓɺର਺໬౓ͷ஋Λେ͖͍ͯ͘͘͠ɻ 1. µk ͱ Σk ͱ

    πk ΛॳظԽ͠ɺର਺໬౓ (2.18) ͷॳظ஋Λܭࢉ͢Δ 2. (E εςοϓ) ݱࡏͷύϥϝʔλΛ༻͍ͯҎԼͷෛ୲཰Λܭࢉ͢Δ γk(xn) = πkN (xn|µk, Σk) K ∑ j=1 πj N (xn|µj , Σj ) (2.35) 3. (M εςοϓ) ݱࡏͷෛ୲཰Λ༻͍ͯɺύϥϝʔλΛ࠶ܭࢉ͢Δ µnew k = 1 Nk N ∑ n=1 γk(xn)xn (2.36) Σnew k = 1 Nk N ∑ n=1 γk(xn)(xn − µnew k )(xn − µnew k )T (2.37) πnew k = Nk N (2.38) ͨͩ͠ɺ Nk = N ∑ n=1 γk(xn) (2.39) 4. ର਺໬౓ ln p(X|πnew, µnew, Σnew) = N ∑ n=1 ln { K ∑ k=1 πnew k N (xn|µnew k , Σnew k ) } (2.40) Λܭࢉ͠ɺมԽྔ͕ᮢ஋ΑΓ΋খ͔ͬͨ͞Βऴྃ͠ɺେ͖͔ͬͨΒεςοϓ 2 ʹ໭Δ 43 / 67
  44. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ࠷ޙʹɺҎલ k-means ΞϧΰϦζϜͷྫͰ༻͍ͨσʔλ (Old Faithful ؒܽઘσʔλ)

    ͱಉ͡σʔλʹରͯ͠ɺࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦ ζϜΛదԠͨ͠ਤΛ঺հ͢Δɻ 44 / 67
  45. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ·ͣɺάϥϑ (a) ͷ྘ͷ఺͸σʔλ఺Λද͠ɺ੨ؙ (k = 1)

    ͱ੺ؙ (k = 2) ͕ࠞ߹Ψ΢ε෼෍ͷͦΕͧΕͷཁૉͱͳΔΨ΢ε෼෍Ͱ͋Γɺ ͦͷΨ΢ε෼෍ͷத৺͕ͦΕͧΕͷฏۉϕΫτϧ µk Ͱ͋ΓɺҎલ k-means ΞϧΰϦζϜͰߦͬͨ࣌ͷϓϩτλΠϓͱಉ͡஋ʹॳظԽ ͨ͠ɻ ·ͨɺͦΕͧΕͷΨ΢ε෼෍ͷڞ෼ࢄ͸౳ํత Σk = σ2 k I ʹॳظԽ ͨ͠ɻ άϥϑ (b) ͸ॳΊͯͷ E εςοϓͷ݁ՌͰ͋Γɺෛ୲཰ͷܭࢉ݁ՌΛද ͍ͯ͠Δɻ ͭ·Γɺ֤σʔλ఺ xn Λ γk (xn ) ʹൺྫ͢Δྔͷ੨ (k = 1) ͱ੺ (k = 2) Ͱ৭Λ࢖ͬͯృ͍ͬͯΔɻ ྫ͑͹ɺγ1 (xn ) = 1.0 ͷ఺͸੺ͰృΒΕɺγ1 (xn ) = γ2 (xn ) = 0.5 ͷ఺ ͸੨ͱ੺Λಉྔࠞͥͨࢵ৭ͰృΔɻ άϥϑ (c) ͸ॳΊͯͷ M εςοϓͷ݁ՌͰ͋Γɺྫ͑͹੨ͷΨ΢ε෼ ෍ͷฏۉ͸֤σʔλ͕੨ͷΫϥελʔʹଐ͢Δ֬཰ (γ1 (xn )) ͰॏΈ෇ ͚ΒΕͨॏ͖෇͖ฏۉʹͳ͍ͬͯΔɻ 45 / 67
  46. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ͞Βʹ (d),(e),(c) ͸ͦΕͧΕ EM εςοϓΛ 2

    ճɺ5 ճɺ20 ճߦͬͨޙ ͷෛ୲཰ͱ֤ύϥϝʔλͰ͋Γɺ20 ճߦ͏ͱΑ͘ΫϥελϦϯά͞Ε ͍ͯΔ͜ͱ͕ݟͯͱΕΔɻ ͜͜Ͱ͸ɺࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜΛಋೖͨ͠ɻ ࣍ʹҰൠతͳ EM ΞϧΰϦζϜΛಋೖ͢Δɻ 46 / 67
  47. 2-3. EM ΞϧΰϦζϜͷ΋͏Ұͭͷղऍ ͜Ε·Ͱɺࠞ߹Ψ΢ε෼෍ݶఆͷ EM ΞϧΰϦζϜʹ͍ͭͯઆ໌ͨ͠ ͕ɺ͜͜Ͱ͸Ұൠతͳ EM ΞϧΰϦζϜʹ͍ͭͯಋೖ͢Δɻ ·ͣɺσʔλͷू߹Λͻͱ·ͱΊʹ

    N × D ߦྻ X = (x1 , · · · , xN )T ͱ ͠ɺજࡏม਺΋ͻͱ·ͱΊʹ N × K ߦྻ Z = (z1 , · · · , zN )T ͱ͢Δɻ ͢Δͱɺ؍ଌσʔλͷ X ͷର਺໬౓͸ҎԼͷΑ͏ʹͳΔɻ ln p(X|θ) = ln { ∑ Z p(X, Z|θ) } (2.41) ͜͜Ͱɺθ ͸ϞσϧͷύϥϝʔλͰ͋Γɺࠞ߹Ψ΢ε෼෍ͷ࣌͸ π, µ, Σ ʹରԠ͢Δɻ ࠷໬ਪఆΛߦ͏৔߹ɺln p(X|θ) Λ࠷େʹ͢Δ θ Λ΋ͱΊΔ͕ɺ(2.41) ͷӈลͷ p(X, Z|θ) ͕ࢦ਺ܕ෼෍଒Ͱ͋ͬͯ΋ɺ࿨ ∑ Z Λͱͬͯ ln Λ ͱΔ͜ͱͰɺln p(X|θ) ͸ࢦ਺ܕ෼෍଒Ͱͳ͘ҰൠతʹෳࡶͳܗʹͳΔɻ Αͬͯɺ࠷໬ղ΋ෳࡶʹͳΔɻ 47 / 67
  48. 2-3. EM ΞϧΰϦζϜͷ΋͏Ұͭͷղऍ ͜͜Ͱɺ΋͠؍ଌʹΑͬͯ X ͚ͩͰ͸ͳ͘ɺZ ΋ಘΒΕΔͱ͢Δɻ ͜ͷͱ͖ {X, Z}

    Λ׬શσʔλू߹ͱ͍͍ɺX Λෆ׬શσʔλू߹ͱ ͍͏ɻ ΋͠ {X, Z} ͕༩͑ΒΕͨΒɺ׬શσʔλ໬౓ؔ਺ p(X, Z|θ) Λ࠷େʹ ͢Δύϥϝʔλ θ ΛٻΊΔ͜ͱ͕Ͱ͖Δɻ(͜͜ͱ͜ΕҎ߱Ͱ ln p(X, Z|θ) ͸؆୯ʹ࠷େԽͰ͖Δͱ͢Δɻ) ͨͩ͠ɺ؍ଌʹΑΓ Z ͕ಘΒΕͳ͍͜ͱ͕΄ͱΜͲͰ͋Δɻ ΑͬͯɺZ ͕Θ͔Βͳ͍ͷͰɺln p(X, Z|θ) Λ࠷େʹ͢Δ͜ͱ͸Ͱ͖ͣɺ ୅ΘΓʹ ln p(X, Z|θ) ͷ Z ͷࣄޙ෼෍ p(Z|X, θ) ʹ͓͚Δظ଴஋Λར༻ ͢Δ͜ͱΛߟ͑Δɻ ͳͥɺࣄޙ෼෍ p(Z|X, θ) Λར༻͢Δͷ͔ʹ͍ͭͯ͸ɺ2-4 Ͱઆ໌͢Δɻ 48 / 67
  49. 2-3. EM ΞϧΰϦζϜͷ΋͏Ұͭͷղऍ X ͸؍ଌʹΑΓ༩͑ΒΕ͍ͯͯɺp(X, Z|θ) ͷܗ΋Θ͔͍ͬͯΔͱ͖ɺ p(X|θ) Λ࠷େʹ͢Δ θ

    ΛٻΊΔ EM ΞϧΰϦζϜ͸ҎԼͰ͋Δɻ 1. ύϥϝʔλ θold ΛॳظԽ͢Δ 2. (E εςοϓ) ࣄޙ෼෍ p(Z|X, θold) Λܭࢉ͢Δ 3. (M εςοϓ) ࣍ࣜͰ༩͑ΒΕΔ θnew ΛٻΊΔ θnew = argmaxθ Q(θ, θold) (2.42) ͨͩ͠ɺ Q(θ, θold) = ∑ Z p(Z|X, θold) ln p(X, Z|θ) (2.43) 4. ໬౓ؔ਺ p(X|θ) ͷ஋͕ऩଋ৚݅Λຬ͍ͨͯ͠Δ͔Ͳ͏͔Λௐ΂ɺຬͨ͞Ε͍ͯͳ ͚Ε͹ɺ θold ← θnew (2.44) ͱ͠ɺεςοϓ 2 ʹ໭Δ 49 / 67
  50. 2-3-1. ࠞ߹Ψ΢ε෼෍࠶๚ ͦΕͰ͸ɺࠓઆ໌ͨ͠Ұൠతͳ EM ΞϧΰϦζϜΛࠞ߹Ψ΢ε෼෍ͷ৔ ߹ʹద༻ͯ͠ɺͦͷ݁Ռ͕ 2-2-2 ͱ౳͍͜͠ͱΛΈΔɻ ·ͣ׬શσʔλ໬౓ؔ਺͸ (2.11)

    ͱ (2.14) ΑΓ p(X, Z|π, µ, Σ) = N ∏ n=1 K ∏ k=1 πznk k N(xn |µk , Σk )znk (2.45) ͱͳΔɻ ͜͜Ͱɺznk ͸ zn ͷ k ൪໨ͷཁૉͰ͋Δɻ ֬཰ͷ৐๏ఆཧͱՃ๏ఆཧΑΓɺࣄޙ෼෍ p(Z|X, π, µ, Σ) ͸ p(Z|X, π, µ, Σ) = p(X, Z|π, µ, Σ) p(X|π, µ, Σ) = p(X, Z|π, µ, Σ) ∑ Z′ p(X, Z′|π, µ, Σ) (2.46) ͱͳΔɻ 50 / 67
  51. 2-3-1. ࠞ߹Ψ΢ε෼෍࠶๚ ͦΕͰ͸ɺEM ΞϧΰϦζϜͷεςοϓ 1 ͱͯ͠ɺύϥϝʔλ π, µ, Σ Λ

    πold, µold, Σold ॳظԽ͢Δɻ ͜͜Ͱɺεςοϓ 2 ʹೖΔલʹɺθ = {π, µ, Σ} ͱ͢ΔͱɺQ(θ, θold) ͸ Q(θ, θold) = ∑ Z p(Z|X, πold, µold, Σold) ln p(X, Z|π, µ, Σ) = 1 ∑ Z′ p(X, Z′|πold, µold, Σold) × ∑ Z p(X, Z|πold, µold, Σold) ln p(X, Z|π, µ, Σ) (2.47) ͱͳΔɻ 51 / 67
  52. 2-3-1. ࠞ߹Ψ΢ε෼෍࠶๚ (2.45) ΑΓɺ(2.47) ͸ Q(θ, θold) = N ∑

    n=1 K ∑ k=1 [∑ Z znk p(X, Z|πold, µold, Σold) ∑ Z′ p(X, Z′|πold, µold, Σold) ] ln { πk N(xn |µk , Σk ) } (2.48) ͱͳΔɻ ͜͜Ͱɺ·ͨ (2.45) ΑΓɺ ∑ Z znk p(X, Z|πold, µold, Σold) ∑ Z′ p(X, Z′|πold, µold, Σold) = ∑ Z znk N ∏ m=1 K ∏ k′=1 [πold k′ N(xm |µold k′ , Σold k′ )]zmk′ ∑ Z′ N ∏ l=1 K ∏ k′′=1 [πold k′′ N(xl |µold k′′ , Σold k′′ )]z′ lk′′ (2.49) ͱͳΔɻ 52 / 67
  53. 2-3-1. ࠞ߹Ψ΢ε෼෍࠶๚ ͜͜Ͱɺ ∑ Z = ∑ z1 · ·

    · ∑ zN Λ༻͍Δͱɺ(2.49) ͸ ∑ Z znk p(X, Z|πold, µold, Σold) ∑ Z′ p(X, Z′|πold, µold, Σold) = ∑ zn znk K ∏ k′=1 [πold k′ N(xn |µold k′ , Σold k′ )]znk′ ∑ z′ n K ∏ k′′=1 [πold k′′ N(xn |µold k′′ , Σold k′′ )]z′ nk′′ (2.50) ͱͳΔɻ(n ൪໨ͷҼࢠҎ֎͸෼฼෼ࢠͰΩϟϯηϧ͞ΕΔ) (2.50) ͷ෼ࢠ͸ ∑ zn ͷதͰ k ੒෼໨͕ znk = 1 ͱͳΔ߲ͷΈ࢒ΔͷͰɺ ∑ zn znk K ∏ k′=1 [πold k′ N(xn |µold k′ , Σold k′ )]znk′ = πold k N(xn |µold k , Σold k ) (2.51) ͱͳΔɻ 53 / 67
  54. 2-3-1. ࠞ߹Ψ΢ε෼෍࠶๚ Ұํɺ(2.50) ͷ෼฼͸ɺ ∑ zn ͷ߹ܭ K ݸͷ࿨͕શͯ࢒ΔͷͰɺ ∑

    z′ n K ∏ k′′=1 [πold k′′ N(xn |µold k′′ , Σold k′′ )]z′ nk′′ = K ∑ j=1 πold j N(xn |µold j , Σold j ) (2.52) ͱͳΔɻ (2.51) ͱ (2.52) ΑΓ (2.49) ͸ ∑ Z znk p(X, Z|πold, µold, Σold) ∑ Z′ p(X, Z′|πold, µold, Σold) = πold k N(xn |µold k , Σold k ) K ∑ j=1 πold j N(xn |µold j , Σold j ) =γk (xn , πold, µold, Σold) (2.53) ͱͳΓɺෛ୲཰͕ग़ͯ͘Δɻ((2.16) Λ࢖༻ͨ͠) ͜͜ʹग़ͯ͘Δෛ୲཰͸ݱࡏͷύϥϝʔλ θold ͷෛ୲཰Ͱ͋Δ͜ͱʹ ஫ҙɻ 54 / 67
  55. 2-3-1. ࠞ߹Ψ΢ε෼෍࠶๚ ͭ·Γɺεςοϓ 2(E εςοϓ) Ͱࣄޙ෼෍Λܭࢉ͢Δͱݴ͏͜ͱ͸ɺ ࠞ߹Ψ΢ε෼෍Ͱ͸ෛ୲཰ γk (xn ,

    πold, µold, Σold) Λܭࢉ͢Δͱ͍͏͜ ͱʹͳΔɻ ෛ୲཰Λܭࢉͨ͠ΒɺQ(θ, θold) ͸ (2.49) ΑΓ Q(θ, θold) = N ∑ n=1 K ∑ k=1 γk (xn , πold, µold, Σold) ln { πk N(xn |µk , Σk ) } (2.54) ͱͳΓɺεςοϓ 3(M εςοϓ) Ͱ͸ɺQ(θ, θold) Λ࠷େʹ͢Δ θnew = {πnew, µnew, Σnew} Λݟ͚ͭΔɻ ࣮ࡍʹ Q(θ, θold) Λ π, µ, Σ Ͱ࠷େʹ͢ΔࣜΛಋग़͢ΔͱɺҎલಋग़͠ ͨ (2.25) ͱ (2.27) ͱ (2.34) ͱಉ͡ܗͷ͕ࣜग़ͯདྷΔ͜ͱ͕Θ͔Δɻ (PRML ԋश 9.8) 55 / 67
  56. 2-3-2. k-means ͱͷؔ࿈ ͜͜Ͱ͸ɺࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜͷಛघͳ৔߹͕ k-means ΞϧΰϦζϜͰ͋Δ͜ͱΛݟΔɻ ͦͷͨΊʹ·ͣɺҎԼͷΑ͏ʹ (2.13)

    ͷ৚݅෇͖֬཰Ͱɺڞ෼ࢄߦྻ Λ Σk = ϵI ͱ͢Δɻ p(x|zk = 1) = N(x|µk , ϵI) (k = 1, · · · , K) (2.55) ͜ͷͱ͖ɺσʔλ఺ xn ͷෛ୲཰͸ (2.16) ΑΓ γk (xn ) = πk N(xn |µk , ϵI) K ∑ j=1 πj N(xn |µj , ϵI) = πk exp { − ∥xn − µk ∥2/2ϵ } K ∑ j=1 πj exp { − ∥xn − µj ∥2/2ϵ } (2.56) ͱͳΔɻ 56 / 67
  57. 2-3-2. k-means ͱͷؔ࿈ ͜͜Ͱɺ∥xn − µj ∥2 Λ࠷খʹ͢Δ j Λ

    j∗ ͱ͢Δɻ(·ͨɺ࠷খͰ͋Δ j ͸ j∗ །ҰͰ͋ΔͱԾఆ͢Δ) ͦͯ͠ɺ(2.56) ͷ࠶ӈลͷ෼฼෼ࢠʹ exp { ∥xn − µj∗ ∥2/2ϵ } Λ͔͚ Δͱɺ γk (xn ) = πk exp { − (∥xn − µk ∥2 − ∥xn − µj∗ ∥2)/2ϵ } K ∑ j=1 πj exp { − (∥xn − µj ∥2 − ∥xn − µj∗ ∥2)/2ϵ } (2.57) ͱͳΓɺk(̸= j∗) Ͱ͸ɺ−(∥xn − µk ∥2 − ∥xn − µj∗ ∥2) < 0 ΑΓɺ γk(̸=j∗) (xn ) → 0 (ϵ → 0) (2.58) ͱͳΔɻ 57 / 67
  58. 2-3-2. k-means ͱͷؔ࿈ Ұํɺk = j∗ Ͱ͸ɺ−(∥xn − µk ∥2

    − ∥xn − µj∗ ∥2) = 0 ͱͳΔɻ ·ͨɺ(2.57) ͷ෼฼ͷ࿨ ∑ K j=1 ͷதͰ j ̸= j∗ ͱͳΔ߲͸ ϵ → 0 Ͱ 0 ʹ ऩଋ͢ΔͷͰ γk=j∗ (xn ) → πj∗ πj∗ (ϵ → 0) = 1 (2.59) ͱͳΔɻ ͭ·Γɺ͜ͷۃݶͰ͸ෛ୲཰ γk (xn ) ͸ 2-1 Ͱಋೖͨ͠ 2 ஋ม਺ rnk ͱ Ұக͢Δɻ 58 / 67
  59. 2-3-2. k-means ͱͷؔ࿈ ·ͨɺ(2.54) ͸͜ͷۃݶͰ͸ Q(θ) = N ∑ n=1

    K ∑ k=1 rnk ln { πk N(xn |µk , ϵI) } = − 1 2 N ∑ n=1 K ∑ k=1 rnk ∥xn − µk ∥2 + const. (2.60) ͱͳΓɺ(2.1) ͷ࿪Έई౓͕ొ৔͠ɺQ(θ) Λ rnk ͱ µk Ͱ࠷େʹ͢Δ͜ ͱ͸ɺ(2.1) ͷ࿪Έई౓Λ rnk ͱ µk Ͱ࠷খʹ͢Δ͜ͱͱ౳ՁͰ͋Δ͜ ͱ͕Θ͔Δɻ 59 / 67
  60. 2-4. Ұൠͷ EM ΞϧΰϦζϜ 2-3 ͰҰൠతͳ EM ΞϧΰϦζϜΛಋೖͨ͠ͱ͖ɺҰൠతʹ؍ଌͰ͸જ ࡏม਺͸ಘΒΕͳ͍ͷͰɺ׬શσʔλͷର਺໬౓ؔ਺ͷજࡏม਺ͷࣄޙ ෼෍

    Q(θ, θold) Λ࠷େʹ͢ΔύϥϝʔλΛ൓෮తʹٻΊΔ͜ͱͰɺෆ׬ શσʔλͷ໬౓ؔ਺ p(X|θ) Λ࠷େʹͨ͠ɻ ͜͜Ͱ͸ɺQ(θ, θold) Λ࠷େʹ͢ΔύϥϝʔλΛ൓෮తʹٻΊΔ͜ͱͰ ෆ׬શσʔλͷ໬౓ؔ਺ p(X|θ) Λ࠷େʹग़དྷΔ͜ͱͷূ໌Λߦ͏ɻ 60 / 67
  61. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ·ͣɺZ ͷ೚ҙͷ֬཰෼෍ؔ਺ q(Z) Λಋೖ͢Δͱɺෆ׬શσʔλͷର ਺໬౓ؔ਺ ln

    p(X|θ) ͸ҎԼͷΑ͏ʹͳΔɻ ln p(X|θ) = { ∑ Z q(Z) } =1 ln p(X|θ) = ∑ Z q(Z) ln { p(X, Z|θ) p(Z|X, θ) } = ∑ Z q(Z) ln { p(X, Z|θ) q(Z) · q(Z) p(Z|X, θ) } = ∑ Z q(Z) ln { p(X, Z|θ) q(Z) } − ∑ Z q(Z) ln { p(Z|X, θ) q(Z) } (2.61) 61 / 67
  62. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ͜͜Ͱɺ৐๏ఆཧ p(X, Z|θ) = p(Z|X, θ)p(X|θ)

    (2.62) Λ༻͍ͨɻ ·ͨɺ L(q, θ) = ∑ Z q(Z) ln { p(X, Z|θ) q(Z) } (2.63) KL(q∥p) = − ∑ Z q(Z) ln { p(Z|X, θ) q(Z) } (2.64) ͱ͢Ε͹ɺ(2.61) ͸ ln p(X|θ) = L(q, θ) + KL(q∥p) (2.65) ͱͳΔɻ 62 / 67
  63. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ͜͜ͰɺPRML ͷ 1.6.1 ΑΓɺKL μΠόʔδΣϯε KL(q∥p)

    ͸ඇෛͰ ͋Δɻ(KL(q∥p) ≥ 0) ౳߸͸ q(Z) = p(Z|X, θ) ͷͱ͖ͷΈ੒ཱ͢Δɻ Ҏ্ΑΓɺln p(X|θ) ≥ L(q, θ) ͱͳΓɺL(q, θ) ͸ ln p(X|θ) ͷԼݶͰ͋ Δɻ(Լਤ) 63 / 67
  64. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ෼ղ (2.65) Λར༻ͯ͠ EM ΞϧΰϦζϜΛఆٛ͠ɺͦΕ͕ର਺໬౓ ln

    p(X|θ) Λ࠷େԽ͢Δ͜ͱΛࣔ͢ɻ ·ͣɺݱࡏͷύϥϝʔλͷ஋͕ θold Ͱ͋Δͱ͢Δɻ E εςοϓͰ͸ɺL(q, θold) Λ θold Λݻఆ͠ͳ͕Β q(Z) ʹ͍ͭͯ࠷େ Խ͢Δɻ ͜ͷ࠷େԽ͸؆୯Ͱɺର਺໬౓ ln p(X|θold) ͸ q(Z) ʹґଘ͠ͳ͍ͷͰɺ q(Z) ΛมԽͤͯ͞΋ ln p(X|θold) ͸มԽ͠ͳ͍ɻ ͜ΕΑΓɺKL(q∥p) = 0 ͭ·Γ q(Z) = p(Z|X, θold) ͷͱ͖ɺL(q, θold) ͸࠷େʹͳΔɻ ͜ͷͱ͖ɺԼਤͷΑ͏ʹ ln p(X|θold) = L(q, θold) ͱͳΔɻ 64 / 67
  65. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ࣍ͷ M εςοϓͰ͸ɺq(Z) = p(Z|X, θold)

    Ͱݻఆ͠ɺL(q, θ) Λ θ ʹͭ ͍ͯ࠷େԽ͢Δɻ L(q, θ) Λ࠷େʹ͢ΔύϥϝʔλΛ θnew ͱ͢Δͱɺ͢Ͱʹ L(q, θ) ͕࠷ େ஋ʹୡ͍ͯ͠ͳ͍ݶΓ (θnew ̸= θold ͷͱ͖)ɺL(q, θ) ͸૿Ճ͠ɺ͞Β ʹ KL μΠόʔδΣϯε͸ҎԼͷΑ͏ʹਖ਼ʹͳΔɻ KL(q∥p) = − ∑ Z p(Z|X, θold) ln { p(Z|X, θnew) p(Z|X, θold) } > 0 (2.66) 65 / 67
  66. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ΑͬͯɺҎԼͷਤͷΑ͏ʹର਺໬౓ ln p(X|θ) Λ૿Ճͤ͞Δɻ 66 /

    67
  67. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ·ͨɺq(Z) = p(Z|X, θold) ͷͱ͖ͷ L(q,

    θ) ͸ L(q, θ) = ∑ Z p(Z|X, θold) ln { p(X, Z|θ) p(Z|X, θold) } = ∑ Z p(Z|X, θold) ln p(X, Z|θ) − ∑ Z p(Z|X, θold) ln p(Z|X, θold) =Q(θ, θold) + const. (2.67) ͜͜Ͱɺ(2.43) ͷ Q(θ, θold) Λ༻͍ͨɻ ͜͜Ͱͷ const. ͱ͸ɺθ ʹґଘ͠ͳ͍߲Ͱ͋Δɻ ͜ΕΑΓɺM εςοϓͷ L(q, θ) ͷ࠷େԽ͸ Q(θ, θold) ͷ࠷େԽͱ౳Ձ Ͱ͋Δɻ ͜Ε͕ 2-3 ͷ M εςοϓͰɺln p(X, Z|θ) ͷࣄޙ෼෍ p(Z|X, θ) ʹ͓͚ Δظ଴஋ Q(θ, θold) Λ࠷େԽ͢Δཧ༝Ͱ͋Δɻ 67 / 67