
PRML Chapter 9 Explained

carushi
June 27, 2017


This is an explanation of Chapter 9 that I used in a reading group a while ago. It seemed a waste to let it sit, so I am posting it, but there may well be mistakes, so please treat it as a reference only. Figures are from Prof. Bishop's website (http://microsoft.com/en-us/research/people/cmbishop/)


Transcript

  1. Introduction · K-means clustering · Gaussian mixtures · Another interpretation of the EM algorithm…
    Today's side topic: let's play with Beamer!
    Beamer
    * A presentation package built on LaTeX, created by Tantau around 2005
    * Last updated May 2012
    * Presentation features such as revealing one line at a time, plus easy conversion of the slides into a handout
    * A bit unfortunate that many of its themes use rather garish colors...
    * "You can use Beamer both with pdflatex and latex+dvips. The standard commands of LaTeX still work."
    * I had never had occasion to make slides full of equations before, so...
  2. Preface
    * Introduce latent variables → make the data easier to handle and to interpret
      → marginal distribution (complicated) → joint distribution of observed and latent variables (tractable)
    * This chapter covers discrete latent variables
    * i.e. variables that cannot be observed
    * continuous latent variables come in Chapter 12
    * Non-probabilistic approach: K-means
    * Probabilistic approach: mixture distributions with latent variables
    * Use EM to obtain the maximum-likelihood solution
    * K-means is the non-probabilistic limit of the EM algorithm
    * And then the book says variational Bayes beats maximum likelihood, sapping your motivation for Chapter 9
  3. 9.1 K-means clustering
    The problem of grouping a set of data points in a multidimensional space into clusters
    * distances between points inside a cluster are small compared with distances to points outside it
    [figure (i): scatter plot of the data]
    * Let the number of clusters be K
    * A D-dimensional vector µ_k (same dimension as the data) is the cluster center
    * When data point x_n belongs to the k-th cluster, r_nk = 1 and r_nj = 0 (j ≠ k)
    Define an objective function so that points close to µ_k are assigned to cluster k:
      distortion measure  J = Σ_{n=1}^N Σ_{k=1}^K r_nk ‖x_n − µ_k‖²
  4. J is the squared distance between each x_n and the center µ_k of the cluster k it belongs to, summed over all data points
    → we want to find the µ_k and r_nk that minimize it
    * There are two quantities to optimize: µ_k and r_nk
    * Repeat the procedure of fixing one and updating the other to the value minimizing J
    * This corresponds to the E(expectation) and M(maximization) steps of the EM algorithm (though it is not a probabilistic model, so it is not真 EM)
    * The purple line is the perpendicular bisector = the set of points equidistant from the two centers
    [figure: (a) initial values, (b) first E step, (c) first M step, (d) second E step, (e) second M step]
  5. 1 Give µ_k suitable initial values
    2 Fix µ_k and find the r_nk minimizing J:
      r_nk = { 1 if k = argmin_j ‖x_n − µ_j‖²; 0 otherwise } (1)
    3 Fix r_nk and find the µ_k minimizing J:
      ∂J/∂µ_k = −2 Σ_{n=1}^N r_nk (x_n − µ_k) = 0 (2)
      µ_k = Σ_{n=1}^N r_nk x_n / Σ_{n=1}^N r_nk  (the mean of the x → "K-means") (3)
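The three steps above can be sketched in a few lines of NumPy (a minimal illustration, not from the slides; `kmeans` and its arguments are hypothetical names):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Alternate steps 2 and 3 until the centers stop changing."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the centers mu_k with randomly chosen data points.
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        # Step 2 ("E"): fix mu, assign each x_n to its nearest center.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        r = d2.argmin(axis=1)
        # Step 3 ("M"): fix r, move each mu_k to the mean of its points.
        new_mu = np.array([X[r == k].mean(axis=0) if np.any(r == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    # Distortion measure J = sum_n sum_k r_nk ||x_n - mu_k||^2.
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    r = d2.argmin(axis=1)
    J = d2[np.arange(len(X)), r].sum()
    return mu, r, J
```

Each pass can only lower J, which is the convergence argument used in Exercise 9.1 below.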
  6. These were fairly bad initial values, so two iterations were needed. In general, simply using randomly chosen data points as the initial values should already give better results.
    [figure: the distortion J plotted against iteration number]
    Exercise 9.1. Prove that the above algorithm converges in a finite number of steps.
    Each data point is assigned to one of the K clusters, so only K^N assignment patterns r_nk exist. Moreover J decreases at every step, and the algorithm stops when J no longer changes, so it converges after finitely many iterations. (It may, however, converge to a local optimum.)
  7. * The method above is slow — deciding r_nk means computing Euclidean distances for all combinations
    * Introduce an online method instead of batch — the Robbins-Monro procedure
    Exercise 9.2. Data point x_n belongs to the cluster k with the nearest µ_k, so
      ∂J/∂µ_k = −2 r_nk (x_n − µ_k) = z(µ_k) (4)
    Substituting this into the Robbins-Monro z(θ_{N−1}):
      µ_k^new = µ_k^old − a_n (−2 · 1 · (x_n − µ_k^old)) (5)
            = µ_k^old + η_n (x_n − µ_k^old) (6)
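Update rule (6) can be sketched as a sequential step (a hypothetical helper, not from the slides; with η_n = 1/n_k, the count of points seen by cluster k, the running center equals the exact mean of those points):

```python
import numpy as np

def online_kmeans_step(mu, counts, x):
    """One sequential update, Eq. (6): mu_k += eta_n * (x_n - mu_k)."""
    k = int(((mu - x) ** 2).sum(axis=1).argmin())  # nearest center
    counts[k] += 1
    eta = 1.0 / counts[k]   # Robbins-Monro-style decaying step size
    mu[k] += eta * (x - mu[k])
    return k
```

Any schedule with Σ η_n = ∞ and Σ η_n² < ∞ satisfies the Robbins-Monro conditions; η_n = 1/n_k is the simplest such choice.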
  8. Speed-up techniques
    * Moore, 2000. The Anchors Hierarchy: Using the Triangle Inequality to Survive High Dimensional Data.
      A Carnegie Mellon professor; slides explaining K-means can be found online.
      Uses a metric tree as the data structure, built from nodes that hold a center point and radius information (and a cap on the number of data points held), so the points whose distances actually need computing can be searched efficiently.
    * Elkan, 2003. Using the Triangle Inequality to Accelerate k-means.
      Accelerates K-means while guaranteeing, for any initialization and distance function, the identical solution to the naive computation.
      Upper and lower bounds on each data point's distances to the cluster centers, and the inter-center distances, are computed or cached; via the triangle inequality, distances are recomputed only for points whose nearest center could have become closer.
  9. The K-medoids algorithm
    Introduce a generalized dissimilarity, not just Euclidean distance, and define the distortion measure
      J̃ = Σ_{n=1}^N Σ_{k=1}^K r_nk ν(x_n, µ_k) (7)
    * Assigning the N data points to one of the K clusters costs O(NK)
    * In this case µ_k can no longer be a simple mean, so one of the assigned data points is used as the prototype, etc.
    * (Medoidshift, incidentally, is a method that makes data points the cluster centers; if the inter-point distances etc. are all computed once up front, it runs fast with no recomputation...)
    * This prototype update costs on the order of O(N_k²)
    * Assignment to a single cluster (hard assignment) ⇔ probabilistic assignment to several clusters (soft assignment)
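A minimal K-medoids sketch under distortion (7), assuming an arbitrary dissimilarity `nu` is supplied by the caller (names are hypothetical, not from the slides):

```python
import numpy as np

def kmedoids(X, K, nu, n_iter=50, seed=0):
    """Cluster with an arbitrary dissimilarity nu(x, y); centers are data points."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), size=K, replace=False)
    for _ in range(n_iter):
        # Assignment step: O(NK) dissimilarity evaluations.
        D = np.array([[nu(x, X[m]) for m in medoids] for x in X])
        r = D.argmin(axis=1)
        new_medoids = medoids.copy()
        for k in range(K):
            members = np.where(r == k)[0]
            if len(members) == 0:
                continue
            # Prototype step, O(N_k^2): pick the member minimizing the
            # total dissimilarity to the rest of its cluster.
            costs = [sum(nu(X[i], X[j]) for j in members) for i in members]
            new_medoids[k] = members[int(np.argmin(costs))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, r
```

Because `nu` is arbitrary (e.g. an L1 distance), the prototype must be searched among the assigned points rather than computed as a mean, which is exactly the O(N_k²) step mentioned above.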
  10. 9.1.1 Image segmentation and image compression
    * Images as a practical application of the K-means algorithm
    * Data format → the three primaries red, green, blue, each in the interval [0,1]
    * Classify the pixels into K clusters and repaint each pixel with the color of its cluster center µ_k
    [figure: (f) K = 2, (g) K = 3, (h) K = 10, (i) original image]
  11. * Lossless data compression
      — example: Huffman coding — DAEBCBACBBBC — B = 0, C = 10, A = 110, D = 1110, E = 1111
    * Lossy data compression
      — compress by reducing the image to K colors with K-means
      — vector quantization yields the code-book vectors
    * How much compression do we get? — with N = 43,200 pixels
      — {R,G,B} at 8 bits each × N pixels: 24N (1,036,800 bits)
      — compressed with K code-book vectors: 24K + N⌈log₂ K⌉ (43,248, 86,472, and 173,040 bits for K = 2, 3, 10)
      — at around K = 20,000 we are back up to roughly 1.1 million bits
    * Only the RGB values are used; spatial proximity is ignored
    * Real image compression mostly exploits the correlation of RGB values between neighboring blocks
    * JPEG, for example, groups nearby pixels into blocks, applies the discrete cosine transform and discards the high-frequency components, then Huffman-codes the result, and so on and so forth
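The bit counts quoted above can be reproduced directly (note the ⌈log₂ K⌉: each pixel index is rounded up to a whole number of bits):

```python
from math import ceil, log2

def compressed_bits(N, K):
    # 24 bits (3 channels x 8 bits) per code-book vector, plus
    # ceil(log2 K) bits per pixel to index one of the K vectors.
    return 24 * K + N * ceil(log2(K))

N = 43_200
print("original:", 24 * N)            # 1,036,800 bits
for K in (2, 3, 10):
    print("K =", K, "->", compressed_bits(N, K))
```

At K = 20,000 this gives 24·20,000 + 43,200·15 = 1,128,000 bits, matching the "roughly 1.1 million" remark.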
  12. 9.2 Gaussian mixture distributions
    * Introduce a discrete latent variable into the Gaussian mixture, which has richer expressive power than a single distribution
    * The Gaussian mixture can be written
      p(x) = Σ_{k=1}^K π_k N(x|µ_k, Σ_k),  where Σ_{k=1}^K π_k = 1 and 0 ≤ π_k ≤ 1
    * Introduce a K-dimensional binary random variable z (corresponding to the clusters of K-means).
      (Marginal, conditional, and joint distributions involving z:)
      p(z_k = 1) = π_k,  p(z) = Π_{k=1}^K π_k^{z_k}
      p(x|z_k = 1) = N(x|µ_k, Σ_k)
      p(x) = Σ_z p(z)p(x|z) = Σ_z Π_k (π_k N(x|µ_k, Σ_k))^{z_k} = Σ_k π_k N(x|µ_k, Σ_k)
  13. * We have exhibited an equivalent formulation that explicitly involves a latent variable; this lets us work with the joint p(x, z)
    * Responsibility γ(z_k)
    * The conditional probability of z given x: how much does component k "explain" x? When data point x was generated, with what probability was it generated under the condition z_k = 1?
      γ(z_k) ≡ p(z_k = 1|x) = p(z_k = 1)p(x|z_k = 1) / Σ_{j=1}^K p(z_j = 1)p(x|z_j = 1) (8)
             = π_k N(x|µ_k, Σ_k) / Σ_j π_j N(x|µ_j, Σ_j) (9)
    [figure: (a) samples from the mixture distribution, (b) the incomplete data set, (c) visualization of the responsibilities]
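Responsibilities (8)-(9) in code (a small sketch; `gaussian` and `responsibilities` are hypothetical names, not from the slides):

```python
import numpy as np

def gaussian(x, mu, cov):
    """Multivariate normal density N(x | mu, cov)."""
    d = len(mu)
    diff = x - mu
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / \
        np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def responsibilities(x, pi, mus, covs):
    # gamma(z_k) = pi_k N(x|mu_k, Sigma_k) / sum_j pi_j N(x|mu_j, Sigma_j)
    w = np.array([p * gaussian(x, m, c) for p, m, c in zip(pi, mus, covs)])
    return w / w.sum()
```

By construction the responsibilities are non-negative and sum to one, which is what makes them a "soft" assignment of x over the K components.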
  14. 9.2.1 Maximum likelihood
    Assume each data point is generated independently from the K Gaussians.
    [X: the N×D matrix whose rows are x_n^T; Z: the N×K matrix whose rows are z_n^T]
    The log-likelihood function is
      ln p(X|π, µ, Σ) = Σ_{n=1}^N ln { Σ_{k=1}^K π_k N(x_n|µ_k, Σ_k) }
    Maximizing this function is an ill-posed problem.
    — A well-posed problem:
    * a solution exists
    * the solution is unique
    * the solution varies continuously with the parameters
  15. Singular models and regular models
    * well-posed problem → regular model
    * the normal distribution, the binomial distribution, etc. are regular models
    * the Fisher information matrix (= E[(∂/∂θ ln L(θ|x))² | θ], the second moment of the derivative of the log-likelihood with respect to the parameters) is positive definite (necessary and sufficient)
    * The Cramér-Rao inequality … the variance of an unbiased estimator is at least the inverse of the Fisher information: var(θ̂) ≥ 1/I(θ)
    * Intuitively: "the lower bound on the spread of the maximum-likelihood estimator blows up"
  16. — The Gaussian case
    * For a single Gaussian, shrinking the variance → the likelihood of the other data points decreases → the likelihood converges to 0
    * For a Gaussian mixture, one component assigns finite probability to all the other data points → if another component overfits a single data point, the likelihood diverges to infinity
    * For singular models, theory developed for regular models, such as AIC, may not apply...
    [figure: a mixture density p(x) with one component collapsing onto a single data point]
  17. * How do we avoid it? Introduce a Bayesian approach.
      Heuristically: when a Gaussian component is about to collapse, reset its mean and variance to random values
    * Non-identifiability: permuting the K Gaussians gives an equivalent model
      — irrelevant when the goal is just to find a good density model
    * The parameters of the individual Gaussians have no closed-form solution
      –1 gradient-based optimization (w_N = w_{N−1} − η∇E_n(w_{N−1}))
      –2 EM!
  18. 9.2.2 The EM algorithm for Gaussian mixtures
    The Expectation-Maximization algorithm: finds maximum-likelihood solutions for models with latent variables
    * Choose suitable initial values for the means, covariances, and mixing coefficients
    * E step: compute the posterior probabilities (responsibilities γ)
    * M step: re-estimate the means, covariances, and mixing coefficients from those posteriors
    * Repeat
    Each update cycle always increases the log-likelihood (Section 9.4).
    (The E-step & M-step equations:)
      ∂/∂µ_k ln p(X|π, µ, Σ) = ∂/∂µ_k Σ_{n=1}^N ln { Σ_{j=1}^K π_j N(x_n|µ_j, Σ_j) }  (= (ln f(x))′) (10)
      = Σ_{n=1}^N [π_k N(x_n|µ_k, Σ_k) / Σ_j π_j N(x_n|µ_j, Σ_j)] Σ_k^{−1}(x_n − µ_k) (11)
      = Σ_{n=1}^N γ(z_nk) Σ_k^{−1}(x_n − µ_k) = 0 (12)
      → µ_k = Σ_{n=1}^N γ(z_nk) x_n / Σ_{n=1}^N γ(z_nk) = (1/N_k) Σ_{n=1}^N γ(z_nk) x_n (13)
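The full E/M cycle, using update (13) together with the Σ_k and π_k updates derived on the next slides, can be sketched as follows (a minimal illustration with hypothetical names; the small ridge term keeping the covariances nonsingular is a pragmatic guard against the singularities discussed earlier, not part of the slides):

```python
import numpy as np

def gmm_em(X, K, n_iter=100, seed=0):
    N, D = X.shape
    rng = np.random.default_rng(seed)
    # Initialization: random data points as means, broad shared covariance.
    mu = X[rng.choice(N, size=K, replace=False)].copy()
    cov = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E step: gamma_nk = pi_k N(x_n|mu_k, Sigma_k) / sum_j pi_j N(...).
        dens = np.empty((N, K))
        for k in range(K):
            diff = X - mu[k]
            maha = (diff * np.linalg.solve(cov[k], diff.T).T).sum(axis=1)
            dens[:, k] = np.exp(-0.5 * maha) / np.sqrt(
                (2 * np.pi) ** D * np.linalg.det(cov[k]))
        gamma = pi * dens
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M step: closed-form updates mu_k (13), Sigma_k, pi_k = N_k / N.
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] \
                + 1e-6 * np.eye(D)  # ridge guards against collapse
        pi = Nk / N
    return pi, mu, cov, gamma
```

A practical implementation would also monitor the log-likelihood and stop once its change falls below a threshold, as discussed on slide 21.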
  19. (assuming Σ_k is nonsingular)
    Next, Σ_k and π_k are also obtained in forms expressed with γ:
      ∂/∂Σ_k ln p(X|π, µ, Σ) = Σ_{n=1}^N (π_k N(x_n|µ_k, Σ_k))′ / Σ_j π_j N(x_n|µ_j, Σ_j)
    where
      N(x_n|µ_k, Σ_k)′ = [exp( ln(1/(|Σ_k|^{1/2}(2π)^{D/2})) − ½(x_n − µ_k)^T Σ_k^{−1}(x_n − µ_k) )]′
      = N(x_n|µ_k, Σ_k) × ∂/∂Σ_k ( ln(1/|Σ_k|^{1/2}) − ½(x_n − µ_k)^T Σ_k^{−1}(x_n − µ_k) )
    Differentiating as in Exercise 2.34:
      Σ_{n=1}^N [π_k N(x_n|µ_k, Σ_k) / Σ_j π_j N(x_n|µ_j, Σ_j)] (−Σ_k^{−1} + Σ_k^{−1}(x_n − µ_k)(x_n − µ_k)^T Σ_k^{−1}) = 0
      → Σ_k = (1/N_k) Σ_{n=1}^N γ(z_nk)(x_n − µ_k)(x_n − µ_k)^T
  20. The quantity to maximize plus the constraint to satisfy → Lagrange multipliers
      ∂/∂π_k { ln p(X|π, µ, Σ) + λ(Σ_{k=1}^K π_k − 1) } = 0
      Σ_{n=1}^N N(x_n|µ_k, Σ_k) / Σ_j π_j N(x_n|µ_j, Σ_j) + λ = 0
    Multiplying by π_k and summing over k:
      Σ_{n=1}^N Σ_k π_k N(x_n|µ_k, Σ_k) / Σ_j π_j N(x_n|µ_j, Σ_j) + λ Σ_k π_k = 0  →  λ = −N
    Substituting λ into Eq. (9.21) and multiplying both sides by π_k gives
      π_k = N_k / N
    This establishes the cycle γ → µ, Σ, π → γ → …
  21. Convergence criterion: (in practice) stop when the change in the parameters falls below a threshold.
    Example:
    [figure: panels (a)-(f) showing EM iterations at L = 1, 2, 5, 20]
    Problems with EM
    * far more iterations than K-means
    * techniques are needed to avoid the singularities of the likelihood function
    * local optima
  22. 9.3 Another interpretation of the EM algorithm — the important role played by the latent variables
    * Goal of EM: find maximum-likelihood solutions for models with latent variables
      ln p(X|θ) = ln { Σ_Z p(X, Z|θ) }
    * This is the log of a sum (over exponential-family terms), so it often takes a complicated form
    * If Z were given (hypothetically), maximizing the log-likelihood ln p(X, Z|θ) would be easy
    When only the incomplete data X are given:
      – obtain the expectation of Z
      – maximize over the parameters θ against that expectation
      – obtain the expectation of Z
      – ...
  23. (The general EM algorithm)
    * E-step: compute the posterior distribution of the latent variables (the responsibilities) p(Z|X, θ^old)
    * M-step: compute the expectation of the complete-data log-likelihood as a function of the parameters,
        Q(θ, θ^old) = Σ_Z p(Z|X, θ^old) ln p(X, Z|θ)
      and the new estimate θ^new = argmax_θ Q(θ, θ^old)
    * If the parameter values have converged, stop; otherwise return to the E-step
  24. (The quantity maximized when a prior p(θ) is defined)
    Since p(θ) is defined, consider maximizing the posterior of θ given X:
      ln p(θ|X) = ln p(X|θ) + ln p(θ) + const (14)
    so in the M step the quantity to maximize becomes
      Σ_Z p(Z|X, θ^old) ln p(X, Z|θ) + ln p(θ) (15)
      = Q(θ, θ^old) + ln p(θ) (16)
    The prior constrains the parameters, so the singularities can be removed.
    Missing at random
    * treat missing values in the data set as unobserved variables and run EM
    * presupposes that the cause of a value being missing does not depend on the unobserved values themselves
  25. 9.3.1 Gaussian mixtures revisited
    Apply the generalized EM of Section 9.3 to the Gaussian mixture.
    First consider the complete data set, assuming both X and Z are given:
      p(X, Z|µ, Σ, π) = Π_{n=1}^N Π_{k=1}^K π_k^{z_nk} N(x_n|µ_k, Σ_k)^{z_nk} (17)
      ln p(X, Z|µ, Σ, π) = Σ_n Σ_k z_nk { ln π_k + ln N(x_n|µ_k, Σ_k) } (18)
      (cf. ln p(X|π, µ, Σ) = Σ_n ln { Σ_k π_k N(x_n|µ_k, Σ_k) } (19))
    Unlike (19), the log of a sum of Gaussians, the complete-data form (18) is easy to compute.
    Differentiating with respect to µ_k, Σ_k gives the sum of the log Gaussians of the x belonging to cluster k:
      (ln N(x_n|µ_k, Σ_k) + ln N(x_{n+1}|µ_k, Σ_k) + ···)′ = 0
  26. Maximizing the mixing coefficients → Lagrange multipliers
      ∂/∂π_k { Σ_n Σ_k z_nk ln π_k + λ(Σ_k π_k − 1) } = 0 (20)
      λ = −N (21)
      π_k = (1/N) Σ_n z_nk (22)
    The mixing coefficient is the fraction of data points assigned to cluster k; with complete data it can be solved for in closed form.
  27. What actually exists is the incomplete data, so take the expectation of the likelihood with respect to the posterior of the latent variables:
      p(Z|X, µ, Σ, π) = p(X|Z)p(Z)/p(X) ∝ p(X|Z)p(Z) (23)
      = Π_n Π_k π_k^{z_nk} · Π_n Π_k N(x_n|µ_k, Σ_k)^{z_nk} (24)
      = Π_n Π_k [π_k N(x_n|µ_k, Σ_k)]^{z_nk} (25)
    The only nodes linking the z's are the parameter nodes, and all the parameter nodes are tail-to-tail, so the posterior factorizes over n (Exercise 9.5)
  28. E[z_nk] = Σ_{z_n} z_nk Π_{k′} [π_{k′} N(x_n|µ_{k′}, Σ_{k′})]^{z_nk′} / Σ_{z_n} Π_{k′} [π_{k′} N(x_n|µ_{k′}, Σ_{k′})]^{z_nk′} (26)
      = γ(z_nk) (27)
    (z_n = (0, 0, 1, 0, ···))
      E_Z[ln p(X, Z|µ, Σ, π)] = Σ_n Σ_k γ(z_nk) { ln π_k + ln N(x_n|µ_k, Σ_k) } (28)
    Here every possible assignment of z_k is considered and the expectation taken → soft assignment
  29. Exercise 9.8.
      ∂/∂µ_k E_Z[ln p(X, Z|µ, Σ, π)] = Σ_n γ(z_nk) Σ_k^{−1}(x_n − µ_k) (29)
      ∂/∂Σ_k E_Z[ln p(X, Z|µ, Σ, π)] = Σ_n γ(z_nk)(−Σ_k^{−1} + Σ_k^{−1}(x_n − µ_k)(x_n − µ_k)^T Σ_k^{−1}) (30, 31)
      ∂/∂π_k { E_Z[ln p(X, Z|µ, Σ, π)] + λ(Σ_k π_k − 1) } = Σ_n γ(z_nk)/π_k + λ (32)
    These take the same form as in Section 9.2.2, so setting them to 0 and solving gives the same solutions in terms of γ.
  30. 9.3.2 Relation to K-means
    * Time to tie up the last loose end
    * EM algorithm: soft assignment
    * K-means: hard assignment
      lim_{soft → hard} EM = K-means
  31. Covariance matrices Σ_k = εI (with ε a constant):
      p(x|µ_k, Σ_k) = (1/(2πε)^{D/2}) exp{ −‖x − µ_k‖²/2ε } (33)
      γ(z_nk) = π_k exp{−‖x_n − µ_k‖²/2ε} / Σ_j π_j exp{−‖x_n − µ_j‖²/2ε} (34)
    Focus on the j* for which ‖x_n − µ_j‖² is smallest. As ε → 0, i.e. when exp{−‖x_n − µ_k‖²/2ε} becomes extremely small, γ converges to 1 at j* and 0 everywhere else.
    That is, regardless of the values of π, each point is assigned with 100% probability to the nearest cluster.
    Considering that γ(z_nk) → r_nk and π → const,
      E_Z[ln p(X, Z|µ, Σ, π)] → Σ_n Σ_k r_nk { ln N(x_n|µ_k, Σ_k) + ln π_k }
      = Σ_n Σ_k r_nk { −(D/2) ln(2πε) − (1/2ε)‖x_n − µ_k‖² + ln π_k }
      = −(1/2ε) Σ_n Σ_k r_nk ‖x_n − µ_k‖² + const
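The ε → 0 limit in (34) can be checked numerically: as ε shrinks, γ becomes one-hot at the nearest center regardless of π (hypothetical helper names; computed in log space to avoid underflow):

```python
import numpy as np

def gamma_eps(x, mus, pi, eps):
    """Responsibilities (34) for shared covariance eps * I."""
    logw = np.log(pi) - ((x - mus) ** 2).sum(axis=1) / (2 * eps)
    logw -= logw.max()   # subtract the max to avoid exp underflow to 0/0
    w = np.exp(logw)
    return w / w.sum()

x = np.array([0.0, 0.0])
mus = np.array([[0.1, 0.0], [1.0, 0.0]])   # center 0 is nearer to x
pi = np.array([0.01, 0.99])                # mixing weights biased the other way
for eps in (10.0, 1.0, 0.1, 0.01):
    print(eps, gamma_eps(x, mus, pi, eps))
```

For large ε the biased π dominates and the far component takes most of the responsibility; as ε → 0 the nearest center wins with probability 1, i.e. γ(z_nk) → r_nk.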