
Introduction to Machine Learning by Bayesian Inference (ベイズ推論による機械学習入門), first half of Chapter 4


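Material for a reading group held at an undisclosed venue
Atsushi Suyama, 『ベイズ推論による機械学習入門』 (Introduction to Machine Learning by Bayesian Inference), Sections 4.1 to 4.3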


Takahiro Kawashima

October 01, 2018

Transcript

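  1. Suyama's "green book", first half of Chapter 4. Takahiro Kawashima, October 1, 2018. The University of Electro-Communications, 4th-year undergraduate.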

  2. Contents: 1. Mixture models and posterior inference 2. Approximation methods for probability distributions 3. Inference in the Poisson mixture model

  3. Mixture models and posterior inference

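  4. Motivation for mixture models. Some data look unlikely to be explained by a single Gaussian model. Adding together several distributions gives a more complex model → mixture model.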

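  5. Data-generating process of a mixture model

     The number of clusters $K$ is known.
     • Generated data: $X = \{x_1, \dots, x_N\}$
     • Latent variables (one-hot): $S = \{s_1, \dots, s_N\}$
     • Mixing ratio: $\pi = (\pi_1, \dots, \pi_K)^\top$
     • Parameters of each cluster: $\Theta = (\theta_1, \dots, \theta_K)^\top$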
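  6. Data-generating process of a mixture model

     $$ p(X, S, \Theta, \pi) = p(X \mid S, \Theta)\, p(S \mid \pi)\, p(\Theta)\, p(\pi) = \left[ \prod_{n=1}^{N} p(x_n \mid s_n, \Theta)\, p(s_n \mid \pi) \right] \left[ \prod_{k=1}^{K} p(\theta_k) \right] p(\pi) \tag{4.5} $$
     $s_n$ follows a categorical distribution, and its parameter $\pi$ is given a Dirichlet distribution as the conjugate prior:
     $$ p(s_n \mid \pi) = \mathrm{Cat}(s_n \mid \pi) \tag{4.2} $$
     $$ p(\pi) = \mathrm{Dir}(\pi \mid \alpha) \tag{4.3} $$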
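  7. Posterior distribution of a mixture model

     The joint posterior of the unknown variables we want to estimate is
     $$ p(S, \Theta, \pi \mid X) = \frac{p(X, S, \Theta, \pi)}{p(X)} \tag{4.6} $$
     To estimate the clusters we additionally need to compute
     $$ p(S \mid X) = \iint p(S, \Theta, \pi \mid X)\, d\Theta\, d\pi \tag{4.7} $$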
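  8. Posterior distribution of a mixture model

     To obtain the normalization term $p(X)$ explicitly, we would compute
     $$ p(X) = \sum_{S} \iint p(X, S, \Theta, \pi)\, d\Theta\, d\pi = \sum_{S} p(X, S) \tag{4.8} $$
     The integrals can be evaluated analytically when conjugate priors are used, but the sum runs over every combination of $S$ ($K^N$ terms).
     → Approximate the posterior with MCMC, variational inference, and similar methods.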
  9. Approximation methods for probability distributions

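  10. Gibbs sampling

     We want statistics of a probability distribution $p(z_1, z_2, z_3)$ that is hard to handle directly.
     → Sample from $p(z_1, z_2, z_3)$ with MCMC (Markov chain Monte Carlo).
     Gibbs sampling: repeatedly sample from the full conditional distributions below to obtain a sequence of samples from $p(z_1, z_2, z_3)$.
     $$ z_1^{(i)} \sim p(z_1 \mid z_2^{(i-1)}, z_3^{(i-1)}), \qquad z_2^{(i)} \sim p(z_2 \mid z_1^{(i)}, z_3^{(i-1)}), \qquad z_3^{(i)} \sim p(z_3 \mid z_1^{(i)}, z_2^{(i)}) \tag{4.10} $$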
  11. Gibbs sampling

     Gibbs sampling applied to a 2-D Gaussian distribution (Fig. 4.4).
     [Figure: two panels with axes $z_1$ and $z_2$. Blue line: true distribution $p(z)$; red line: approximate distribution $q(z)$ obtained from the sample set.]
     When the correlation between variables is large, the result tends to become unreliable.
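
To make the 2-D Gaussian experiment concrete, here is a minimal Python sketch of Gibbs sampling for a zero-mean, unit-variance Gaussian with correlation `rho`. The function names, the value `rho = 0.8`, and the sample count are assumptions chosen for illustration, not the settings behind Fig. 4.4. It relies on the standard conditionals $z_1 \mid z_2 \sim \mathcal{N}(\rho z_2, 1 - \rho^2)$ and symmetrically for $z_2$.

```python
import math
import random

def gibbs_bivariate_gaussian(rho, n_samples, seed=0):
    """Gibbs sampling from N(0, [[1, rho], [rho, 1]]) via its two full conditionals."""
    rng = random.Random(seed)
    cond_std = math.sqrt(1.0 - rho ** 2)
    z1, z2 = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        z1 = rng.gauss(rho * z2, cond_std)  # z1 ~ p(z1 | z2)
        z2 = rng.gauss(rho * z1, cond_std)  # z2 ~ p(z2 | z1), using the fresh z1
        samples.append((z1, z2))
    return samples

def sample_correlation(samples):
    """Empirical correlation coefficient of a list of (z1, z2) pairs."""
    n = len(samples)
    m1 = sum(a for a, _ in samples) / n
    m2 = sum(b for _, b in samples) / n
    cov = sum((a - m1) * (b - m2) for a, b in samples) / n
    v1 = sum((a - m1) ** 2 for a, _ in samples) / n
    v2 = sum((b - m2) ** 2 for _, b in samples) / n
    return cov / math.sqrt(v1 * v2)

samples = gibbs_bivariate_gaussian(rho=0.8, n_samples=20000)
estimated_rho = sample_correlation(samples)
print(estimated_rho)
```

As `rho` approaches 1 the conditional standard deviation shrinks, so each step moves only a short distance and the chain explores the distribution slowly, which is one way to read the warning about strongly correlated variables.
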
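  12. Extension 1: blocked Gibbs sampling

     Blocked Gibbs sampling: run Gibbs sampling using the joint distribution of $z_2, z_3$.
     $$ z_1^{(i)} \sim p(z_1 \mid z_2^{(i-1)}, z_3^{(i-1)}), \qquad z_2^{(i)}, z_3^{(i)} \sim p(z_2, z_3 \mid z_1^{(i)}) \tag{4.11} $$
     • Tends to work well even when $z_2$ and $z_3$ are strongly correlated
     • Requires that $p(z_2, z_3 \mid z_1^{(i)})$ be easy to sample from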
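  13. Extension 2: collapsed Gibbs sampling

     Collapsed Gibbs sampling: marginalize out $z_3$, then run Gibbs sampling on $p(z_1, z_2)$.
     $$ p(z_1, z_2) = \int p(z_1, z_2, z_3)\, dz_3 \tag{4.12} $$
     $$ z_1^{(i)} \sim p(z_1 \mid z_2^{(i-1)}), \qquad z_2^{(i)} \sim p(z_2 \mid z_1^{(i)}) \tag{4.13} $$
     • A speed-up can be expected
     • Requires that the marginal distribution be available analytically
     • Requires that the remaining variables have a form that is easy to sample from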
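  14. Variational inference

     Represent the probability distribution $p(z_1, z_2, z_3)$ with a tractable approximate distribution $q(z_1, z_2, z_3)$ → minimize the KL divergence.
     $$ q_{\mathrm{opt.}}(z_1, z_2, z_3) = \arg\min_{q} \mathrm{KL}[q(z_1, z_2, z_3) \,\|\, p(z_1, z_2, z_3)] \tag{4.14} $$
     Variational inference: restrict the expressive power of $q$ and minimize the KL divergence.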
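  15. Variational inference

     Mean-field approximation: assume independence among the random variables.
     $$ p(z_1, z_2, z_3) \approx q(z_1)\, q(z_2)\, q(z_3) \tag{4.15} $$
     Update $q(z_1)$, $q(z_2)$, $q(z_3)$ one at a time so that the KL divergence decreases.
     Notation: $\langle \cdot \rangle_{q(z_1)q(z_2)q(z_3)} = \langle \cdot \rangle_{1,2,3}$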
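  16. Variational inference

     Optimize $q(z_1)$ with $q(z_2)$ and $q(z_3)$ held fixed:
     $$ q_{\mathrm{opt.}}(z_1) = \arg\min_{q(z_1)} \mathrm{KL}[q(z_1) q(z_2) q(z_3) \,\|\, p(z_1, z_2, z_3)] \tag{4.16} $$
     $$ \mathrm{KL}[q(z_1) q(z_2) q(z_3) \,\|\, p(z_1, z_2, z_3)] = -\left\langle \ln \frac{p(z_1, z_2, z_3)}{q(z_1) q(z_2) q(z_3)} \right\rangle_{1,2,3} \tag{4.18} $$
     $$ = -\left\langle \left\langle \ln \frac{p(z_1, z_2, z_3)}{q(z_1) q(z_2) q(z_3)} \right\rangle_{2,3} \right\rangle_{1} \tag{4.19} $$
     $$ = -\left\langle \langle \ln p(z_1, z_2, z_3) \rangle_{2,3} - \langle \ln q(z_1) \rangle_{2,3} - \langle \ln q(z_2) \rangle_{2,3} - \langle \ln q(z_3) \rangle_{2,3} \right\rangle_{1} \tag{4.20} $$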
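  17. Variational inference

     Using $\langle \ln q(z_1) \rangle_{2,3} = \ln q(z_1)$ and collecting the parts that do not depend on $q(z_1)$ into a constant:
     $$ = -\left\langle \langle \ln p(z_1, z_2, z_3) \rangle_{2,3} - \ln q(z_1) \right\rangle_{1} + \mathrm{const.} \tag{4.21} $$
     $$ = -\left\langle \ln\left[ \exp\left( \langle \ln p(z_1, z_2, z_3) \rangle_{2,3} \right) \right] - \ln q(z_1) \right\rangle_{1} + \mathrm{const.} $$
     $$ = -\left\langle \ln \frac{\exp\left( \langle \ln p(z_1, z_2, z_3) \rangle_{2,3} \right)}{q(z_1)} \right\rangle_{1} + \mathrm{const.} \tag{4.22} $$
     $$ = \mathrm{KL}\left[ q(z_1) \,\|\, \exp\{ \langle \ln p(z_1, z_2, z_3) \rangle_{2,3} \} \right] + \mathrm{const.} \tag{4.23} $$
     The minimum of Eq. (4.23) is therefore attained at
     $$ \ln q(z_1) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_2) q(z_3)} + \mathrm{const.} \tag{4.24} $$
     (and likewise for $q(z_2)$ and $q(z_3)$).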
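  18. Variational inference

     Variational inference with the mean-field approximation (Algorithm 4.1):
       initialize $q(z_2)$, $q(z_3)$
       for $i = 1, \dots, \mathrm{max\_iter}$ do
         $\ln q(z_1) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_2) q(z_3)} + \mathrm{const.}$
         $\ln q(z_2) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_1) q(z_3)} + \mathrm{const.}$
         $\ln q(z_3) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_1) q(z_2)} + \mathrm{const.}$
       end for
     A somewhat smarter stopping condition would be preferable → for example, use the ELBO (evidence lower bound) as the criterion.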
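
As a concrete instance of the update loop in Algorithm 4.1, the sketch below applies mean-field updates to a 2-D Gaussian $p(z) = \mathcal{N}(\mu, \Lambda^{-1})$. For this target, applying (4.24) gives Gaussian factors with fixed variance $1/\Lambda_{ii}$, so only the means need updating; that closed form is a standard result, but the function name, the correlation 0.8, and the starting point are assumptions made for illustration. The loop also records $\mathrm{KL}[q \,\|\, p]$, which is available in closed form for Gaussians.

```python
import math

def mean_field_gaussian(mu, lam, m_init, max_iter=50):
    """Mean-field updates of q(z1)q(z2) for p(z) = N(mu, lam^{-1}) in 2-D.

    Each factor is Gaussian with fixed variance 1/lam[i][i]; only the means move.
    Returns the final means, the factor variances, and the KL[q||p] trace.
    """
    (l11, l12), (l21, l22) = lam
    m1, m2 = m_init
    # Constant part of KL[q||p]: 0.5 * ln(det(Sigma_p) / det(Sigma_q)).
    log_det_term = math.log(l11 * l22 / (l11 * l22 - l12 * l21))
    kl_trace = []
    for _ in range(max_iter):
        # ln q(z1) = <ln p(z1, z2)>_{q(z2)} + const. is Gaussian with this mean.
        m1 = mu[0] - l12 / l11 * (m2 - mu[1])
        # Same update for q(z2), using the freshly updated m1.
        m2 = mu[1] - l21 / l22 * (m1 - mu[0])
        d1, d2 = m1 - mu[0], m2 - mu[1]
        quad = l11 * d1 * d1 + (l12 + l21) * d1 * d2 + l22 * d2 * d2
        kl_trace.append(0.5 * (quad + log_det_term))
    return (m1, m2), (1.0 / l11, 1.0 / l22), kl_trace

rho = 0.8  # illustrative correlation of the true distribution
scale = 1.0 / (1.0 - rho ** 2)
precision = [[scale, -rho * scale], [-rho * scale, scale]]
means, variances, kl_trace = mean_field_gaussian((0.0, 0.0), precision, (1.0, -1.0))
print(means, variances, kl_trace[-1])
```

In this setup the factor variances come out as $1 - \rho^2$, smaller than the true marginal variance of 1, and the KL divergence settles at a positive value instead of reaching zero: the factorized $q$ cannot represent the correlation.
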
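  19. Variational inference

     ELBO (A.4, p. 233): variational inference can also be viewed as maximizing a lower bound on the marginal likelihood.
     $X$: observed data; $Z$: unobserved variables; assume $Z \sim q(Z)$.
     $$ \ln p(X) = \ln \int p(X, Z)\, dZ = \ln \int q(Z) \frac{p(X, Z)}{q(Z)}\, dZ \ge \int q(Z) \ln \frac{p(X, Z)}{q(Z)}\, dZ \ \text{(Jensen's inequality)} =: \mathcal{L}[q(Z)] \tag{A.39} $$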
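  20. Variational inference

     Reference: Jensen's inequality. For any concave function $f$ and any probability density function $p$,
     $$ f\left( \int y(x)\, p(x)\, dx \right) \ge \int f(y(x))\, p(x)\, dx \tag{A.40} $$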
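  21. Variational inference

     ELBO (A.4, p. 233): the lower bound $\mathcal{L}[q(Z)]$ on the log marginal likelihood is called the ELBO of $q(Z)$.
     The gap between the log marginal likelihood and the ELBO equals the KL divergence between $q(Z)$ and $p(Z \mid X)$:
     $$ \mathrm{KL}[q(Z) \,\|\, p(Z \mid X)] = \int q(Z) \ln \frac{q(Z)}{p(Z \mid X)}\, dZ = \int q(Z) \ln \frac{q(Z)\, p(X)}{p(X, Z)}\, dZ = \ln p(X) - \int q(Z) \ln \frac{p(X, Z)}{q(Z)}\, dZ = \ln p(X) - \mathcal{L}[q(Z)] \tag{A.41} $$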
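  22. Variational inference

     ELBO (A.4, p. 233):
     $$ \mathrm{KL}[q(Z) \,\|\, p(Z \mid X)] = \ln p(X) - \mathcal{L}[q(Z)] \tag{A.41} $$
     $\ln p(X)$ is a constant once the data and the model are given.
     → Minimizing the KL divergence with respect to $q(Z)$ is equivalent to maximizing the lower bound $\mathcal{L}[q(Z)]$ on the log marginal likelihood.
     Stop the variational inference algorithm when the change in the ELBO falls below a constant $\epsilon$.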
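
The identity (A.41), with the log marginal likelihood on the right-hand side, can be checked numerically on a tiny discrete example. The numbers below are made up purely for illustration: `joint[z]` plays the role of $p(X = x_{\mathrm{obs}}, Z = z)$ for three values of $z$, and `q` is an arbitrary approximate distribution.

```python
import math

# Hypothetical values of p(X = x_obs, Z = z) for z = 0, 1, 2 (illustrative numbers).
joint = [0.10, 0.25, 0.05]
# An arbitrary approximate distribution q(Z); it must sum to 1.
q = [0.2, 0.5, 0.3]

evidence = sum(joint)                       # p(X) = sum_z p(X, z)
posterior = [j / evidence for j in joint]   # p(Z | X)

# ELBO: sum_z q(z) ln( p(X, z) / q(z) )
elbo = sum(qz * math.log(j / qz) for qz, j in zip(q, joint))
# KL[q(Z) || p(Z | X)]
kl = sum(qz * math.log(qz / pz) for qz, pz in zip(q, posterior))

log_evidence = math.log(evidence)
print(log_evidence, elbo + kl)
```

Since `kl` is nonnegative, `elbo` never exceeds `log_evidence`, and the two coincide only when `q` equals the posterior.
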
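  23. Variational inference

     Structured variational inference: factorize the true distribution only partially into approximating factors.
     $$ p(z_1, z_2, z_3) \approx q(z_1)\, q(z_2, z_3) \tag{4.26} $$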
  24. Variational inference (simple experiment)

     Variational inference applied to a 2-D Gaussian distribution (Fig. 4.5).
     [Figure: ten panels, "1 of 10" through "10 of 10", one per iteration. Blue line: true distribution; red line: approximate posterior.]
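  25. Variational inference (simple experiment)

     Variational inference applied to a 2-D Gaussian distribution (Fig. 4.5).
     [Figure: KL divergence (vertical axis) against iteration (horizontal axis).]
     The KL divergence decreases monotonically.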
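  26. Variational inference (simple experiment)

     Variational inference applied to a 2-D Gaussian distribution (Fig. 4.5).
     • Fast
     • The KL divergence decreases monotonically at every iteration
     • Cannot capture strong correlations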
  27. Inference in the Poisson mixture model

  28. Poisson mixture model

     Estimate the clusters of 1-D, discrete, nonnegative data (Fig. 4.6).
     [Figure: histogram of the observations, titled "observation".]
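  29. Poisson mixture model

     From
     $$ p(x_n \mid \lambda_k) = \mathrm{Poi}(x_n \mid \lambda_k) \tag{4.27} $$
     we have
     $$ p(x_n \mid s_n, \lambda) = \prod_{k=1}^{K} \mathrm{Poi}(x_n \mid \lambda_k)^{s_{n,k}} \tag{4.28} $$
     The conjugate prior of $\lambda_k$ is
     $$ p(\lambda_k) = \mathrm{Gam}(\lambda_k \mid a, b) \tag{4.29} $$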
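
The generative process behind (4.2) and (4.28) can be sketched in a few lines of Python. This is only an illustration under assumed settings: the mixing ratio `[0.4, 0.6]`, the rates `[5.0, 20.0]`, and the function names are hypothetical, $\pi$ and $\lambda$ are fixed instead of drawn from their priors, and $s_n$ is stored as a cluster index instead of a one-hot vector. Because the standard library has no Poisson sampler, the sketch uses Knuth's multiplication method.

```python
import math
import random

def sample_poisson(rng, lam):
    """Knuth's multiplication method; suitable for moderate lam where exp(-lam) does not underflow."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def generate_poisson_mixture(pi, lam, n, seed=0):
    """Draw s_n ~ Cat(pi) and then x_n ~ Poi(lam[s_n]) for n data points."""
    rng = random.Random(seed)
    s, x = [], []
    for _ in range(n):
        k = rng.choices(range(len(pi)), weights=pi)[0]  # cluster index of s_n
        s.append(k)
        x.append(sample_poisson(rng, lam[k]))
    return s, x

s, x = generate_poisson_mixture(pi=[0.4, 0.6], lam=[5.0, 20.0], n=2000)
sample_mean = sum(x) / len(x)
print(sample_mean)
```

With these settings the sample mean should be close to $0.4 \cdot 5 + 0.6 \cdot 20 = 14$.
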
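  30. Gibbs sampling

     For mixture distributions it works well to sample the latent variables and the parameters separately:
     $$ S \sim p(S \mid X, \lambda, \pi) \tag{4.31} $$
     $$ \lambda, \pi \sim p(\lambda, \pi \mid X, S) \tag{4.32} $$
     Focusing only on the variable $S$:
     $$ p(S \mid X, \lambda, \pi) \propto p(X \mid S, \lambda)\, p(S \mid \pi) = \prod_{n=1}^{N} p(x_n \mid s_n, \lambda)\, p(s_n \mid \pi) \tag{4.33} $$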
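  31. Gibbs sampling

     Computing $p(x_n \mid s_n, \lambda)$ and $p(s_n \mid \pi)$ in turn finally gives
     $$ s_n \sim \mathrm{Cat}(s_n \mid \eta_n) \tag{4.37} $$
     where
     $$ \eta_{n,k} \propto \exp\{ x_n \ln \lambda_k - \lambda_k + \ln \pi_k \} \quad \left( \text{s.t. } \sum_{k=1}^{K} \eta_{n,k} = 1 \right) \tag{4.38} $$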
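  32. Gibbs sampling

     $$ p(\lambda, \pi \mid X, S) \propto p(X, S, \lambda, \pi) = p(X \mid S, \lambda)\, p(S \mid \pi)\, p(\lambda)\, p(\pi) \tag{4.39} $$
     → The posteriors of $\lambda$ and $\pi$ are independent.
     Keeping only the factors that involve $\lambda$:
     $$ p(\lambda \mid X, S) \propto p(X \mid S, \lambda)\, p(\lambda) $$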
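  33. Gibbs sampling

     Carrying out the calculation gives
     $$ \lambda_k \sim \mathrm{Gam}(\lambda_k \mid \hat{a}_k, \hat{b}_k) \tag{4.41} $$
     where
     $$ \hat{a}_k = \sum_{n=1}^{N} s_{n,k}\, x_n + a, \qquad \hat{b}_k = \sum_{n=1}^{N} s_{n,k} + b \tag{4.42} $$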
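  34. Gibbs sampling

     Keeping only the factors that involve $\pi$:
     $$ p(\pi \mid X, S) \propto p(S \mid \pi)\, p(\pi) $$
     Finally,
     $$ \pi \sim \mathrm{Dir}(\pi \mid \hat{\alpha}) \tag{4.44} $$
     where
     $$ \hat{\alpha}_k = \sum_{n=1}^{N} s_{n,k} + \alpha_k \tag{4.45} $$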
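
Putting (4.37)-(4.38), (4.41)-(4.42), and (4.44)-(4.45) together, one sweep of the Gibbs sampler can be sketched as follows. This is a minimal illustration, not the implementation behind the slides: the toy data, the hyperparameters `a = b = alpha = 1`, the initialization that spreads the rates over the data range, and the burn-in length are all assumptions. Note that `random.gammavariate` takes a scale parameter, so the rate $\hat{b}_k$ is passed as `1 / b_hat`.

```python
import math
import random

def gibbs_poisson_mixture(x, K, a=1.0, b=1.0, alpha=1.0, n_iter=200, burn_in=50, seed=0):
    """Gibbs sampling for a Poisson mixture; returns post-burn-in means of lambda."""
    rng = random.Random(seed)
    lo, hi = min(x), max(x)
    # Assumption: spread the initial rates over the data range (not from the slides).
    lam = [lo + (k + 0.5) / K * (hi - lo) for k in range(K)]
    pi = [1.0 / K] * K
    lam_sum = [0.0] * K
    for it in range(n_iter):
        # (4.37)-(4.38): s_n ~ Cat(eta_n), eta_{n,k} ∝ exp(x_n ln lam_k - lam_k + ln pi_k)
        counts, sums = [0] * K, [0] * K
        for xn in x:
            log_eta = [xn * math.log(lam[k]) - lam[k] + math.log(pi[k]) for k in range(K)]
            top = max(log_eta)  # subtract the max for numerical stability
            weights = [math.exp(v - top) for v in log_eta]
            k = rng.choices(range(K), weights=weights)[0]
            counts[k] += 1
            sums[k] += xn
        # (4.41)-(4.42): lambda_k ~ Gam(a_hat_k, b_hat_k), with b_hat_k a rate
        lam = [rng.gammavariate(sums[k] + a, 1.0 / (counts[k] + b)) for k in range(K)]
        # (4.44)-(4.45): pi ~ Dir(alpha_hat), drawn as normalized Gamma variates
        g = [rng.gammavariate(counts[k] + alpha, 1.0) for k in range(K)]
        total = sum(g)
        pi = [v / total for v in g]
        if it >= burn_in:
            for k in range(K):
                lam_sum[k] += lam[k]
    return [v / (n_iter - burn_in) for v in lam_sum]

# Made-up toy counts with two well-separated groups (illustrative only).
data = [3, 5, 4, 2, 6, 4, 3, 5, 21, 19, 22, 18, 20, 23, 17, 20]
lam_means = sorted(gibbs_poisson_mixture(data, K=2))
print(lam_means)
```

The estimated rates are sorted before printing because the cluster labels themselves carry no meaning.
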
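  35. Variational inference

     Factorize into latent variables and parameters (variational Bayes EM algorithm):
     $$ p(S, \lambda, \pi \mid X) \approx q(S)\, q(\lambda, \pi) \tag{4.46} $$
     Using the variational inference formula
     $$ \ln q(z_1) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_2) q(z_3)} + \mathrm{const.} \tag{4.24} $$
     we obtain for $q(S)$
     $$ \ln q(S) = \langle \ln p(X, S, \lambda, \pi) \rangle_{q(\lambda, \pi)} + \mathrm{const.} $$
     $$ = \langle \ln p(X \mid S, \lambda)\, p(S \mid \pi)\, p(\lambda)\, p(\pi) \rangle_{q(\lambda, \pi)} + \mathrm{const.} $$
     $$ = \langle \ln p(X \mid S, \lambda) \rangle_{q(\lambda)} + \langle \ln p(S \mid \pi) \rangle_{q(\pi)} + \mathrm{const.} $$
     $$ = \sum_{n=1}^{N} \left[ \langle \ln p(x_n \mid s_n, \lambda) \rangle_{q(\lambda)} + \langle \ln p(s_n \mid \pi) \rangle_{q(\pi)} \right] + \mathrm{const.} \tag{4.47} $$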
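  36. Variational inference

     The first term inside the sum in Eq. (4.47) is
     $$ \langle \ln p(x_n \mid s_n, \lambda) \rangle_{q(\lambda)} = \sum_{k=1}^{K} \langle s_{n,k} \ln \mathrm{Poi}(x_n \mid \lambda_k) \rangle_{q(\lambda_k)} = \sum_{k=1}^{K} s_{n,k} \left( x_n \langle \ln \lambda_k \rangle - \langle \lambda_k \rangle \right) + \mathrm{const.} \tag{4.48} $$
     The second term is
     $$ \langle \ln p(s_n \mid \pi) \rangle_{q(\pi)} = \langle \ln \mathrm{Cat}(s_n \mid \pi) \rangle_{q(\pi)} = \sum_{k=1}^{K} s_{n,k} \langle \ln \pi_k \rangle \tag{4.49} $$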
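  37. Variational inference

     From Eqs. (4.47), (4.48), and (4.49),
     $$ \ln q(s_n) = \langle \ln p(x_n \mid s_n, \lambda) \rangle_{q(\lambda)} + \langle \ln p(s_n \mid \pi) \rangle_{q(\pi)} + \mathrm{const.} = \sum_{k=1}^{K} s_{n,k} \left( x_n \langle \ln \lambda_k \rangle - \langle \lambda_k \rangle + \langle \ln \pi_k \rangle \right) + \mathrm{const.} $$
     Comparing with $\ln \mathrm{Cat}(s_n \mid \pi) = \sum_{k=1}^{K} s_{n,k} \ln \pi_k$ gives
     $$ q(s_n) = \mathrm{Cat}(s_n \mid \eta_n) \tag{4.50} $$
     where
     $$ \eta_{n,k} \propto \exp\{ x_n \langle \ln \lambda_k \rangle - \langle \lambda_k \rangle + \langle \ln \pi_k \rangle \} \quad \left( \text{s.t. } \sum_{k=1}^{K} \eta_{n,k} = 1 \right) \tag{4.51} $$
     The expectations over $\lambda$ and $\pi$ are postponed for now.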
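  38. Variational inference

     Next, the approximate distribution of the parameters satisfies
     $$ \ln q(\lambda, \pi) = \langle \ln p(X, S, \lambda, \pi) \rangle_{q(S)} + \mathrm{const.} = \langle \ln p(X \mid S, \lambda) \rangle_{q(S)} + \ln p(\lambda) + \langle \ln p(S \mid \pi) \rangle_{q(S)} + \ln p(\pi) + \mathrm{const.} $$
     so it separates into independent terms for $\lambda$ and $\pi$.
     → Instead of $q(\lambda, \pi)$, it suffices to find $q(\lambda)$ and $q(\pi)$ separately.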
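  39. Variational inference

     Proceeding as for $q(s_n)$, the results are
     $$ q(\lambda_k) = \mathrm{Gam}(\lambda_k \mid \hat{a}_k, \hat{b}_k) \tag{4.54} $$
     where
     $$ \hat{a}_k = \sum_{n=1}^{N} \langle s_{n,k} \rangle\, x_n + a, \qquad \hat{b}_k = \sum_{n=1}^{N} \langle s_{n,k} \rangle + b \tag{4.55} $$
     and
     $$ q(\pi) = \mathrm{Dir}(\pi \mid \hat{\alpha}) \tag{4.56} $$
     where
     $$ \hat{\alpha}_k = \sum_{n=1}^{N} \langle s_{n,k} \rangle + \alpha_k \tag{4.57} $$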
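  40. Variational inference

     The expectation $\langle s_{n,k} \rangle = \langle s_{n,k} \rangle_{q(S)}$ that appears in Eqs. (4.55) and (4.57) follows from
     $$ q(s_n) = \mathrm{Cat}(s_n \mid \eta_n) \tag{4.50} $$
     as
     $$ \langle s_{n,k} \rangle_{q(S)} = \eta_{n,k} $$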
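  41. Variational inference

     Now that $q(\lambda_k) = \mathrm{Gam}(\lambda_k \mid \hat{a}_k, \hat{b}_k)$ and $q(\pi) = \mathrm{Dir}(\pi \mid \hat{\alpha})$ are known, we compute the postponed expectations $\langle \lambda \rangle$, $\langle \ln \lambda \rangle$, $\langle \ln \pi \rangle$ inside $q(s_n)$. Here
     $$ \mathbb{E}_{\lambda \sim \mathrm{Gam}(\lambda \mid a, b)}[\lambda] = \frac{a}{b} \tag{2.59} $$
     $$ \mathbb{E}_{\lambda \sim \mathrm{Gam}(\lambda \mid a, b)}[\ln \lambda] = \psi(a) - \ln b \tag{2.60} $$
     $$ \mathbb{E}_{\pi \sim \mathrm{Dir}(\pi \mid \alpha)}[\ln \pi_k] = \psi(\alpha_k) - \psi\left( \sum_{l=1}^{K} \alpha_l \right) \tag{2.52} $$
     where $\psi(x)$ is the digamma function
     $$ \psi(x) = \frac{d}{dx} \ln \Gamma(x) \tag{A.26} $$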
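  42. Variational inference

     Using Eqs. (2.59), (2.60), and (2.52), the required expectations are
     $$ \langle \lambda_k \rangle = \frac{\hat{a}_k}{\hat{b}_k} \tag{4.60} $$
     $$ \langle \ln \lambda_k \rangle = \psi(\hat{a}_k) - \ln \hat{b}_k \tag{4.61} $$
     $$ \langle \ln \pi_k \rangle = \psi(\hat{\alpha}_k) - \psi\left( \sum_{l=1}^{K} \hat{\alpha}_l \right) \tag{4.62} $$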
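
The updates (4.50)-(4.51), (4.54)-(4.57), and (4.60)-(4.62) combine into a simple alternating loop. The sketch below is an illustration under stated assumptions: the toy data, the hyperparameters, and the initialization of $\hat{a}_k$ are made up, the loop runs for a fixed number of iterations instead of monitoring the ELBO, and `digamma` is a hand-rolled approximation (recurrence plus asymptotic series) because the standard library does not provide one.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0: shift x upward by recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def vi_poisson_mixture(x, K, a=1.0, b=1.0, alpha=1.0, max_iter=100):
    """Mean-field variational inference for a Poisson mixture with q(S)q(lambda)q(pi)."""
    N = len(x)
    lo, hi = min(x), max(x)
    # Assumption: initialize q(lambda_k) with means spread over the data range.
    a_hat = [lo + (k + 0.5) / K * (hi - lo) for k in range(K)]
    b_hat = [1.0] * K
    alpha_hat = [alpha] * K
    eta = []
    for _ in range(max_iter):
        # (4.60)-(4.62): expectations under the current q(lambda), q(pi)
        e_lam = [a_hat[k] / b_hat[k] for k in range(K)]
        e_ln_lam = [digamma(a_hat[k]) - math.log(b_hat[k]) for k in range(K)]
        psi_total = digamma(sum(alpha_hat))
        e_ln_pi = [digamma(alpha_hat[k]) - psi_total for k in range(K)]
        # (4.50)-(4.51): update q(s_n) = Cat(eta_n)
        eta = []
        for xn in x:
            log_eta = [xn * e_ln_lam[k] - e_lam[k] + e_ln_pi[k] for k in range(K)]
            top = max(log_eta)
            w = [math.exp(v - top) for v in log_eta]
            total = sum(w)
            eta.append([v / total for v in w])
        # (4.54)-(4.57): update q(lambda_k), q(pi) using <s_{n,k}> = eta_{n,k}
        a_hat = [sum(eta[n][k] * x[n] for n in range(N)) + a for k in range(K)]
        b_hat = [sum(eta[n][k] for n in range(N)) + b for k in range(K)]
        alpha_hat = [sum(eta[n][k] for n in range(N)) + alpha for k in range(K)]
    return eta, a_hat, b_hat, alpha_hat

# Made-up toy counts with two well-separated groups (illustrative only).
data = [3, 5, 4, 2, 6, 4, 3, 5, 21, 19, 22, 18, 20, 23, 17, 20]
eta, a_hat, b_hat, alpha_hat = vi_poisson_mixture(data, K=2)
rate_means = sorted(a_hat[k] / b_hat[k] for k in range(2))
print(rate_means)
```

Each row of `eta` holds the approximate cluster membership probabilities of one data point.
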
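  43. Collapsed Gibbs sampling

     In collapsed Gibbs sampling for a mixture model, the parameters are marginalized out of the joint distribution:
     $$ p(X, S) = \iint p(X, S, \lambda, \pi)\, d\lambda\, d\pi \tag{4.63} $$
     It then remains to sample from $p(S \mid X)$, but this is not straightforward.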
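  44. Collapsed Gibbs sampling

     Graphical models before and after marginalization (Fig. 4.7).
     After marginalization, each $s_n$ depends on every other element of $S$ (a complete graph).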

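  45. Collapsed Gibbs sampling

     Since
     $$ p(S \mid X) = \frac{p(X \mid S)\, p(S)}{\sum_{S} p(X \mid S)\, p(S)} $$
     sampling directly from $p(S \mid X)$ requires $K^N$ evaluations to compute the partition function.
     → Apply Gibbs sampling to each element of $S$:
     $$ p(s_n \mid X, S_{\setminus n}) \propto p(x_n, X_{\setminus n}, s_n, S_{\setminus n}) \tag{4.64} $$
     $$ = p(x_n \mid X_{\setminus n}, s_n, S_{\setminus n})\, p(X_{\setminus n} \mid s_n, S_{\setminus n})\, p(s_n \mid S_{\setminus n})\, p(S_{\setminus n}) \tag{4.65} $$
     $$ \propto p(x_n \mid X_{\setminus n}, s_n, S_{\setminus n})\, p(s_n \mid S_{\setminus n}) \tag{4.66} $$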
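  46. Collapsed Gibbs sampling

     The right-hand factor of Eq. (4.66) is
     $$ p(s_n \mid S_{\setminus n}) = \int p(s_n \mid \pi)\, p(\pi \mid S_{\setminus n})\, d\pi \tag{4.70} $$
     $$ = \mathrm{Cat}(s_n \mid \eta_{\setminus n}) \tag{4.74} $$
     $$ \eta_{\setminus n, k} \propto \sum_{n' \neq n} s_{n',k} + \alpha_k \tag{4.75} $$
     where $\alpha$ is the parameter of the prior $p(\pi) = \mathrm{Dir}(\pi \mid \alpha)$.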
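  47. Collapsed Gibbs sampling

     The left-hand factor of Eq. (4.66) is
     $$ p(x_n \mid X_{\setminus n}, s_n, S_{\setminus n}) = \int p(x_n \mid s_n, \lambda)\, p(\lambda \mid X_{\setminus n}, S_{\setminus n})\, d\lambda \tag{4.76} $$
     Conditioning on $s_{n,k} = 1$, the integral can be carried out analytically:
     $$ p(x_n \mid X_{\setminus n}, s_{n,k} = 1, S_{\setminus n}) = \mathrm{NB}\left( x_n \,\middle|\, \hat{a}_{\setminus n, k}, \frac{1}{\hat{b}_{\setminus n, k} + 1} \right) \tag{4.81} $$
     $$ \hat{a}_{\setminus n, k} = \sum_{n' \neq n} s_{n',k}\, x_{n'} + a_k \tag{4.80} $$
     $$ \hat{b}_{\setminus n, k} = \sum_{n' \neq n} s_{n',k} + b_k \tag{4.81} $$
     where $a_k, b_k$ are the parameters of the prior $p(\lambda_k) = \mathrm{Gam}(\lambda_k \mid a_k, b_k)$.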
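  48. Collapsed Gibbs sampling

     Concrete procedure for sampling from $p(s_n \mid X, S_{\setminus n})$:
     1. Prepare the candidate values of $s_n$, from $(1, 0, \dots, 0)^\top$ to $(0, 0, \dots, 1)^\top$.
     2. For each candidate, evaluate
        $$ p(s_n \mid S_{\setminus n}) = \mathrm{Cat}(s_n \mid \eta_{\setminus n}) \tag{4.74} $$
        $$ p(x_n \mid X_{\setminus n}, s_{n,k} = 1, S_{\setminus n}) = \mathrm{NB}\left( x_n \,\middle|\, \hat{a}_{\setminus n, k}, \frac{1}{\hat{b}_{\setminus n, k} + 1} \right) \tag{4.81} $$
        and take their product, as in Eq. (4.66).
     3. Normalizing these $K$ values gives the categorical distribution $p(s_n \mid X, S_{\setminus n})$.
     4. Sample from the resulting $p(s_n \mid X, S_{\setminus n})$.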
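
The four-step procedure can be sketched in Python as below. It is a hedged illustration only: the toy data, the shared hyperparameters `a = b = alpha = 1`, the random initialization, and the majority-vote summary of the assignments are assumptions. The predictive term is computed in log space from the Gamma-Poisson integral, $\ln \Gamma(x + \hat{a}) - \ln x! - \ln \Gamma(\hat{a}) + \hat{a} \ln \frac{\hat{b}}{\hat{b} + 1} - x \ln(\hat{b} + 1)$, which corresponds to the negative binomial with parameters $\hat{a}$ and $1/(\hat{b} + 1)$.

```python
import math
import random

def log_predictive(x, a_hat, b_hat):
    """ln of the Gamma-Poisson predictive, i.e. ln NB(x | a_hat, 1 / (b_hat + 1))."""
    return (math.lgamma(x + a_hat) - math.lgamma(x + 1) - math.lgamma(a_hat)
            + a_hat * math.log(b_hat / (b_hat + 1.0)) - x * math.log(b_hat + 1.0))

def collapsed_gibbs(x, K, a=1.0, b=1.0, alpha=1.0, n_iter=200, burn_in=50, seed=0):
    """Collapsed Gibbs sampling for a Poisson mixture; returns the most frequent label per point."""
    rng = random.Random(seed)
    N = len(x)
    s = [rng.randrange(K) for _ in range(N)]  # assumption: random initial assignment
    counts, sums = [0] * K, [0] * K
    for n in range(N):
        counts[s[n]] += 1
        sums[s[n]] += x[n]
    votes = [[0] * K for _ in range(N)]
    for it in range(n_iter):
        for n in range(N):
            # Remove point n so that the statistics describe X_{\n}, S_{\n}.
            counts[s[n]] -= 1
            sums[s[n]] -= x[n]
            # Steps 1-2: for each candidate k, (4.75) times the predictive (4.81), in log space.
            log_w = [math.log(counts[k] + alpha)
                     + log_predictive(x[n], sums[k] + a, counts[k] + b) for k in range(K)]
            top = max(log_w)
            weights = [math.exp(v - top) for v in log_w]
            # Steps 3-4: rng.choices normalizes the weights and draws the new label.
            s[n] = rng.choices(range(K), weights=weights)[0]
            counts[s[n]] += 1
            sums[s[n]] += x[n]
            if it >= burn_in:
                votes[n][s[n]] += 1
    return [max(range(K), key=lambda k: v[k]) for v in votes]

# Made-up toy counts with two well-separated groups (illustrative only).
data = [3, 5, 4, 2, 6, 4, 3, 5, 21, 19, 22, 18, 20, 23, 17, 20]
labels = collapsed_gibbs(data, K=2)
print(labels)
```

Keeping running counts and sums per cluster means each update only needs the statistics with point $n$ removed, so no parameter values are ever sampled.
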
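  49. Simple experiment

     Cluster estimation result for 1-D, discrete, nonnegative data (variational inference).
     [Figure: two histograms, titled "observation" and "estimation".]
     The data are separated into two clusters, red and blue; cluster membership probabilities are shown with intermediate colors.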
  50. Simple experiment

     Convergence time of the ELBO (Fig. 4.10).
     [Figure: vertical axis: ELBO; horizontal axis (log scale): computation time [µs]; curves: VI, GS, CGS.]
     Because the problem is simple, there is no difference in the final accuracy.
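  51. Simple experiment

     General tendencies:
     • Variational inference is the fastest
     • Collapsed GS gives the best final accuracy
     • Collapsed GS is accurate from the early iterations
     Recommendation: try GS first, and if the speed or accuracy is unsatisfactory, derive variational inference and collapsed GS as well.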
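  52. Summary

     • Introduced Gibbs sampling, blocked Gibbs sampling, collapsed Gibbs sampling, and variational inference as methods for approximating the posterior distribution
     • Derived Gibbs sampling, collapsed Gibbs sampling, and variational inference concretely for the Poisson mixture model
     • Variational inference is the fastest to compute, collapsed Gibbs sampling is the most accurate, and Gibbs sampling is the easiest to derive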