
BDA3 Chapter 17



Slides presented at a reading group, covering Chapter 17 of Gelman et al., Bayesian Data Analysis, 3rd Edition.

Takahiro Kawashima

May 29, 2018

Transcript

  1. Table of Contents
     1. Aspects of robustness
     2. Overdispersed versions of standard models
     3. Posterior inference and computation
     4. Robust inference for the eight schools
     5. Robust regression using t-distributed errors
  2. Aspects of robustness — Review: Eight Schools
     • Analysis of the effect of special coaching programs on the SAT-V test
     • 8 schools participated
     • The Chapter 5 analysis concluded that the average coaching effects are roughly the same across schools
  3. Aspects of robustness — The Chapter 5 hierarchical model for the Eight Schools
     • θj ∼ N(θj | µ, τ²)
     • yj ∼ N(yj | θj, σ²)
     • The observation variance σ² is shared across schools
  4. Aspects of robustness — Simple per-school means and standard deviations of the coaching effects

     School   ȳ·j   σj
     A         28   15
     B          8   10
     C         -3   16
     D          7   11
     E         -1    9
     F          1   11
     G         18   10
     H         12   18
  5. Aspects of robustness — Try replacing ȳ·8 with 100

     School   ȳ·j   σj
     A         28   15
     B          8   10
     C         -3   16
     D          7   11
     E         -1    9
     F          1   11
     G         18   10
     H        100   18
  6. Aspects of robustness — Try replacing ȳ·8 with 100
     • τ is estimated to be large (the variance of the θj grows)
     • The estimates θ̂j are then driven almost entirely by the observations (eq. 5.17):

       θ̂j = ( (1/σj²) ȳ·j + (1/τ²) µ ) / ( 1/σj² + 1/τ² )

     We want the outlier ȳ·8 = 100 to have little influence on the estimates
     → use a long-tailed distribution
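The precision-weighted average of eq. (5.17) is easy to check numerically; a minimal sketch (the values of µ and τ below are purely illustrative, not from the slides):

```python
# Precision-weighted estimate from eq. (5.17): theta_hat_j pulls the school
# mean toward mu; when tau is large, the data term dominates and there is
# almost no shrinkage -- which is exactly what the outlier induces.
def theta_hat(ybar_j, sigma_j, mu, tau):
    w_data = 1.0 / sigma_j**2     # precision of the observation
    w_prior = 1.0 / tau**2        # precision of the population distribution
    return (w_data * ybar_j + w_prior * mu) / (w_data + w_prior)

# School A (ybar = 28, sigma = 15) with illustrative mu = 8:
print(theta_hat(28.0, 15.0, mu=8.0, tau=5.0))    # small tau: strong shrinkage
print(theta_hat(28.0, 15.0, mu=8.0, tau=100.0))  # large tau: barely shrunk
```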
  7. Aspects of robustness — Candidate long-tailed distributions
     1. Student's t distribution (Cauchy at degrees of freedom ν = 1, normal as ν → ∞)
     2. Mixture models with a widely spread component
     It is a good idea to experiment, varying the t degrees of freedom ν from large down to small values
     → importance resampling (Section 17.4)
  8. Aspects of robustness — Reference: Student's t distribution
     [Figure: density curves of the t distribution for ν = 1 (Cauchy), ν = 4, ν = 10, and ν → ∞ (Gaussian)]
  9. Overdispersed versions of standard models — Limits of normal modeling
     If more than 10% of the data lie farther than 1.5 times the interquartile range from the median, normal modeling is inappropriate
     (image: 10% or more of the mass in the hatched region of a box plot)
     Consider robust alternatives to the standard models
     • Note that these do not handle underdispersion
  10. Overdispersed versions of standard models — The t distribution as an alternative to the normal
      • Use it when:
        1. occasional aberrant observations appear in the data
        2. a parameter's prior should occasionally allow extreme values
      • Cauchy at ν = 1, normal as ν → ∞
      • With plenty of data, treat ν as an unknown parameter too
      • Setting a hyperprior that keeps ν away from small values is also recommended
        → for ν = 1, 2 the variance does not exist, which makes the model awkward to handle
  11. Overdispersed versions of standard models — The t distribution as an alternative to the normal

      tν(yi | µ, σ²) = ∫₀^∞ N(yi | µ, Vi) Inv-χ²(Vi | ν, σ²) dVi

      The t distribution can be interpreted as a mixture of normals with
      1. a fixed mean µ and
      2. a variance Vi following a scaled inverse chi-square distribution
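The scale-mixture identity can be verified by Monte Carlo: drawing Vi from the scaled inverse chi-square and then yi ∼ N(µ, Vi) should reproduce the moments of tν(µ, σ²). A sketch with illustrative parameter values (a scaled-inverse-χ² draw is generated as νσ²/χ²_ν):

```python
import numpy as np

# Monte Carlo check of the mixture representation: V_i ~ Inv-chi^2(nu, sigma^2)
# followed by y_i ~ N(mu, V_i) gives draws from t_nu(mu, sigma^2).
rng = np.random.default_rng(0)
nu, mu, sigma = 5.0, 2.0, 1.5
n = 200_000

V = nu * sigma**2 / rng.chisquare(nu, size=n)  # scaled inverse chi-square
y = rng.normal(mu, np.sqrt(V))                 # normal with random variance

# Compare with the t_nu(mu, sigma^2) moments: mean mu, var sigma^2 * nu/(nu-2)
print(y.mean(), mu)
print(y.var(), sigma**2 * nu / (nu - 2))
```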
  12. Overdispersed versions of standard models — The negative binomial as an alternative to the Poisson
      Poisson distribution:

      Poisson(yi | λ) = (λ^yi / yi!) exp(−λ)

      • mean(y) = λ, var(y) = λ
      • We want to decouple the mean and the variance → negative binomial
  13. Overdispersed versions of standard models — The negative binomial as an alternative to the Poisson
      Negative binomial distribution (p. 578):

      NegBin(yi | α, β) = C(yi + α − 1, α − 1) (β/(β+1))^α (1/(β+1))^yi

      Setting κ := 1/α and λ := α/β, this can be rewritten as

      NegBin(yi | κ, λ) = (λ^yi / yi!) · Γ(1/κ + yi) / ( Γ(1/κ) (1/κ + λ)^yi ) · (1 + κλ)^(−1/κ)

      • mean(y) = λ, var(y) = λ(1 + κλ) → just what we wanted
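The mean-variance claim can be confirmed with SciPy's negative binomial, mapping (κ, λ) onto scipy.stats.nbinom(n, p) with n = 1/κ and p = 1/(1 + κλ); this mapping is my assumption, derived to be consistent with the formulas above:

```python
from scipy.stats import nbinom

# Check mean(y) = lambda and var(y) = lambda * (1 + kappa*lambda) under the
# (kappa, lambda) parametrization, via scipy's NegBin(n, p) with
# n = 1/kappa and p = 1/(1 + kappa*lambda).
kappa, lam = 0.5, 4.0
n, p = 1.0 / kappa, 1.0 / (1.0 + kappa * lam)

mean, var = nbinom.stats(n, p, moments="mv")
print(float(mean))  # lambda
print(float(var))   # lambda * (1 + kappa * lambda)
```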
  14. Overdispersed versions of standard models — The negative binomial as an alternative to the Poisson
      Incidentally, writing

      NegBin(yi | r, p) = Γ(r + yi) / (yi! Γ(r)) (1 − p)^r p^yi,

      we have

      NegBin(yi | a, 1/(b+1)) = ∫ Poisson(yi | λ) Gamma(λ | a, b) dλ

      (Suyama's "green book" [2], pp. 86-87)
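The Poisson-Gamma mixture identity can also be checked by simulation; a sketch assuming Gamma(λ | a, b) is shape-a, rate-b (the pmf helper below is written directly from the NegBin(yi | r, p) form above):

```python
import numpy as np
from math import lgamma, exp, log

# Monte Carlo check that a Gamma(a, b) mixture of Poissons is
# NegBin(y | a, 1/(b+1)) in the Gamma(r+y)/(y! Gamma(r)) (1-p)^r p^y notation.
rng = np.random.default_rng(1)
a, b = 3.0, 2.0
lam = rng.gamma(shape=a, scale=1.0 / b, size=500_000)  # rate-b Gamma draws
y = rng.poisson(lam)                                   # Poisson given lambda

def negbin_pmf(k, r, p):
    # NegBin(k | r, p) evaluated on the log scale for stability
    return exp(lgamma(r + k) - lgamma(k + 1) - lgamma(r)
               + r * log(1 - p) + k * log(p))

for k in range(4):
    print((y == k).mean(), negbin_pmf(k, r=a, p=1.0 / (b + 1)))
```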
  15. Overdispersed versions of standard models — The beta-binomial as an alternative to the binomial
      Binomial distribution:

      Binomial(yi | n, p) = C(n, yi) p^yi (1 − p)^(n−yi)

      • mean(y) = np, var(y) = np(1 − p)
      • Since 1 − p lies in [0, 1], the variance can never exceed the mean
      → beta-binomial distribution
  16. Overdispersed versions of standard models — The beta-binomial as an alternative to the binomial
      • πj ∼ Beta(πj | α, β)
      • yj ∼ Bin(yj | m, πj)
      • The number of trials m is common to all j
  17. Overdispersed versions of standard models — The beta-binomial as an alternative to the binomial
      Beta-binomial distribution:

      p(yj | α, β) = ∫ Bin(yj | m, πj) Beta(πj | α, β) dπj

      • mean(y) = n α / (α + β)
      • var(y) = n αβ(α + β + m) / ( (α + β)² (α + β + 1) )
      • var(y) / mean(y) = β(α + β + m) / ( (α + β)(α + β + 1) )
      → the mean and the variance are now decoupled
  18. Overdispersed versions of standard models — The beta-binomial as an alternative to the binomial
      • n = 100, m = 30
      • mean(y) = n α / (α + β)
      • var(y) = n αβ(α + β + m) / ( (α + β)² (α + β + 1) )

      α        β        mean   variance
      0.1      0.1      50.0   629.2
      1.0      1.0      50.0   266.7
      10.0     10.0     50.0   59.5
      100.0    100.0    50.0   28.6
      1000.0   1000.0   50.0   25.4
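The table can be reproduced directly from the two moment formulas; a short sketch:

```python
# Reproduce the slide's table: mean and variance of the beta-binomial
# moment expressions with n = 100, m = 30, for alpha = beta.
n, m = 100, 30
for a in (0.1, 1.0, 10.0, 100.0, 1000.0):
    b = a
    mean = n * a / (a + b)
    var = n * a * b * (a + b + m) / ((a + b) ** 2 * (a + b + 1))
    print(f"{a:8.1f} {b:8.1f} {mean:6.1f} {var:8.1f}")
```

As α = β grows, the Beta prior concentrates and the variance falls toward the binomial value; small α = β gives the heavy overdispersion in the first rows.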
  19. Posterior inference and computation — Notation
      p0(θ | y): posterior of the nonrobust model already fit to the data
      ϕ: hyperparameter governing robustness (e.g. the t degrees of freedom)
      The goal is to sample from (eq. 17.2)

      p(θ | ϕ, y) = p(y | θ, ϕ) p(θ | ϕ) / p(y | ϕ) ∝ p(y | θ, ϕ) p(θ | ϕ)

      When ϕ is given as a range, we also compute p(ϕ | y):

      p(θ, ϕ | y) = p(θ | ϕ, y) p(ϕ | y) ∝ p(y | θ, ϕ) p(θ | ϕ) p(ϕ)
  20. Posterior inference and computation — Gibbs sampling example
      Model y = (y1, . . . , yn) with yi ∼ tν(yi | µ, σ²), using

      tν(yi | µ, σ²) = ∫₀^∞ N(yi | µ, Vi) Inv-χ²(Vi | ν, σ²) dVi

      For simplicity, the priors on µ and log σ are taken flat.
      From the joint posterior p(µ, σ², V | ν, y) we consider the posteriors of µ and σ².
  21. Posterior inference and computation — Gibbs sampling example
      Iterate the following draws (Section 12.1):

      1. Vi | µ, σ², ν, y ∼ Inv-χ²( ν + 1, (νσ² + (yi − µ)²) / (ν + 1) )
      2. µ | σ², V, ν, y ∼ N( Σi (yi/Vi) / Σi (1/Vi), 1 / Σi (1/Vi) )
      3. σ² | µ, V, ν, y ∼ Gamma( nν/2, (ν/2) Σi (1/Vi) )
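The three conditional draws can be put together into a toy Gibbs sampler; a minimal sketch assuming flat priors on µ and log σ, fixed ν, and synthetic data (all settings illustrative):

```python
import numpy as np

# Gibbs sampler for y_i ~ t_nu(mu, sigma^2) via the normal /
# scaled-inverse-chi-square augmentation, following the three conditionals.
rng = np.random.default_rng(2)

def gibbs_t(y, nu, iters=2000):
    n = len(y)
    mu, sig2 = y.mean(), y.var()            # crude starting values
    draws = []
    for _ in range(iters):
        # 1. V_i | mu, sig2 ~ Inv-chi^2(nu+1, (nu*sig2 + (y_i - mu)^2)/(nu+1));
        #    drawn as (nu' * s'^2) / chi2(nu') with nu' = nu + 1
        V = (nu * sig2 + (y - mu) ** 2) / rng.chisquare(nu + 1, size=n)
        # 2. mu | V ~ N(precision-weighted mean, 1 / sum(1/V_i))
        w = 1.0 / V
        mu = rng.normal((w * y).sum() / w.sum(), np.sqrt(1.0 / w.sum()))
        # 3. sig2 | V ~ Gamma(n*nu/2, rate = (nu/2) * sum(1/V_i))
        sig2 = rng.gamma(n * nu / 2.0, 1.0 / (nu / 2.0 * w.sum()))
        draws.append(mu)
    return np.array(draws)

y = rng.standard_t(4, size=200) * 2.0 + 5.0   # synthetic data, true mu = 5
mus = gibbs_t(y, nu=4.0)
print(mus[500:].mean())   # posterior mean of mu, should sit near 5
```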
  22. Posterior inference and computation — Gibbs sampling example
      • If ν is unknown, extend the sampler with a step that updates ν
        → hard with Gibbs sampling, so use the Metropolis algorithm or similar
      • The posterior can become multimodal, so use simulated tempering (replica-exchange MCMC)
  23. Posterior inference and computation — Sampling from the posterior predictive distribution
      1. Sample from the posterior: θ ∼ p(θ | ϕ, y)
      2. Sample from the predictive distribution: ỹ ∼ p(ỹ | θ, ϕ)
  24. Posterior inference and computation — Computing the marginal posterior of the hyperparameter by importance weighting
      • If the model is robust, outliers should have little influence on the marginal posterior p(ϕ | y) (?)
      • Computing the marginal posterior enables sensitivity analysis
      Sampling from p0(θ | y) has already been carried out:
      • θs, s = 1, . . . , S
      Approximate the marginal posterior p(ϕ | y) of the robust model.
  25. Posterior inference and computation — importance weighting
      From eq. (13.11):

      p(ϕ | y) ∝ p(ϕ) p(y | ϕ)
               = p(ϕ) ∫ p(y, θ | ϕ) dθ
               = p(ϕ) ∫ p(y | θ, ϕ) p(θ | ϕ) dθ
               = p(ϕ) ∫ [ p(y | θ, ϕ) p(θ | ϕ) / (p0(θ) p0(y | θ)) ] p0(θ) p0(y | θ) dθ
               ∝ p(ϕ) ∫ [ p(y | θ, ϕ) p(θ | ϕ) / (p0(θ) p0(y | θ)) ] p0(θ | y) dθ
               ≈ p(ϕ) [ (1/S) Σ_{s=1}^{S} p(y | θs, ϕ) p(θs | ϕ) / (p0(θs) p0(y | θs)) ]   (17.3)
  26. Posterior inference and computation — importance weighting
      The quantity in eq. (17.3),

      p(ϕ) [ (1/S) Σ_{s=1}^{S} p(y | θs, ϕ) p(θs | ϕ) / (p0(θs) p0(y | θs)) ],

      is computable
      → the posterior of the robust model's hyperparameter ϕ can be obtained approximately from samples drawn under the original nonrobust model
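On a toy problem, eq. (17.3) can be evaluated over a grid of ϕ = ν. The sketch below assumes a unit-variance normal likelihood as the nonrobust model, a flat prior on θ in both models (so the bracketed weight reduces to a likelihood ratio), and a flat prior over the ν grid; all of these choices are illustrative, not from the slides:

```python
import numpy as np
from scipy.stats import norm, t as tdist

rng = np.random.default_rng(6)
y = np.concatenate([rng.normal(0.0, 1.0, 30), [8.0]])  # one gross outlier
n = len(y)

# Draws from the nonrobust posterior p0(theta|y) = N(ybar, 1/n)
theta_s = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=4000)

nus = [2.0, 4.0, 10.0, 30.0]
logp = []
for nu in nus:
    # log of the bracketed average in (17.3); with equal flat priors on theta
    # the weight is the likelihood ratio p(y|theta_s, nu) / p0(y|theta_s)
    logw = (tdist.logpdf(y[None, :], nu, loc=theta_s[:, None]).sum(axis=1)
            - norm.logpdf(y[None, :], loc=theta_s[:, None]).sum(axis=1))
    m = logw.max()
    logp.append(m + np.log(np.exp(logw - m).mean()))  # stable log-mean-exp
logp = np.array(logp)

post = np.exp(logp - logp.max())
post /= post.sum()            # flat prior over the nu grid
print(dict(zip(nus, post.round(3))))  # small nu favored because of the outlier
```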
  27. Posterior inference and computation — Obtaining an approximate posterior for the robust model by importance resampling
      Samples from p0(θ | y) are already available:
      • θs, s = 1, . . . , S
      • Draw plenty of them, e.g. S = 5000
      From the θs, generate a small subsample (e.g. k = 500) with probabilities proportional to the importance ratio

      p(θs | ϕ, y) / p0(θs | y) = p(θs | ϕ) p(y | θs, ϕ) / ( p0(θs) p0(y | θs) )
  28. Posterior inference and computation — importance resampling (Section 10.4)
      When a small number of samples have extremely large importance ratios,
      → resampling with replacement risks a severe loss in how well the subsample represents the posterior
      → use a procedure intermediate between sampling with and without replacement
  29. Posterior inference and computation — importance resampling (Section 10.4)
      1. Draw one sample from the θs with probability proportional to its importance ratio and add it to the subsample
      2. Add further samples the same way; samples already drawn are not selected again
      3. Repeat steps 1-2 until the subsample contains k samples
      If the original sample size were infinite and the subsample sufficiently small, this would seem to be theoretically justified, but...
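The three-step subsampling procedure can be sketched with numpy, whose Generator.choice with replace=False and a probability vector draws sequentially without replacement with the given probabilities; the data and importance ratios below are stand-ins:

```python
import numpy as np

# Subsampling without replacement, proportional to importance ratios:
# draw one index at a time with probability proportional to its ratio,
# remove it, and stop after k draws.
rng = np.random.default_rng(3)

def importance_resample(theta, ratios, k):
    p = np.asarray(ratios, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(theta), size=k, replace=False, p=p)
    return theta[idx]

theta = rng.normal(size=5000)            # stand-in for S = 5000 draws from p0
ratios = np.exp(-0.1 * theta**2)         # stand-in importance ratios
sub = importance_resample(theta, ratios, k=500)
print(sub.shape)                         # (500,) distinct draws
```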
  30. Robust inference for the eight schools — Eight Schools
      The Chapter 5 hierarchical model:
      • θj ∼ N(θj | µ, τ²)
      • yj ∼ N(yj | θj, σ²)
      • The observation variance σ² is shared across schools
  31. Robust inference for the eight schools — Eight Schools
      Use a Student's t distribution for the hyperprior on θj:
      • θj ∼ tν(θj | µ, τ²)
      • yj ∼ N(yj | θj, σ²)
      The posterior is

      p(θj, µ, τ | ν, yj) ∝ p(yj | θj, µ, τ, ν) p(θj, µ, τ | ν).

      The Chapter 5 model corresponds to p(θj, µ, τ | ν, yj) = p(θj, µ, τ | ν → ∞, yj).
  32. Robust inference for the eight schools — Robust estimation with the t4 distribution
      • Table 5.2
      • Simple per-school means and standard deviations
      • ȳ·1 is an outlier

      School   ȳ·j   σj
      A         28   15
      B          8   10
      C         -3   16
      D          7   11
      E         -1    9
      F          1   11
      G         18   10
      H         12   18
  33. Robust inference for the eight schools — Robust estimation with the t4 distribution
      Table 5.3 (posterior estimates of the coaching effects under normal modeling)

      School   posterior quantiles
               2.5%   25%   50%   75%   97.5%
      A         -2      7    10    16     31
      B         -5      3     8    12     23
      C        -11      2     7    11     19
      D         -7      4     8    11     21
      E         -9      1     5    10     18
      F         -7      2     6    10     28
      G         -1      7    10    15     26
      H         -6      3     8    13     33
  34. Robust inference for the eight schools — Robust estimation with the t4 distribution
      Gibbs sampling with the degrees of freedom fixed at ν = 4
      Table 17.1 (posterior estimates of the coaching effects under t modeling)

      School   posterior quantiles
               2.5%   25%   50%   75%   97.5%
      A         -2      6    11    16     34
      B         -5      4     8    12     21
      C        -14      2     7    11     21
      D         -6      4     8    12     21
      E         -9      1     6     9     17
      F         -9      3     7    10     19
      G         -1      6    10    15     26
      H         -8      4     8    13     26
  35. Robust inference for the eight schools — Robust estimation with the t4 distribution
      Gibbs sampling with the degrees of freedom fixed at ν = 4
      → the shrinkage of A and some others is slightly weaker (suspicious)

      School               posterior quantiles
                           2.5%   25%   50%   75%   97.5%
      A (normal model)      -2      7    10    16     31
      A (robust model)      -2      6    11    16     34
  36. Robust inference for the eight schools — Computing the importance resampling
      Draw S = 5000 samples from p0(θ, µ, τ | y).
      With ν = 4, generate a subsample of k = 500 based on the importance ratio of each θ:

      p(θ, µ, τ | y) / p0(θ, µ, τ | y)
        = [ p(µ, τ | ν) p(θ | µ, τ, ν) p(y | θ, µ, τ, ν) / p(y) ] / [ p0(µ, τ) p0(θ | µ, τ) p0(y | θ, µ, τ) / p0(y) ]
        ∝ p(θ | µ, τ, ν) / p0(θ | µ, τ)
        = ∏_{j=1}^{8} tν(θj | µ, τ) / N(θj | µ, τ)   (17.5)
  37. Robust inference for the eight schools — Computing the importance resampling
      • A sufficient approximation for the present robustness evaluation
      • It would be shaky if high-precision inference were required
      → the tails of the distribution of the log importance ratios reportedly turned out strange
      An effect of the difference in tail height between the t4 and the normal distribution?
  38. Robust inference for the eight schools — Sensitivity analysis of the model as ν varies (Figure 17.1)
      • Results of running the same MCMC simulations for ν = 1, 2, 3(, 4), 5, 10, 30(, ∞)
      • Varying ν seems to have no clear effect
      • Why does ν = 30 appear less robust than ν → ∞?
      • Why do both the means and the standard deviations shrink at ν = 30?
  39. Robust inference for the eight schools — Treating ν as an unknown parameter
      As far as Figure 17.1 shows, the estimates depend only weakly on ν
      → analysis via the marginal posterior (importance weighting) is unnecessary
      Put a uniform prior on 1/ν ∈ (0, 1].
      Since the conditional prior p(µ, τ | ν) ∝ 1 is improper, write its ν-dependence as p(µ, τ | ν) ∝ g(ν).
      Because ν ≤ 2 is problematic, treat µ and τ as the median and the interquartile range respectively, and set g(ν) ∝ 1.
  40. Robust inference for the eight schools — Treating ν as an unknown parameter
      • The influence of ν on the posterior is not large (Figures 17.1 and 17.2)
      • If the dependence on ν were stronger, we would consider a noninformative prior on (µ, τ, ν)
  41. Robust inference for the eight schools — Discussion
      • The medians and the 50% and 95% intervals were insensitive even under the normal model
      • But 99.9% intervals depend heavily on the tails of the distribution, and so would be sensitive to ν
      → of little interest in this example
  42. Robust regression using t-distributed errors — Iteratively reweighted linear regression and the EM algorithm
      In the usual linear regression model, outliers have a large influence
      → assume the errors follow a t distribution with small degrees of freedom
      The response yi given the predictors Xi is modeled as

      p(yi | Xi, β, σ²) = tν(yi | Xiβ, σ²)

      As before, express tν through a normal and a scaled inverse chi-square distribution:

      yi ∼ N(yi | Xiβ, Vi)
      Vi ∼ Inv-χ²(Vi | ν, σ²)
  43. Robust regression using t-distributed errors — Iteratively reweighted linear regression and the EM algorithm
      • Find the posterior mode of p(β, σ | ν, y)
      • Assume the noninformative prior p(β, log σ | ν) ∝ 1
        → the posterior mode can then be found directly by Newton's method or similar
      • Alternatively, regarding the Vi as "missing data" lets us work in the EM framework
  44. Robust regression using t-distributed errors — Iteratively reweighted linear regression and the EM algorithm
      E step:
      Compute the expectations of the sufficient statistics of the normal model;
      here we need Σ_{i=1}^{n} 1/Vi.

      Vi | yi, β_old, σ_old, ν ∼ Inv-χ²( ν + 1, ( ν(σ_old)² + (yi − Xi β_old)² ) / (ν + 1) )   (17.6)

      E( 1/Vi | yi, β_old, σ_old, ν ) = (ν + 1) / ( ν(σ_old)² + (yi − Xi β_old)² )
  45. Robust regression using t-distributed errors — Supplement: deriving the expression following eq. (17.6)
      E( 1/Vi | yi, β_old, σ_old, ν ) is obtained by a change of variables:
      if x ∼ Inv-χ²(x | ν, s²), then y = 1/x follows

      y ∼ Gamma( y | ν/2, νs²/2 ),

      and using this gives the expectation of 1/Vi.
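The change of variables can be checked by Monte Carlo: the Gamma(ν/2, rate νs²/2) mean is (ν/2)/(νs²/2) = 1/s², which with s² = (ν(σ_old)² + r_i²)/(ν+1) gives exactly the E-step weight. A sketch with illustrative ν and s²:

```python
import numpy as np

# If x ~ Inv-chi^2(nu, s^2), then 1/x ~ Gamma(nu/2, rate = nu*s^2/2),
# so E(1/x) = 1/s^2. Draw x as nu*s^2 / chi2(nu) and check the mean of 1/x.
rng = np.random.default_rng(4)
nu, s2 = 5.0, 2.0
x = nu * s2 / rng.chisquare(nu, size=400_000)  # Inv-chi^2(nu, s^2) draws
print((1.0 / x).mean())                        # should approach 1/s^2 = 0.5
```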
  46. Robust regression using t-distributed errors — Iteratively reweighted linear regression and the EM algorithm
      M step:
      With W the diagonal weight matrix with entries Wi,i = 1/Vi,

      β̂_new = (Xᵀ W X)⁻¹ Xᵀ W y
      (σ̂_new)² = (1/n) (y − X β̂_new)ᵀ W (y − X β̂_new)
  47. Robust regression using t-distributed errors — Iteratively reweighted linear regression and the EM algorithm
      This EM iteration is equivalent to iteratively reweighted least squares:
      1. given initial values of the parameter estimates,
      2. each observation whose residual is large gets a small weight.
      If ν is treated as unknown, use the ECME algorithm
      → add a step that updates the degrees of freedom.
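The E and M steps above combine into an iteratively reweighted least-squares loop; a minimal sketch on synthetic data with one gross outlier (the design, coefficients, and ν are illustrative):

```python
import numpy as np

# EM for robust regression with t_nu errors: the E step computes the weights
# E(1/V_i) = (nu+1)/(nu*sig2 + r_i^2), the M step is weighted least squares.
def t_regression_em(X, y, nu, iters=50):
    n = X.shape[0]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting point
    sig2 = np.mean((y - X @ beta) ** 2)
    for _ in range(iters):
        r = y - X @ beta
        w = (nu + 1.0) / (nu * sig2 + r**2)       # E step: large residual -> small weight
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)       # M step for beta
        sig2 = np.mean(w * (y - X @ beta) ** 2)                # M step for sigma^2
    return beta, sig2

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(100), rng.uniform(-2, 2, 100)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(4, size=100)
y[0] += 30.0                                      # one gross outlier
beta, _ = t_regression_em(X, y, nu=4.0)
print(beta)   # stays near the true [1, 2] despite the outlier
```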
  48. Robust regression using t-distributed errors — Gibbs sampler and Metropolis algorithm
      Sampling from the posterior can be carried out with a Gibbs sampler or the Metropolis algorithm.
      Sample from the joint parameter distribution p(β, σ², V1, . . . , Vn | ν, y) by alternating:
      1. p(β, σ² | V1, . . . , Vn, ν, y)
      2. p(V1, . . . , Vn | β, σ², ν, y), drawing each

         Vi | β, σ, ν, y ∼ Inv-χ²( ν + 1, (νσ² + (yi − Xiβ)²) / (ν + 1) )

      3. If ν is unknown, update ν with a Metropolis step
      When ν is small, the posterior mode tends to be multimodal, which is troublesome
      → it is important to run simulations from a range of starting points
  49. References
      [1] Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. Bayesian Data Analysis, Third Edition. CRC Press, 2014.
      [2] Atsushi Suyama. ベイズ推論による機械学習入門 (Introduction to Machine Learning by Bayesian Inference). Kodansha, 2017.