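$\gamma, \sigma^2 \sim \mathcal{N}\big(\sum_j \gamma_j w_j x_{ij},\ \sigma^2\big)$
$\gamma_j \sim \mathrm{Ber}(\pi_0)$
$w_j \sim \mathcal{N}(0, \sigma_w^2)$

This is called the Bernoulli-Gaussian model or binary mask model ($\gamma_j$ masks out $w_j$).

binary mask: $\gamma_j \to y \leftarrow w_j$ vs. spike and slab: $\gamma_j \to w_j \to y$

$\gamma_j$ and $w_j$ are not separately identifiable: only the product $\gamma_j w_j$ can be identified.

There is a small upside: the model takes a form that is familiar to non-Bayesians.

Joint prior:
\[
p(\gamma, w) \propto \mathcal{N}(w \mid 0, \sigma_w^2 I)\, \pi_0^{\|\gamma\|_0} (1 - \pi_0)^{D - \|\gamma\|_0}
\]

The scaled negative log posterior is then
\[
f(\gamma, w) \equiv -2\sigma^2 \log p(\gamma, w, y \mid X)
= \|y - X(\gamma \mathbin{.*} w)\|^2 + \frac{\sigma^2}{\sigma_w^2}\|w\|^2 + \lambda\|\gamma\|_0 + \text{const},
\]
where $\lambda \equiv 2\sigma^2 \log\frac{1 - \pi_0}{\pi_0}$.

Split $w$ into $w_{-\gamma}$ (the entries with $\gamma_j = 0$) and $w_\gamma$ (the entries with $\gamma_j = 1$). Setting $w_{-\gamma} = 0$ and letting $\sigma_w^2 \to \infty$ gives
\[
f(\gamma, w) = \|y - X_\gamma w_\gamma\|_2^2 + \lambda\|\gamma\|_0 .
\]
This resembles the BIC expression above.

$\ell_0$ regularization: drop $\gamma$ and instead let the support of $w$ (its set of nonzero entries) indicate which variables are relevant, which gives
\[
f(w) = \|y - Xw\|_2^2 + \lambda\|w\|_0 .
\]
This turns the optimization over binary $\gamma_j \in \{0, 1\}$ into an optimization over continuous $w$. The second term, however, is still hard to optimize.

Daisuke Yoneoka, Murphy: Machine learning, Chapter 13 "Sparse linear models", supplementary material, September 26, 2014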
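As a rough numerical check of the objectives on this slide, the sketch below evaluates the binary mask objective $f(\gamma, w)$ (without the additive constant) and the $\ell_0$ objective $f(w)$ on a tiny data set. The function names (`l0_lambda`, `masked_objective`, `l0_objective`) and the toy data are assumptions made for illustration only; they do not come from the slides or from Murphy's book. With $w_{-\gamma} = 0$ and a very large $\sigma_w^2$, the two objectives should nearly coincide.

```python
import math


def l0_lambda(sigma2, pi0):
    """Penalty weight lambda = 2 * sigma^2 * log((1 - pi0) / pi0)."""
    return 2.0 * sigma2 * math.log((1.0 - pi0) / pi0)


def residual_ss(y, X, coef):
    """Squared residual norm ||y - X coef||^2, with X given as a list of rows."""
    total = 0.0
    for yi, row in zip(y, X):
        pred = sum(c * x for c, x in zip(coef, row))
        total += (yi - pred) ** 2
    return total


def masked_objective(y, X, gamma, w, sigma2, sigma2_w, pi0):
    """Binary mask objective f(gamma, w), omitting the additive constant."""
    masked = [g * wj for g, wj in zip(gamma, w)]  # elementwise gamma .* w
    ridge = (sigma2 / sigma2_w) * sum(wj * wj for wj in w)
    return residual_ss(y, X, masked) + ridge + l0_lambda(sigma2, pi0) * sum(gamma)


def l0_objective(y, X, w, lam):
    """l0-regularized objective ||y - X w||^2 + lam * ||w||_0."""
    nnz = sum(1 for wj in w if wj != 0.0)  # size of the support of w
    return residual_ss(y, X, w) + lam * nnz


# Toy data (hypothetical): 4 observations, D = 3 features.
X = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [1.0, 1.0, 0.0],
     [2.0, 0.0, 1.0]]
y = [1.1, 0.2, 0.9, 2.0]

gamma = [1, 0, 0]        # only the first feature is switched on
w = [1.0, 0.0, 0.0]      # w_{-gamma} is set to zero
sigma2, pi0 = 1.0, 0.1   # assumed noise variance and prior inclusion probability

lam = l0_lambda(sigma2, pi0)
f_mask = masked_objective(y, X, gamma, w, sigma2, sigma2_w=1e12, pi0=pi0)
f_l0 = l0_objective(y, X, w, lam)
print(lam, f_mask, f_l0)
```

Note that $\lambda$ is positive only when $\pi_0 < 1/2$, so a sparse prior is what makes the $\|\gamma\|_0$ term act as a penalty. The sketch only evaluates the objectives; it does not attempt the minimization, which is the hard part.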