Slide 6
From the Bernoulli-Gaussian model to l0 regularization
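$$y_i \mid x_i, w, \gamma, \sigma^2 \sim \mathcal{N}\Big(\sum_j \gamma_j w_j x_{ij},\ \sigma^2\Big)$$
$$\gamma_j \sim \mathrm{Ber}(\pi_0)$$
$$w_j \sim \mathcal{N}(0, \sigma_w^2)$$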
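As a rough illustration of the generative process above, the following sketch draws one sample of $(\gamma, w, y)$ for a fixed design matrix. The function name `sample_bernoulli_gaussian` and all numeric values are assumptions made for this example and do not come from the slide.

```python
import numpy as np

def sample_bernoulli_gaussian(X, pi0, sigma_w, sigma, rng):
    """Draw (gamma, w, y) from the binary mask model for a fixed N x D design X."""
    N, D = X.shape
    gamma = rng.binomial(1, pi0, size=D)       # gamma_j ~ Ber(pi0)
    w = rng.normal(0.0, sigma_w, size=D)       # w_j ~ N(0, sigma_w^2)
    mean = X @ (gamma * w)                     # sum_j gamma_j * w_j * x_ij
    y = mean + rng.normal(0.0, sigma, size=N)  # y_i ~ N(mean_i, sigma^2)
    return gamma, w, y

# Illustrative values only (assumed, not from the slide).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
gamma, w, y = sample_bernoulli_gaussian(X, pi0=0.3, sigma_w=2.0, sigma=0.5, rng=rng)
```

Note that `w` is drawn for every coordinate, including those with `gamma[j] == 0`; those weights simply never reach `y`, which is the sense in which the mask hides them.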
This is called the Bernoulli-Gaussian model or binary mask model ($\gamma_j$ masks out $w_j$).
Binary mask: $\gamma_j \to y \leftarrow w_j$, vs. spike and slab: $\gamma_j \to w_j \to y$.
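$\gamma_j$ and $w_j$ are not separately identifiable: only the product $\gamma_j w_j$ can be identified.
There is a small upside, though! The model leads to a form that is familiar from the non-Bayesian literature.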
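Joint prior:
$$p(\gamma, w) \propto \mathcal{N}(w \mid 0, \sigma_w^2 I)\, \pi_0^{\|\gamma\|_0} (1 - \pi_0)^{D - \|\gamma\|_0}$$
With this prior, the log posterior gives
$$f(\gamma, w) \equiv -2\sigma^2 \log p(\gamma, w, y \mid X) = \|y - X(\gamma .* w)\|^2 + \frac{\sigma^2}{\sigma_w^2}\|w\|^2 + \lambda\|\gamma\|_0 + \text{const},$$
where $\lambda \equiv 2\sigma^2 \log\frac{1 - \pi_0}{\pi_0}$ and $.*$ denotes the elementwise product.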
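The step from the joint prior to $f(\gamma, w)$ can be spelled out as follows. This is a sketch of the intermediate algebra, assuming the Gaussian likelihood $y \mid X, w, \gamma \sim \mathcal{N}(X(\gamma .* w), \sigma^2 I)$ and treating $\sigma^2$, $\sigma_w^2$, and $\pi_0$ as fixed.

```latex
\begin{align*}
\log p(\gamma, w, y \mid X)
  &= \log \mathcal{N}\big(y \mid X(\gamma .* w), \sigma^2 I\big)
   + \log \mathcal{N}(w \mid 0, \sigma_w^2 I) \\
  &\quad + \|\gamma\|_0 \log \pi_0 + (D - \|\gamma\|_0) \log(1 - \pi_0) + \text{const} \\
  &= -\frac{1}{2\sigma^2} \|y - X(\gamma .* w)\|^2
     - \frac{1}{2\sigma_w^2} \|w\|^2
     - \|\gamma\|_0 \log\frac{1 - \pi_0}{\pi_0} + \text{const}.
\end{align*}
% Multiplying by -2\sigma^2:
\begin{align*}
f(\gamma, w)
  = \|y - X(\gamma .* w)\|^2
  + \frac{\sigma^2}{\sigma_w^2} \|w\|^2
  + \underbrace{2\sigma^2 \log\frac{1 - \pi_0}{\pi_0}}_{\lambda} \|\gamma\|_0
  + \text{const}.
\end{align*}
```

The term $D \log(1 - \pi_0)$ is absorbed into the constant, and $\lambda > 0$ exactly when $\pi_0 < 1/2$, i.e. when the prior favors excluding each variable.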
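Split $w$ into $w_{-\gamma}$ and $w_\gamma$, the entries where $\gamma_j$ is 0 and 1 respectively, and set $w_{-\gamma} = 0$. In the limit $\sigma_w^2 \to \infty$,
$$f(\gamma, w) = \|y - X_\gamma w_\gamma\|_2^2 + \lambda\|\gamma\|_0.$$
Note that this resembles the BIC expression above.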
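For small $D$ the limiting objective can be minimized by brute force: for a fixed $\gamma$ the best $w_\gamma$ is the ordinary least-squares fit on the columns $X_\gamma$, so what remains is a search over all $2^D$ supports. The sketch below does exactly that. The names `l0_objective` and `best_subset` are hypothetical helpers for this example, and exhaustive search is used only for illustration since it does not scale.

```python
import itertools
import numpy as np

def l0_objective(X, y, support, lam):
    """||y - X_gamma w_gamma||^2 + lam * ||gamma||_0, with w_gamma the least-squares fit."""
    support = list(support)
    if support:
        w_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        resid = y - X[:, support] @ w_s
    else:
        resid = y  # empty model: nothing is explained
    return float(resid @ resid) + lam * len(support)

def best_subset(X, y, lam):
    """Exhaustively search all 2^D supports (feasible only for small D)."""
    D = X.shape[1]
    candidates = (s for k in range(D + 1)
                  for s in itertools.combinations(range(D), k))
    return min(candidates, key=lambda s: l0_objective(X, y, s, lam))

# Toy check with an identity design (assumed values): a coordinate is kept
# only if y_j^2 exceeds lam.
X_toy = np.eye(4)
y_toy = np.array([3.0, 0.1, 0.0, -2.0])
chosen = best_subset(X_toy, y_toy, lam=1.0)
```

With the identity design each variable lowers the residual by $y_j^2$ at a cost of $\lambda$, so `chosen` contains the indices 0 and 3 here. The number of candidate supports doubles with every added variable, which is why this direct search is impractical beyond small problems.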
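$\ell_0$ regularization: drop $\gamma$ and instead define the relevant variables as the support of $w$ (its non-zero entries), which gives
$$f(w) = \|y - Xw\|_2^2 + \lambda\|w\|_0.$$
This converts the optimization from binary $\gamma \in \{0, 1\}^D$ to continuous $w$. But the second term is still very hard to optimize!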
Daisuke Yoneoka, Murphy: Machine learning, Chapter 13 Sparse linear models, supplementary material, September 26, 2014, 6 / 14