
Sparse linear models


Daisuke Yoneoka

November 14, 2023

Transcript

  1. Sparse linear models. Daisuke Yoneoka, October 20, 2014
  2. Notations

    γ is a bit vector: γj = 1 if feature j is relevant, and γj = 0 otherwise.
    ∥γ∥0 = Σ_{j=1}^D γj is the l0 pseudo-norm; ∥γ∥1 = Σ_{j=1}^D |γj| is the l1 norm; ∥γ∥2 = (Σ_{j=1}^D γj²)^{1/2} is the l2 norm.
    Subderivative: for a convex function f : I → R, the subderivative at θ0 is the set of g satisfying f(θ) − f(θ0) ≥ g(θ − θ0) for all θ ∈ I.
    NLL: negative log likelihood, NLL(θ) ≡ −Σ_{i=1}^N log p(yi | xi, θ).
  3. l1 regularization: basics

    l0 (i.e. ∥w∥0) is not convex, and not even continuous! → use a convex approximation.
    Part of the difficulty in computing p(γ|D) is that γ ∈ {0, 1}^D is discrete.
    Approximate the prior p(w) by a continuous distribution (the Laplace distribution): p(w|λ) = Π_{j=1}^D Lap(wj | 0, 1/λ) ∝ Π_{j=1}^D e^{−λ|wj|}.
    The penalized negative log likelihood is f(w) = −log p(D|w) − log p(w|λ) = NLL(w) + λ∥w∥1 + const.
    This can be seen as a convex approximation of the non-convex l0 objective argmin_w NLL(w) + λ∥w∥0.
    Linear regression case (known as BPDN, basis pursuit denoising): f(w) = Σ_{i=1}^N (1/(2σ²))(yi − wᵀxi)² + λ∥w∥1, which after multiplying by 2σ² becomes RSS(w) + λ′∥w∥1 with λ′ = 2λσ².
    Placing a zero-mean Laplace prior on w and performing MAP estimation is called l1 regularization.
  4. Why does l1 regularization give sparse solutions?

    We restrict to linear regression, but this extends to GLMs in general.
    Objective: min_w RSS(w) + λ∥w∥1 ⇔ LASSO: min_w RSS(w) s.t. ∥w∥1 ≤ B, where small B corresponds to large λ.
    This is a quadratic program (QP).
    Similarly, min_w RSS(w) + λ∥w∥2² ⇔ RIDGE: min_w RSS(w) s.t. ∥w∥2² ≤ B.
    Figure 13.3: l1 (left) vs l2 (right) regularization.
  5. Optimality conditions for lasso

    The lasso is an example of non-smooth (non-differentiable) optimization.
    Objective: min_w RSS(w) + λ∥w∥1.
    Derivative of the first term: ∂RSS(w)/∂wj = aj wj − cj, with aj = 2 Σ_{i=1}^n x_ij² and cj = 2 Σ_{i=1}^n x_ij (yi − w_{−j}ᵀ x_{i,−j}), the inner product between feature j and the residual computed without feature j.
    cj measures how relevant feature j is for predicting y.
    The full subderivative is ∂_{wj} f(w) = (aj wj − cj) + λ ∂_{wj}∥w∥1 = {aj wj − cj − λ} if wj < 0; [−cj − λ, −cj + λ] if wj = 0; {aj wj − cj + λ} if wj > 0.
    In matrix form, the optimality condition reads (up to scaling) (Xᵀ(y − Xw))_j ∈ {−λ} if wj < 0; [−λ, λ] if wj = 0; {λ} if wj > 0.
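As a concrete check of these conditions, here is a minimal sketch (NumPy assumed; the helper name is hypothetical) that measures how badly a candidate w violates the subgradient stationarity condition for f(w) = ∥y − Xw∥2² + λ∥w∥1. The factor 2 comes from differentiating the un-scaled RSS, matching the aj and cj defined above.

```python
import numpy as np

def lasso_kkt_violation(X, y, w, lam, tol=1e-6):
    """Largest violation of the subgradient optimality condition for
    f(w) = ||y - Xw||_2^2 + lam * ||w||_1.

    Stationarity requires, for each coordinate j,
        2 * x_j^T (y - Xw) == lam * sign(w_j)   if w_j != 0
        |2 * x_j^T (y - Xw)| <= lam             if w_j == 0
    Returns 0 (up to tol) when w is optimal."""
    r = y - X @ w                       # residual
    g = 2.0 * X.T @ r                   # correlation of each feature with the residual
    active = np.abs(w) > tol
    viol_active = np.abs(g[active] - lam * np.sign(w[active]))
    viol_zero = np.maximum(np.abs(g[~active]) - lam, 0.0)
    return max(viol_active.max(initial=0.0), viol_zero.max(initial=0.0))
```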
  6. Optimality conditions for lasso (Cont. 2)

    Depending on the value of cj, the solution ŵj of 0 ∈ ∂_{wj} f(w) takes one of three forms:
    cj < −λ: the feature is strongly negatively correlated with the residual; the subderivative contains 0 at ŵj = (cj + λ)/aj < 0.
    cj ∈ [−λ, λ]: the feature is only weakly correlated with the residual; the subderivative contains 0 at ŵj = 0.
    cj > λ: the feature is strongly positively correlated with the residual; the subderivative contains 0 at ŵj = (cj − λ)/aj > 0.
    In other words, ŵj(cj) = (cj + λ)/aj if cj < −λ; 0 if cj ∈ [−λ, λ]; (cj − λ)/aj if cj > λ, i.e. ŵj(cj) = soft(cj/aj; λ/aj), where soft is the soft-thresholding operator soft(a; δ) ≡ sign(a)(|a| − δ)+.
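The soft-thresholding operator is one line of code. A minimal sketch (NumPy assumed; the helper name `soft` mirrors the slide's notation):

```python
import numpy as np

def soft(a, delta):
    """Soft thresholding: sign(a) * (|a| - delta)_+ , applied elementwise."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

# Coordinate-wise lasso solution from the slide: w_j = soft(c_j / a_j, lam / a_j)
print(soft(np.array([-3.0, -0.5, 0.2, 2.0]), 1.0))   # -> [-2. -0.  0.  1.]
```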
  7. Optimality conditions for lasso (Cont. 3)

    In summary, for the LASSO (Tibshirani, 1996):
    When λ = 0, ŵ coincides with the OLS solution.
    When λ ≥ λmax, ŵ = 0, where λmax = ∥Xᵀy∥∞ = max_j |yᵀ x:,j|.
    This follows from the fact that 0 is optimal whenever (Xᵀy)_j ∈ [−λ, λ] for every j.
    More generally, λmax = max_j |∇j NLL(0)|.
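A quick sketch of computing λmax (NumPy assumed, toy data; note that the constant depends on how the loss is scaled: with the ½∥y − Xw∥² convention it is ∥Xᵀy∥∞ as on the slide, with plain ∥y − Xw∥² a factor 2 appears):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))      # toy design matrix (columns ideally standardized)
y = rng.standard_normal(50)

# smallest lambda for which the solution of 0.5 * ||y - Xw||^2 + lam * ||w||_1 is all zeros
lam_max = np.max(np.abs(X.T @ y))
```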
  8. Comparing LS, lasso (l1), ridge (l2), and subset selection (l0)

    Assume X has orthonormal columns, i.e. XᵀX = I. Then RSS(w) = ∥y − Xw∥² = yᵀy + wᵀXᵀXw − 2wᵀXᵀy = const + Σ_k wk² − 2 Σ_k wk Σ_i x_ik yi.
    OLS solution: ŵk^OLS = x:,kᵀ y.
    Ridge solution: ŵk^ridge = ŵk^OLS / (1 + λ).
    Lasso solution: ŵk^lasso = sign(ŵk^OLS) (|ŵk^OLS| − λ/2)+.
    Subset selection: ŵk^SS = ŵk^OLS if k is among the K largest in |ŵk^OLS|, and 0 otherwise.
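A sketch of the four closed-form estimators side by side, assuming X has orthonormal columns (function and variable names are illustrative only):

```python
import numpy as np

def closed_form_estimators(X, y, lam, K):
    """OLS, ridge, lasso and best-subset solutions under X^T X = I."""
    w_ols = X.T @ y                                                        # w_k = x_{:,k}^T y
    w_ridge = w_ols / (1.0 + lam)                                          # uniform shrinkage
    w_lasso = np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam / 2.0, 0.0)  # soft thresholding
    thresh = np.sort(np.abs(w_ols))[-K]                                    # K-th largest |w_ols|
    w_ss = np.where(np.abs(w_ols) >= thresh, w_ols, 0.0)                   # keep K largest (ties kept)
    return w_ols, w_ridge, w_lasso, w_ss
```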
  9. Regularization path

    The regularization path plots ŵ(λ) for each feature against the value of λ.
    When D > N, the lasso can select at most N variables; the elastic net can select up to D.
  10. Model selection

    On model-selection consistency (cf. the information-criterion debate: AIC, BIC, MDL, etc.).
    Definition: (assuming the true model is among the candidates) the correct parameter set is selected as N → ∞.
    Debiasing: refit OLS on the features estimated as non-zero by the lasso. This is necessary because the lasso shrinks even the relevant coefficients.
    Choosing λ by cross-validated prediction accuracy does not necessarily give a value that recovers the true model: because the lasso is a shrinkage estimator, the prediction-optimal λ tends to be smaller than what reliable selection requires, so irrelevant features are also kept and false positives are common. There is no model-selection consistency! (Meinshausen, 2006)
    Ch. 13.6.2 introduces per-dimension tuning of λ (which is selection-consistent). Drawback: the selected set changes when the data change only slightly (a Bayesian approach is more robust).
    Bolasso (Bach, 2008) addresses this with the bootstrap; it requires computing stability-selection inclusion probabilities (Meinshausen, 2010).
  11. Bayesian inference for sparse linear models with a Laplace prior

    Everything so far is MAP estimation.
    The posterior mode is sparse, but the posterior mean and median are not.
    Plugging in the posterior mean gives a smaller predictive squared error.
    Elad (2009) showed for a spike-and-slab model that the posterior mean predicts better; however, it is computationally expensive.
  12. Algorithms for l1 regularization

    We restrict attention to optimizing the squared loss (extensions to other losses are possible).
    Coordinate descent: rather than optimizing everything at once, fix all coordinates but one and optimize that single coordinate: wj* = argmin_z f(w + z ej) − f(w), where ej is the j-th unit vector.
    This is effective when the one-dimensional optimization can be solved analytically.
    Convergence can be slow because only one coordinate is updated at a time.
    The shooting algorithm (Fu, 1998; Wu, 2008) is of this form (e.g. Yuan, 2010 for the logit case); a sketch follows below.
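As referenced above, a minimal coordinate-descent (shooting-style) sketch for the squared loss, using the a_j, c_j and soft-threshold update from slides 5 and 6 (NumPy assumed; fixed number of sweeps, no convergence check, columns assumed non-degenerate):

```python
import numpy as np

def shooting_lasso(X, y, lam, n_iters=100, w0=None):
    """Coordinate-descent solver for ||y - Xw||_2^2 + lam * ||w||_1,
    cycling over coordinates with the closed-form soft-threshold update."""
    N, D = X.shape
    w = np.zeros(D) if w0 is None else w0.copy()
    a = 2.0 * np.sum(X ** 2, axis=0)            # a_j = 2 * sum_i x_ij^2
    for _ in range(n_iters):
        for j in range(D):
            r_j = y - X @ w + X[:, j] * w[j]    # residual with feature j excluded
            c_j = 2.0 * X[:, j] @ r_j           # correlation of feature j with that residual
            w[j] = np.sign(c_j) * max(abs(c_j) - lam, 0.0) / a[j]   # soft(c_j/a_j, lam/a_j)
    return w
```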
  13. Algorithms for l1 regularization (Cont. 2)

    Active-set methods: a version of coordinate descent that updates several coordinates together; however, deciding which coordinates to fix and which to update is troublesome.
    Warm starting: if λk ≈ λk−1, then ŵ(λk) can be computed cheaply starting from ŵ(λk−1). If we want the solution at some particular value λ*, warm starting gives an algorithm that starts from λmax and works down to λ* (a continuation or homotopy method). When λ* is small, this is often more efficient than computing the λ* solution directly from a cold start.
    LARS (least angle regression and shrinkage) is a homotopy method.
    Step 1: initialize λ at the value that can be computed from the single feature most strongly correlated with y.
    Step 2: decrease λ until a second feature is found whose correlation with the residual rk = y − X:,Fk wk equals that of the first feature (Fk is the active set at step k). The least-angle property makes the next λ computable analytically.
    Step 3: repeat until all variables have been added.
    To trace the lasso solution path, it must also be possible to "remove" features from the active set.
    LAR: like LARS, but removing features is not allowed. It is slightly faster, with the same cost as OLS, O(ND min(N, D)); it is also called greedy forward search or least-squares boosting.
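A warm-starting (continuation) sketch: solve on a decreasing grid of λ values, starting each solve from the previous solution. It reuses the shooting_lasso sketch above (same objective and scaling); the grid size and range are arbitrary illustrative choices:

```python
import numpy as np

def lasso_path_warm_start(X, y, n_lams=20, n_iters=50):
    """Continuation sketch: trace the lasso path from lam_max downward,
    warm-starting each solve at the previous solution."""
    lam_max = 2.0 * np.max(np.abs(X.T @ y))     # all-zero solution at/above this value
    lams = np.geomspace(lam_max, lam_max * 1e-3, n_lams)
    w = np.zeros(X.shape[1])
    path = []
    for lam in lams:
        w = shooting_lasso(X, y, lam, n_iters=n_iters, w0=w)   # warm start from previous w
        path.append(w.copy())
    return lams, np.array(path)
```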
  14. Proximal and gradient projection methods

    Consider a convex objective f(θ) = L(θ) + R(θ), where L(θ) is a convex, differentiable loss and R(θ) is a convex but not necessarily differentiable regularizer.
    For example, f(θ) = R(θ) + (1/2)∥θ − y∥2², which is the case L(θ) = RSS(θ) with design matrix X = I.
    Introduce the proximal operator of a convex function R: prox_R(y) = argmin_z R(z) + (1/2)∥z − y∥2². Intuitively, it keeps z close to y while making R small. Inside an iterative optimizer it is applied with y replaced by θk.
    Example: for the lasso problem we can take L(θ) = RSS(θ) and R(θ) = I_C(θ), where C = {θ : ∥θ∥1 ≤ B} and I_C(θ) ≡ 0 if θ ∈ C, +∞ otherwise.
    Below we look at how the proximal operator of R is computed.
  15. Proximal operators

    The proximal operator can be computed as follows, in O(D) time (Duchi, 2008):
    R(θ) = λ∥θ∥1: prox_R(θ) = soft(θ, λ) (soft thresholding).
    R(θ) = λ∥θ∥0: prox_R(θ) = hard(θ, √(2λ)) (hard thresholding), where hard(u, a) ≡ u · I(|u| > a).
    R(θ) = I_C(θ): prox_R(θ) = argmin_{z∈C} ∥z − θ∥2² = proj_C(θ), the projection onto C.
    When C is a hyper-rectangle (C = {θ : lj ≤ θj ≤ uj}): proj_C(θ)j = lj if θj ≤ lj; θj if lj ≤ θj ≤ uj; uj if θj ≥ uj.
    When C is the Euclidean ball (C = {θ : ∥θ∥2 ≤ 1}): proj_C(θ) = θ/∥θ∥2 if ∥θ∥2 > 1; θ otherwise.
    When C is the 1-norm ball (C = {θ : ∥θ∥1 ≤ 1}): proj_C(θ)j = soft(θj, λ), where λ = 0 if ∥θ∥1 ≤ 1 and otherwise λ is the solution of Σ_{j=1}^D max(|θj| − λ, 0) = 1.
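Minimal sketches of these proximal operators (NumPy assumed; the l1-ball projection finds the thresholding λ by sorting, which is one standard O(D log D) way to solve Σ_j max(|θj| − λ, 0) = radius):

```python
import numpy as np

def prox_l1(theta, lam):
    """prox of R = lam * ||.||_1 : soft thresholding."""
    return np.sign(theta) * np.maximum(np.abs(theta) - lam, 0.0)

def prox_l0(theta, lam):
    """prox of R = lam * ||.||_0 : hard thresholding at sqrt(2 * lam)."""
    return theta * (np.abs(theta) > np.sqrt(2.0 * lam))

def proj_box(theta, lower, upper):
    """Projection onto the hyper-rectangle {l_j <= theta_j <= u_j}."""
    return np.clip(theta, lower, upper)

def proj_l2_ball(theta):
    """Projection onto the unit Euclidean ball."""
    norm = np.linalg.norm(theta)
    return theta / norm if norm > 1.0 else theta

def proj_l1_ball(theta, radius=1.0):
    """Projection onto the l1 ball: soft-threshold with the lambda solving
    sum_j max(|theta_j| - lambda, 0) = radius (found by sorting)."""
    if np.sum(np.abs(theta)) <= radius:
        return theta
    u = np.sort(np.abs(theta))[::-1]                  # |theta| sorted in decreasing order
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - radius))[0][-1]
    lam = (css[rho] - radius) / (rho + 1.0)
    return prox_l1(theta, lam)
```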
  16. Proximal gradient method

    How to use the proximal operator inside a gradient method. The update for θ comes from a quadratic approximation:
    θk+1 = argmin_z R(z) + L(θk) + gkᵀ(z − θk) + (1/(2tk))∥z − θk∥2², where gk = ∇L(θk) and the last term approximates the Hessian of L as ∇²L(θk) ≈ (1/tk) I.
    Equivalently, θk+1 = argmin_z tk R(z) + (1/2)∥z − uk∥2² = prox_{tk R}(uk), where uk = θk − tk gk.
    R(θ) = 0: this is ordinary gradient descent.
    R(θ) = I_C(θ): this is projected gradient descent.
    R(θ) = λ∥θ∥1: this is iterative soft thresholding.
    Choosing tk (or αk = 1/tk): assuming αk I is a good approximation of ∇²L(θ), we want αk(θk − θk−1) ≈ gk − gk−1, so solve αk = argmin_α ∥α(θk − θk−1) − (gk − gk−1)∥2² = ((θk − θk−1)ᵀ(gk − gk−1)) / ((θk − θk−1)ᵀ(θk − θk−1)) (the Barzilai-Borwein or spectral stepsize).
    Combining the BB stepsize with iterative soft thresholding and a homotopy method gives a fast solver for BPDN (basis pursuit denoising): the SpaRSA algorithm.
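A proximal-gradient sketch with the BB stepsize for L(θ) = ½∥y − Xθ∥², R(θ) = λ∥θ∥1 (iterative soft thresholding). This is only a skeleton of the idea, not SpaRSA itself: there is no line search or safeguarding, and names are illustrative:

```python
import numpy as np

def ista_bb(X, y, lam, n_iters=200):
    """Proximal gradient (iterative soft thresholding) with a Barzilai-Borwein
    stepsize for 0.5 * ||y - X theta||^2 + lam * ||theta||_1."""
    soft = lambda a, d: np.sign(a) * np.maximum(np.abs(a) - d, 0.0)
    theta = np.zeros(X.shape[1])
    grad = X.T @ (X @ theta - y)              # gradient of the smooth part at theta
    t = 1.0 / np.linalg.norm(X, 2) ** 2       # safe initial stepsize 1 / ||X||_2^2
    for _ in range(n_iters):
        theta_new = soft(theta - t * grad, t * lam)   # prox_{t R}(theta - t * grad)
        grad_new = X.T @ (X @ theta_new - y)
        s, d = theta_new - theta, grad_new - grad
        if s @ d > 0:                          # BB / spectral stepsize t = (s^T s) / (s^T d)
            t = (s @ s) / (s @ d)
        theta, grad = theta_new, grad_new
    return theta
```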
  17. Nesterov's method

    Making the quadratic approximation around a point other than θk gives an even faster proximal gradient method:
    θk+1 = prox_{tk R}(φk − tk gk), with gk = ∇L(φk) and φk = θk + ((k − 1)/(k + 2))(θk − θk−1).
    Combining Nesterov's method with iterative soft thresholding and a homotopy method gives a fast solver for BPDN (basis pursuit denoising): the FISTA algorithm (fast iterative shrinkage-thresholding algorithm).
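A sketch of the accelerated update using the momentum coefficient (k − 1)/(k + 2) shown on the slide and a fixed stepsize 1/L; again a skeleton in the spirit of FISTA, not a tuned implementation:

```python
import numpy as np

def fista(X, y, lam, n_iters=200):
    """Accelerated proximal gradient for 0.5 * ||y - X theta||^2 + lam * ||theta||_1."""
    soft = lambda a, d: np.sign(a) * np.maximum(np.abs(a) - d, 0.0)
    t = 1.0 / np.linalg.norm(X, 2) ** 2        # stepsize 1/L with L = ||X||_2^2
    theta = theta_prev = np.zeros(X.shape[1])
    for k in range(1, n_iters + 1):
        phi = theta + (k - 1.0) / (k + 2.0) * (theta - theta_prev)  # look-ahead point
        grad = X.T @ (X @ phi - y)                                  # gradient at phi
        theta_prev = theta
        theta = soft(phi - t * grad, t * lam)                       # prox step at phi
    return theta
```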
  18. An EM algorithm for the lasso

    Represent the Laplace distribution as a Gaussian scale mixture (GSM): Lap(wj | 0, 1/γ) = (γ/2) e^{−γ|wj|} = ∫ N(wj | 0, τj²) Ga(τj² | 1, γ²/2) dτj².
    Using this, the joint distribution is p(y, w, τ, σ² | X) = N(y | Xw, σ² I_N) N(w | 0, D_τ) IG(σ² | a_σ, b_σ) [Π_j Ga(τj² | 1, γ²/2)] ∝ (σ²)^{−N/2} exp(−(1/(2σ²))∥y − Xw∥2²) |D_τ|^{−1/2} exp(−(1/2) wᵀ D_τ^{−1} w) (σ²)^{−(a_σ+1)} exp(−b_σ/σ²) Π_j exp(−(γ²/2) τj²), where D_τ = diag(τj²); X is standardized and y is centered, so the offset term can be ignored.
    Apply the EM algorithm (Figueiredo, 2003): E step: estimate τj² and σ²; M step: optimize over w.
    In fact, this ŵ coincides with the lasso estimator.
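The GSM identity can be checked numerically: the sketch below (SciPy assumed, illustrative parameter values) compares the Laplace density with the Gaussian-times-Gamma mixture integral at a single point:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

gamma_ = 1.5    # Laplace rate parameter (illustrative value)
w = 0.7

lap = 0.5 * gamma_ * np.exp(-gamma_ * abs(w))            # Lap(w | 0, 1/gamma)

# Gaussian scale mixture: integrate N(w | 0, tau2) * Ga(tau2 | shape=1, rate=gamma^2/2) over tau2
integrand = lambda tau2: (stats.norm.pdf(w, scale=np.sqrt(tau2))
                          * stats.gamma.pdf(tau2, a=1.0, scale=2.0 / gamma_ ** 2))
gsm, _ = quad(integrand, 0.0, np.inf)

print(lap, gsm)   # the two values should agree to quadrature accuracy
```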
  19. Why EM?

    There are plenty of algorithms for l1 MAP estimation, so why use EM at all?
    It makes it easier to compute estimators for probit models, robust linear models, and so on.
    It makes it easier to consider priors on the variances other than Ga(τj² | 1, γ²/2).
    With the Bayesian lasso, the full posterior p(w|D) becomes easier to compute.
  20. Objective and E/M steps

    The penalized log likelihood is l_c(w) = −(1/(2σ²))∥y − Xw∥2² − (1/2) wᵀΛw + const, where Λ = diag(1/τj²) is the precision matrix.
    E step. First consider computing E[1/τj² | wj]. Either compute it directly as E[1/τj² | wj] = (−d/d|wj| log ∫ N(wj | 0, τj²) p(τj²) dτj²) / |wj|, or use p(1/τj² | w, D) = InverseGaussian(√(γ²/wj²), γ²), which gives E[1/τj² | wj] = γ/|wj|.
    In the end, Λ̄ = diag(E[1/τ1²], …, E[1/τD²]).
    Next, estimate σ². The posterior is p(σ² | D, w) = IG(a_σ + N/2, b_σ + (1/2)(y − Xŵ)ᵀ(y − Xŵ)) = IG(a_N, b_N), hence E[1/σ²] = a_N/b_N ≡ ω̄.
  21. Objective and E/M steps (Cont.)

    M step: we want ŵ = argmax_w −(1/2) ω̄ ∥y − Xw∥2² − (1/2) wᵀ Λ̄ w.
    This is MAP estimation under a Gaussian prior: ŵ = (σ² Λ̄ + XᵀX)^{−1} Xᵀ y.
    Caution: because we want sparsity, most of the wj are 0 ⇔ most of the τj² are 0, and then inverting Λ̄ is numerically unstable. The SVD can be used (X = U D Vᵀ): ŵ = Ψ V (Vᵀ Ψ V + (1/ω̄) D^{−2})^{−1} D^{−1} Uᵀ y, where Ψ = Λ̄^{−1} = diag(1/E[1/τj²]) = diag(|wj| / (−d/d|wj| log ∫ N(wj | 0, τj²) p(τj²) dτj²)).
    Note: the lasso objective is convex, so in theory the global optimum is always attainable, but numerically this often fails. For example, if the M step sets ŵj = 0, the E step then estimates τj² = 0, which in turn forces ŵj = 0, and this mistake can never be undone (Hunter, 2005). A minimal EM sketch follows below.
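As referenced above, a minimal EM sketch for the Laplace-prior MAP estimate, assuming a known noise variance σ² and using the E-step formula E[1/τj² | wj] = γ/|wj|. It uses a plain linear solve with a small eps rather than the SVD reparameterization from the slide, so it only illustrates the iteration (including its absorbing-zero behavior):

```python
import numpy as np

def em_lasso(X, y, gamma_, sigma2=1.0, n_iters=100, eps=1e-10):
    """EM sketch for the Laplace-prior (lasso-type) MAP estimate with known sigma2.
    E step: Lambda_bar_jj = E[1/tau_j^2 | w_j] = gamma / |w_j|.
    M step: w = (sigma2 * Lambda_bar + X^T X)^{-1} X^T y (Gaussian-prior MAP)."""
    XtX, Xty = X.T @ X, X.T @ y
    w = np.linalg.lstsq(X, y, rcond=None)[0]            # initialize at the least-squares fit
    for _ in range(n_iters):
        lam_bar = gamma_ / np.maximum(np.abs(w), eps)   # E step (eps avoids division by zero)
        w = np.linalg.solve(sigma2 * np.diag(lam_bar) + XtX, Xty)   # M step
        # Once |w_j| is ~0, lam_bar[j] explodes and w_j stays pinned at 0 afterwards:
        # the "mistake that cannot be undone" noted on the slide.
    return w
```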