Chapter 13: Sparse linear models (supplementary material)
Daisuke Yoneoka
November 14, 2023
Transcript
Murphy: Machine Learning, Chapter 13: Sparse linear models (supplementary material)
Daisuke Yoneoka, September 26, 2014
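Notations
$\gamma$: a bit vector; $\gamma_j = 1$ if feature $j$ is relevant and $0$ otherwise.
$\|\gamma\|_0 = \sum_{j=1}^{D} \gamma_j$: the $\ell_0$ pseudo-norm.
$\|\gamma\|_1 = \sum_{j=1}^{D} |\gamma_j|$: the $\ell_1$ norm.
$\|\gamma\|_2 = \left(\sum_{j=1}^{D} \gamma_j^2\right)^{1/2}$: the $\ell_2$ norm.
$\pi_0$: the probability that a feature is relevant.
xor: exclusive or (exclusive disjunction). The output is true when an odd number of the inputs are true and false when an even number are true.
.*: an array operation. A .* B is the elementwise product of matrices A and B (for some reason the book suddenly switches to MATLAB notation).
$x_{:,j}$: the $j$-th column vector of the matrix $X$ (again, MATLAB dialect).
Subderivative: for a convex function $f: I \to \mathbb{R}$, the subderivative at $\theta_0$ is the set of all $g$ satisfying $f(\theta) - f(\theta_0) \ge g(\theta - \theta_0)$ for every $\theta \in I$.
NLL: negative log likelihood, $\mathrm{NLL}(\theta) \equiv -\sum_{i=1}^{N} \log p(y_i \mid x_i, \theta)$.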
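The norms and the elementwise product defined above can be checked on a small example. This is a minimal sketch in plain Python; the function names and the vectors `gamma` and `w` are made up for illustration and do not come from the slides.

```python
import math

def l0(v):
    """l0 pseudo-norm: the number of non-zero entries."""
    return sum(1 for x in v if x != 0)

def l1(v):
    """l1 norm: the sum of absolute values."""
    return sum(abs(x) for x in v)

def l2(v):
    """l2 norm: the square root of the sum of squares."""
    return math.sqrt(sum(x * x for x in v))

def elementwise(a, b):
    """Elementwise product, i.e. the MATLAB-style a .* b."""
    return [x * y for x, y in zip(a, b)]

# Hypothetical example: gamma switches features on and off, w holds the weights.
gamma = [1, 0, 1, 1, 0]
w = [0.5, -2.0, 3.0, 0.0, 1.5]
masked = elementwise(gamma, w)  # weights that survive the mask

print(l0(gamma), l1(gamma), l2(gamma))
print(masked, l0(masked))
```

For a bit vector the $\ell_0$ and $\ell_1$ values coincide, which is why $\|\gamma\|_0$ can be written as a plain sum of the $\gamma_j$.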
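Introduction
Feature selection: in the model $p(y \mid X) = p(y \mid f(w^T X))$, we consider making $w$ sparse. This has been attracting a lot of attention recently.
Lots of computational advantages.
$D \gg N$ (classical statistics assumes $D < N$), where $D$ is the parameter dimension and $N$ is the sample size.
In gene analysis, $D \sim 10{,}000$ and $N \sim 100$, and we want to find the smallest possible set of features.
Ch. 14 covers kernel-based analysis. There the design matrix is $N \times N$, so feature selection amounts to selecting a subset of the training data (sparse kernel machines).
In signal processing, signals are represented in a wavelet basis; sparsity is used to select as few basis functions as possible.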
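Bayesian variable selection
We want the posterior over combinations of features:
$p(\gamma \mid \mathcal{D}) = \dfrac{e^{-f(\gamma)}}{\sum_{\gamma'} e^{-f(\gamma')}}$, where $f(\gamma) \equiv -[\log p(\mathcal{D} \mid \gamma) + \log p(\gamma)]$.
This becomes hard to interpret once the number of models grows.
Thinking about summary statistics, the posterior mode (the MAP estimate) comes to mind naturally:
$\hat{\gamma} = \arg\max_{\gamma} p(\gamma \mid \mathcal{D}) = \arg\min_{\gamma} f(\gamma)$.
The mode is a somewhat questionable summary, so consider the median model instead:
$\hat{\gamma} = \{ j : p(\gamma_j = 1 \mid \mathcal{D}) > 0.5 \}$.
This, however, requires the posterior marginal inclusion probabilities $p(\gamma_j = 1 \mid \mathcal{D})$, which become hard to compute in high dimensions.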
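For a handful of features, the posterior over models, the MAP model and the median model can all be computed by brute-force enumeration. This is a sketch only: `toy_cost` and its numbers are invented stand-ins for $f(\gamma)$, and enumerating all $2^D$ models is feasible only for very small $D$.

```python
import itertools
import math

def posterior_over_models(f, D):
    """Return {gamma: p(gamma | data)} by enumerating all 2^D bit vectors."""
    gammas = list(itertools.product([0, 1], repeat=D))
    scores = [-f(g) for g in gammas]
    m = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - m) for s in scores]
    Z = sum(weights)
    return {g: wt / Z for g, wt in zip(gammas, weights)}

def map_model(post):
    """Posterior mode: the single most probable bit vector."""
    return max(post, key=post.get)

def median_model(post, D):
    """Features whose marginal inclusion probability exceeds 0.5."""
    incl = [sum(p for g, p in post.items() if g[j] == 1) for j in range(D)]
    return [j for j in range(D) if incl[j] > 0.5], incl

def toy_cost(gamma):
    """Hypothetical f(gamma): each included feature pays a penalty and earns a fit gain."""
    fit_gain = [3.0, 0.2, 1.5]  # made-up reduction in cost per feature
    penalty = 1.0               # made-up complexity penalty per feature
    return sum(g * (penalty - gain) for g, gain in zip(gamma, fit_gain))

post = posterior_over_models(toy_cost, D=3)
best = map_model(post)
selected, inclusion = median_model(post, D=3)
print(best, selected, [round(p, 3) for p in inclusion])
```

In this toy cost the features contribute independently, so the MAP and median models agree; with correlated features they can differ.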
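Spike and slab model
Posterior: $p(\gamma \mid \mathcal{D}) \propto p(\gamma)\, p(\mathcal{D} \mid \gamma)$.
Prior: $p(\gamma) = \prod_{j=1}^{D} \mathrm{Ber}(\gamma_j \mid \pi_0) = \pi_0^{\|\gamma\|_0} (1 - \pi_0)^{D - \|\gamma\|_0}$.
Likelihood: $p(\mathcal{D} \mid \gamma) = p(y \mid X, \gamma) = \iint p(y \mid X, w, \gamma)\, p(w \mid \gamma, \sigma^2)\, p(\sigma^2)\, dw\, d\sigma^2$.
Prior on the weights inside $p(w \mid \gamma, \sigma^2)$:
$p(w_j \mid \gamma_j, \sigma^2) = \begin{cases} \delta_0(w_j) & \text{if } \gamma_j = 0 \\ \mathcal{N}(w_j \mid 0, \sigma^2 \sigma_w^2) & \text{if } \gamma_j = 1 \end{cases}$
Here $x$ and $y$ are standardized.
The first case looks like a spike standing at the origin.
As $\sigma_w^2 \to \infty$, $p(w_j \mid \gamma_j = 1)$ approaches a uniform distribution, hence the name slab.
The marginal likelihood can be approximated by BIC (Zou 2007):
$\log p(\mathcal{D} \mid \gamma) \approx \log p(y \mid X, \hat{w}_\gamma, \hat{\sigma}^2) - \dfrac{\|\gamma\|_0}{2} \log N$, where $\|\gamma\|_0$ plays the role of the degrees of freedom.
From the above,
$\log p(\gamma \mid \mathcal{D}) \approx \log p(y \mid X, \hat{w}_\gamma, \hat{\sigma}^2) - \dfrac{\|\gamma\|_0}{2} \log N - \lambda \|\gamma\|_0 + \text{const}$, where the $\lambda \|\gamma\|_0$ term comes from the prior $p(\gamma)$.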
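The step from the Bernoulli prior to a penalty on $\|\gamma\|_0$ follows by taking logs. The slide does not define $\lambda$ at this point; the definition below is the one implied by the prior (the later $\ell_0$ derivation rescales the whole objective by $2\sigma^2$, which rescales $\lambda$ the same way).

```latex
\begin{align*}
\log p(\gamma)
  &= \|\gamma\|_0 \log \pi_0 + \bigl(D - \|\gamma\|_0\bigr) \log (1 - \pi_0) \\
  &= -\|\gamma\|_0 \log \frac{1 - \pi_0}{\pi_0} + D \log (1 - \pi_0) \\
  &= -\lambda \|\gamma\|_0 + \text{const},
  \qquad \lambda \equiv \log \frac{1 - \pi_0}{\pi_0}.
\end{align*}
```

So a small $\pi_0$ (few features expected to be relevant) gives a large $\lambda$ and a stronger preference for sparse models.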
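From the Bernoulli-Gaussian model to l0 regularization
$y_i \mid x_i, w, \gamma, \sigma^2 \sim \mathcal{N}\!\left(\sum_j \gamma_j w_j x_{ij},\ \sigma^2\right)$, $\quad \gamma_j \sim \mathrm{Ber}(\pi_0)$, $\quad w_j \sim \mathcal{N}(0, \sigma_w^2)$.
This is called the Bernoulli-Gaussian model or the binary mask model ($\gamma_j$ masks out $w_j$).
Binary mask: $\gamma_j \to y \leftarrow w_j$, versus slab: $\gamma_j \to w_j \to y$.
$\gamma_j$ and $w_j$ are not separately identifiable; only the product $\gamma_j w_j$ can be identified.
There is an upside, though: the model takes a form that is familiar to non-Bayesians.
Joint prior: $p(\gamma, w) \propto \mathcal{N}(w \mid 0, \sigma_w^2 I)\, \pi_0^{\|\gamma\|_0} (1 - \pi_0)^{D - \|\gamma\|_0}$.
The scaled negative log posterior is then
$f(\gamma, w) \equiv -2\sigma^2 \log p(\gamma, w, y \mid X) = \|y - X(\gamma \mathbin{.*} w)\|^2 + \dfrac{\sigma^2}{\sigma_w^2} \|w\|^2 + \lambda \|\gamma\|_0 + \text{const}$, where $\lambda \equiv 2\sigma^2 \log\!\left(\dfrac{1 - \pi_0}{\pi_0}\right)$.
Split $w$ into $w_{-\gamma} = 0$ (the entries with $\gamma_j = 0$) and $w_\gamma$ (the entries with $\gamma_j = 1$). As $\sigma_w^2 \to \infty$,
$f(\gamma, w) = \|y - X_\gamma w_\gamma\|_2^2 + \lambda \|\gamma\|_0$.
Note how similar this is to the BIC expression above.
l0 regularization: drop $\gamma$ and let the support of $w$ indicate which variables matter, which gives
$f(w) = \|y - Xw\|_2^2 + \lambda \|w\|_0$.
This converts the optimization from the discrete $\gamma \in \{0, 1\}^D$ to a continuous $w$. The second term, however, is still hard to optimize!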
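Algorithms
Because $\gamma$ is a bit vector, exhaustive search is very expensive, so we fall back on heuristics.
Wrapper method: search over models while evaluating a cost $f(\gamma)$ (an error, for example), with a generic fitting step that computes $\arg\max_w p(\mathcal{D} \mid w)$ or $\int p(\mathcal{D} \mid w)\, p(w)\, dw$. Intuitively, we wrap a function fun that implements the learning algorithm, split the data into subsets, and apply it repeatedly while computing scores.
The key to efficiency:
How to update the score of the previous $\gamma$ to obtain the score of $\gamma'$
⇔ update the sufficient statistics of the cost $f(\gamma)$ efficiently
⇔ make the computation of $f(\gamma)$ depend only on $X_\gamma$, then change $\gamma$ slightly into $\gamma'$ (add or remove a single variable).
In that case a QR decomposition can be used to update $X_\gamma^T X_\gamma$ to $X_{\gamma'}^T X_{\gamma'}$.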
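Greedy search 1
We consider optimizing the l0-regularized objective. The properties of least squares can be exploited (for details see Miller 2002; Soussen et al. 2010).
Single best replacement (SBR): greedy hill climbing that searches the neighboring models reachable by flipping a single bit of $\gamma$. Since the goal is a sparse solution, start from $\gamma = 0$, then keep adding and removing variables until no better-scoring model is found.
Orthogonal least squares: if $\lambda = 0$ (i.e., the prior $p(\gamma)$ imposes no penalty), it is enough to add variables in the forward direction. This is called orthogonal least squares, or greedy forward selection. The error is a monotonically decreasing function of $\|\gamma\|_0$. The update is $\gamma^{(t+1)} = \gamma^{(t)} \cup \{j^*\}$, where
$j^* = \arg\min_{j \notin \gamma_t} \min_w \|y - X_{\gamma_t \cup j}\, w\|^2$.
Orthogonal matching pursuit (OMP): the method above is expensive, and OMP is a simplification of it. The next candidate is found by solving
$j^* = \arg\min_{j \notin \gamma_t} \min_\beta \|y - X w_t - \beta x_{:,j}\|^2$ (with $w_t$ held fixed).
The inner problem has the closed-form solution $\beta = \dfrac{x_{:,j}^T (y - X w_t)}{x_{:,j}^T x_{:,j}}$. This amounts to choosing the column $x_{:,j}$ most correlated with the residual $y - X w_t$ computed with $w_t$ fixed. The enlarged feature set is then used to compute $w_{t+1}$.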
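The OMP selection rule and refit can be sketched in plain Python. Assumptions not taken from the slides: the helper names, the normal-equation refit solved by Gaussian elimination (fine for a toy problem, not for ill-conditioned data), the fixed number of selected features as the stopping rule, and the toy data in which `y` depends on columns 0 and 2 only.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def column(X, j):
    return [row[j] for row in X]

def solve(A, b):
    """Solve a small square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    z = [0.0] * n
    for k in reversed(range(n)):
        z[k] = (M[k][n] - sum(M[k][c] * z[c] for c in range(k + 1, n))) / M[k][k]
    return z

def least_squares(X, y, active):
    """Refit the weights of the active columns via the normal equations."""
    cols = [column(X, j) for j in active]
    A = [[dot(ci, cj) for cj in cols] for ci in cols]
    b = [dot(ci, y) for ci in cols]
    return solve(A, b)

def omp(X, y, n_features):
    """Greedy selection: pick the column that best explains the current residual, then refit."""
    D = len(X[0])
    active, w = [], [0.0] * D
    for _ in range(n_features):
        resid = [yi - dot(row, w) for row, yi in zip(X, y)]

        def gain(j):
            # min over beta of ||r - beta x_j||^2 equals ||r||^2 - (x_j^T r)^2 / (x_j^T x_j)
            xj = column(X, j)
            return dot(xj, resid) ** 2 / dot(xj, xj)

        j_star = max((j for j in range(D) if j not in active), key=gain)
        active.append(j_star)
        coef = least_squares(X, y, active)
        w = [0.0] * D
        for j, c in zip(active, coef):
            w[j] = c
    return sorted(active), w

# Toy data (made up): y = 2 * x0 - 1 * x2, so column 1 is irrelevant.
X = [[1.0, 0.0, 0.5], [0.0, 1.0, -1.0], [1.0, 1.0, 0.0], [2.0, -1.0, 1.0], [0.0, 2.0, 1.5]]
y = [2.0 * r[0] - 1.0 * r[2] for r in X]
active, w = omp(X, y, n_features=2)
print(active, w)
```

Ranking candidates by $(x_{:,j}^T r)^2 / (x_{:,j}^T x_{:,j})$ is the same as ranking them by the minimized residual norm, so only one inner product per candidate is needed instead of a full refit.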
Greedy search 2 (continued)
Matching pursuit: the same as sparse boosting (least squares boosting). We will cover it in Chapter 16.
Backwards selection: start from the saturated (full) model and remove variables step by step. It generally gives better results than forward selection, because each decision to keep or drop a variable is made in the context of all the other variables that might depend on it.
FoBa: the forward-backward algorithm. It resembles SBR, except that the next candidate is chosen in the OMP manner.
Bayesian matching pursuit: similar to OMP, but it uses a Bayesian marginal likelihood scoring criterion rather than squared error as the objective. It uses beam search (when the number of branches exceeds a preset beam width, the worst branches are pruned).
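Stochastic search
Instead of always moving to the best neighbor (greedy search), the next model is chosen at random.
If we want the posterior itself, MCMC is the obvious choice. Each proposal is a small change to $\gamma$, so $p(\gamma' \mid \mathcal{D})$ is relatively easy to obtain from $p(\gamma \mid \mathcal{D})$ (for details see O'Hara and Sillanpää 2009).
In a discrete state space, plain MCMC is needlessly inefficient, because the unnormalized probability of a state can be computed directly as $p(\gamma, \mathcal{D}) = \exp(-f(\gamma))$; there is no need to revisit a state, which improves efficiency.
To improve efficiency further, build a set $S$ of high-scoring models and approximate the posterior by
$p(\gamma \mid \mathcal{D}) \approx \dfrac{e^{-f(\gamma)}}{\sum_{\gamma' \in S} e^{-f(\gamma')}}$ (Heaton and Scott 2009).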
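A rough sketch of this idea: a single-bit-flip Metropolis walk that caches $f(\gamma)$ for every model it visits, followed by a posterior normalized over the visited set. Assumptions: `toy_cost` is an invented stand-in for $f(\gamma)$, and $S$ is simply every visited model rather than a curated high-scoring subset, which is a simplification and not necessarily the cited authors' procedure.

```python
import math
import random

def stochastic_search(f, D, n_steps, seed=0):
    """Random walk over bit vectors; returns {gamma: f(gamma)} for all visited models."""
    rng = random.Random(seed)
    gamma = tuple([0] * D)
    cache = {gamma: f(gamma)}  # each model is scored only once
    for _ in range(n_steps):
        j = rng.randrange(D)
        proposal = gamma[:j] + (1 - gamma[j],) + gamma[j + 1:]
        if proposal not in cache:
            cache[proposal] = f(proposal)
        # Metropolis acceptance for a symmetric single-flip proposal:
        # accept with probability min(1, exp(f(gamma) - f(proposal))).
        log_ratio = cache[gamma] - cache[proposal]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            gamma = proposal
    return cache

def approximate_posterior(cache):
    """Normalize exp(-f) over the visited set S instead of all 2^D models."""
    m = min(cache.values())
    weights = {g: math.exp(-(fg - m)) for g, fg in cache.items()}
    Z = sum(weights.values())
    return {g: wt / Z for g, wt in weights.items()}

def toy_cost(gamma):
    """Hypothetical f(gamma) with made-up per-feature gains and penalty."""
    fit_gain = [3.0, 0.2, 1.5]
    penalty = 1.0
    return sum(g * (penalty - gain) for g, gain in zip(gamma, fit_gain))

visited = stochastic_search(toy_cost, D=3, n_steps=200)
post = approximate_posterior(visited)
print(len(post), max(post, key=post.get))
```

Because the weights come from the cached scores and not from visit counts, a model contributes its full probability mass the first time it is seen.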
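EM and variational inference
Try fitting the slab model ($\gamma_j \to w_j \to y$) with the EM algorithm:
E step: $p(\gamma_j \mid w_j)$?
M step: optimize with respect to $w$?
This does not work! The reason is that $\delta_0(w_j)$ and $\mathcal{N}(w_j \mid 0, \sigma_w^2)$ in (13.11) cannot be compared.
→ This is resolved by approximating $\delta_0(w_j)$ with a Gaussian (at the cost of a local minima problem).
Try fitting the Bernoulli-Gaussian model ($\gamma_j \to y \leftarrow w_j$) with the EM algorithm:
The posterior $p(\gamma \mid \mathcal{D}, w)$ is hard to compute.
However, the mean field approximation $\prod_j q(\gamma_j)\, q(w_j)$ can be computed (Huang et al. 2007; Rattray et al. 2009).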
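l1 regularization: basics
l0 (i.e., $\|w\|_0$) is neither convex nor continuous! → Use a convex approximation!
Part of the difficulty of computing $p(\gamma \mid \mathcal{D})$ comes from $\gamma \in \{0, 1\}^D$ being discrete.
Approximate the prior $p(w)$ with a continuous distribution (the Laplace distribution):
$p(w \mid \lambda) = \prod_{j=1}^{D} \mathrm{Lap}(w_j \mid 0, 1/\lambda) \propto \prod_{j=1}^{D} e^{-\lambda |w_j|}$.
Penalized objective: $f(w) = -\log p(\mathcal{D} \mid w) - \log p(w \mid \lambda) = \mathrm{NLL}(w) + \lambda \|w\|_1$ (up to a constant).
This can be viewed as a convex approximation of the non-convex l0 objective $\arg\min_w \mathrm{NLL}(w) + \lambda \|w\|_0$.
For linear regression (known as BPDN, basis pursuit denoising):
$f(w) = \sum_{i=1}^{N} \dfrac{1}{2\sigma^2} (y_i - w^T x_i)^2 + \lambda \|w\|_1$, and multiplying by $2\sigma^2$ gives the equivalent objective $\mathrm{RSS}(w) + \lambda' \|w\|_1$ with $\lambda' = 2\lambda\sigma^2$.
Placing a zero-mean Laplace prior on the weights and performing MAP estimation is called l1 regularization.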
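The relation between the MAP objective and the rescaled form $\mathrm{RSS}(w) + \lambda' \|w\|_1$ can be checked numerically. A minimal sketch, assuming Gaussian noise with known variance; the function names and the small data set are hypothetical, and constants that do not depend on $w$ are dropped.

```python
def rss(w, X, y):
    """Residual sum of squares for a linear model."""
    return sum((yi - sum(wj * xij for wj, xij in zip(w, row))) ** 2 for row, yi in zip(X, y))

def l1_norm(w):
    return sum(abs(wj) for wj in w)

def map_objective(w, X, y, sigma2, lam):
    """NLL(w) + lam * ||w||_1 for Gaussian noise, without w-independent constants."""
    return rss(w, X, y) / (2.0 * sigma2) + lam * l1_norm(w)

def rescaled_objective(w, X, y, sigma2, lam):
    """RSS(w) + lam' * ||w||_1 with lam' = 2 * lam * sigma2."""
    lam_prime = 2.0 * lam * sigma2
    return rss(w, X, y) + lam_prime * l1_norm(w)

# Made-up data and parameters, for illustration only.
X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
y = [1.0, 0.0, 2.0]
w = [1.0, -2.0]
sigma2, lam = 0.5, 0.3

lhs = 2.0 * sigma2 * map_objective(w, X, y, sigma2, lam)
rhs = rescaled_objective(w, X, y, sigma2, lam)
print(lhs, rhs)
```

The two objectives differ only by the positive factor $2\sigma^2$, so they share the same minimizer.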
Why does l1 regularization yield sparse solutions?
We restrict attention to linear regression, but the argument extends to GLMs in general.
Objective: $\min_w \mathrm{RSS}(w) + \lambda \|w\|_1$ ⇔ LASSO: $\min_w \mathrm{RSS}(w)$ s.t. $\|w\|_1 \le B$.
A small $B$ corresponds to a large $\lambda$.
For comparison, $\min_w \mathrm{RSS}(w) + \lambda \|w\|_2^2$ ⇔ RIDGE: $\min_w \mathrm{RSS}(w)$ s.t. $\|w\|_2^2 \le B$.
Figure 13.3: l1 (left) vs l2 (right) regularization.
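Optimality conditions for lasso
Lasso is an example of non-smooth (non-differentiable) optimization.
Objective: $\min_w \mathrm{RSS}(w) + \lambda \|w\|_1$.
Derivative of the first term: $\dfrac{\partial}{\partial w_j} \mathrm{RSS}(w) = a_j w_j - c_j$, where
$a_j = 2 \sum_{i=1}^{N} x_{ij}^2$, $\quad c_j = 2 \sum_{i=1}^{N} x_{ij} \left(y_i - w_{-j}^T x_{i,-j}\right)$.
$c_j$ is the inner product of feature $j$ with the residual computed without feature $j$; it expresses how relevant the $j$-th feature is for predicting $y$.
Subderivative of the full objective:
$\partial_{w_j} f(w) = (a_j w_j - c_j) + \lambda\, \partial_{w_j} \|w\|_1 = \begin{cases} \{a_j w_j - c_j - \lambda\} & \text{if } w_j < 0 \\ [-c_j - \lambda,\ -c_j + \lambda] & \text{if } w_j = 0 \\ \{a_j w_j - c_j + \lambda\} & \text{if } w_j > 0 \end{cases}$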
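Requiring $0 \in \partial_{w_j} f(w)$ in each of the three cases gives $w_j = (c_j + \lambda)/a_j$ if $c_j < -\lambda$, $w_j = 0$ if $|c_j| \le \lambda$, and $w_j = (c_j - \lambda)/a_j$ if $c_j > \lambda$, i.e. a soft-thresholding rule. The slide stops at the subderivative; the coordinate descent loop below is one common way to use that rule, sketched under the assumption of a fixed number of sweeps and made-up toy data.

```python
def soft_threshold(c, lam):
    """Shrink c toward zero by lam, returning exactly 0 when |c| <= lam."""
    if c > lam:
        return c - lam
    if c < -lam:
        return c + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize RSS(w) + lam * ||w||_1 by cycling through the coordinates."""
    N, D = len(X), len(X[0])
    w = [0.0] * D
    for _ in range(n_sweeps):
        for j in range(D):
            a_j = 2.0 * sum(X[i][j] ** 2 for i in range(N))
            # c_j: feature j against the residual that leaves feature j out
            c_j = 2.0 * sum(
                X[i][j] * (y[i] - sum(w[k] * X[i][k] for k in range(D) if k != j))
                for i in range(N)
            )
            w[j] = soft_threshold(c_j, lam) / a_j
    return w

# Toy data (made up): y = 2 * x0 - 1 * x2, column 1 is irrelevant.
X = [[1.0, 0.0, 0.5], [0.0, 1.0, -1.0], [1.0, 1.0, 0.0], [2.0, -1.0, 1.0], [0.0, 2.0, 1.5]]
y = [2.0 * r[0] - 1.0 * r[2] for r in X]

w_sparse = lasso_coordinate_descent(X, y, lam=20.0)  # heavy penalty
w_dense = lasso_coordinate_descent(X, y, lam=0.0)    # no penalty: ordinary least squares
print(w_sparse, w_dense)
```

The interval case $w_j = 0$ is what produces exact zeros: whenever $|c_j| \le \lambda$ the feature is not relevant enough to pay the penalty and its weight is set to zero rather than merely shrunk.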