Bayesian Deep Learning (5.1~5.2)
catla
February 28, 2020
Contents: Bayesian neural networks (Section 5.1) and speeding up approximate Bayesian inference (Section 5.2)
Transcript
Bayesian Deep Learning 5.1~5.2
Today's contents
‣ Approximate inference for Bayesian neural network models
  ‣ Bayesian neural network model
  ‣ Learning with the Laplace approximation
  ‣ Hamiltonian Monte Carlo
‣ Making approximate Bayesian inference more efficient
  ‣ Learning with stochastic gradient Langevin dynamics
  ‣ Learning with stochastic variational inference
  ‣ Monte Carlo approximation of gradients
  ‣ Variational inference with approximated gradients
  ‣ Learning with expectation propagation

Approximate inference for Bayesian neural network models
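Bayesian neural network model
The approximate inference methods from the earlier chapter can also be applied to deep learning models.
As with the linear regression model, we make a feedforward neural network (NN) Bayesian.
⟹ Place a prior distribution on the parameters $W$, which makes probabilistic learning and prediction possible.
Learning and prediction in Bayesian inference
The joint distribution can be written as
$$p(Y, W \mid X) = p(W) \prod_{n=1}^{N} p(y_n \mid x_n, W).$$
Learning: evaluate $p(W \mid X, Y)$.
Prediction: compute $p(y_* \mid x_*, Y, X)$.
(Graphical model: nodes $x_n$, $y_n$, $W$; plate over $n = 1, \dots, N$.)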
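Bayesian neural network model
Setup
For input data $X = \{x_1, \dots, x_N\}$, observed data $Y = \{y_1, \dots, y_N\}$ and parameters $W$, the joint distribution is
$$p(Y, W \mid X) = p(W) \prod_{n=1}^{N} p(y_n \mid x_n, W).$$
The observed data are assumed to be drawn from
$$p(y_n \mid x_n, W) = \mathcal{N}(y_n \mid f(x_n; W), \sigma_y^2 I),$$
where $f(x_n; W)$ is the function computed by the neural network and $\sigma_y^2$ is a fixed noise parameter.
The parameters are assumed to be drawn from
$$p(w) = \mathcal{N}(w \mid 0, \sigma_w^2), \quad w \in W,$$
where $\sigma_w^2$ is a fixed noise parameter.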
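Bayesian neural network model
Characteristics
For the NN under this prior:
  Many hidden units ⟶ the function becomes more complex.
  Large $\sigma_w$ ⟶ the function changes more steeply.
For a Bayesian NN, it is known that the posterior distribution becomes more complex as the number of hidden units increases.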
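Learning with the Laplace approximation
Laplace approximation:
$$p(Z \mid X) \approx \mathcal{N}(Z \mid Z_{\mathrm{MAP}}, \{\Lambda(Z_{\mathrm{MAP}})\}^{-1}), \qquad \Lambda(Z) = -\nabla_Z^2 \log p(Z \mid X).$$
For simplicity, the output dimension of the NN is taken to be 1.
Approximating the posterior
First find the MAP estimate of the posterior, that is, the parameters $W_{\mathrm{MAP}}$ that maximize $p(W \mid Y, X)$.
Maximizing the posterior is equivalent to maximizing the log posterior, so the MAP estimate can be found by the following gradient-based optimization:
$$W_{\mathrm{new}} = W_{\mathrm{old}} + \alpha \nabla_W \log p(W \mid Y, X)\big|_{W = W_{\mathrm{old}}},$$
where $\alpha$ is the learning rate.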
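Learning with the Laplace approximation
Approximating the posterior
The gradient of the posterior is obtained as follows. By Bayes' theorem,
$$p(W \mid Y, X) = \frac{p(W)\, p(Y \mid X, W)}{p(Y \mid X)} \propto p(W)\, p(Y \mid X, W),$$
so
$$\log p(W \mid Y, X) = \log p(Y \mid X, W) + \log p(W) + c = \sum_{n=1}^{N} \log p(y_n \mid x_n, W) + \sum_{w \in W} \log p(w) + c.$$
Taking the partial derivative with respect to a parameter $w \in W$ gives the derivative of a cost function:
$$\frac{\partial}{\partial w} \log p(W \mid Y, X) = -\left\{ \frac{1}{\sigma_y^2} \frac{\partial}{\partial w} E(W) + \frac{1}{\sigma_w^2} \frac{\partial}{\partial w} \Omega_{L2}(W) \right\}.$$
Here $E(W)$ is the error function of the NN and $\Omega_{L2}(W)$ is the regularization term that comes from the prior on each parameter.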
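Learning with the Laplace approximation
Approximating the posterior
Once the MAP estimate has been found, the posterior can be approximated as
$$p(W \mid Y, X) \approx q(W) = \mathcal{N}(W \mid W_{\mathrm{MAP}}, \{\Lambda(W_{\mathrm{MAP}})\}^{-1}),$$
$$\Lambda(W) = -\nabla_W^2 \log p(W \mid Y, X) = \frac{1}{\sigma_w^2} I + \frac{1}{\sigma_y^2} H,$$
where $H$ is the Hessian matrix of the error function.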
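Learning with the Laplace approximation
Approximating the predictive distribution
With the Laplace approximation, the predictive distribution can be approximated as
$$p(y_* \mid x_*, Y, X) = \int p(y_* \mid x_*, W)\, p(W \mid X, Y)\, dW \approx \int p(y_* \mid x_*, W)\, q(W)\, dW.$$
However, $p(y_* \mid x_*, W)$ contains the NN, so the integral cannot be computed analytically.
We therefore assume that the posterior density of the parameters is concentrated around the MAP estimate, and that within this small region $f(x_*; W)$ is well approximated by a linear function of $W$. Under this assumption, a first-order Taylor expansion of $f(x_*; W)$, viewed as a function of $W$, around $W_{\mathrm{MAP}}$ gives
$$f(x_*; W) \approx f(x_*; W_{\mathrm{MAP}}) + g^{\top}(W - W_{\mathrm{MAP}}), \qquad g = \nabla_W f(x_*; W)\big|_{W = W_{\mathrm{MAP}}}.$$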
Learning with the Laplace approximation
Approximating the predictive distribution
Putting this together gives the following approximation:
$$p(y_* \mid x_*, Y, X) = \int p(y_* \mid x_*, W)\, p(W \mid X, Y)\, dW$$
$$\approx \int p(y_* \mid x_*, W)\, q(W)\, dW \quad \text{(Laplace approximation)}$$
$$= \int \mathcal{N}(y_* \mid f(x_*; W), \sigma_y^2)\, \mathcal{N}(W \mid W_{\mathrm{MAP}}, \{\Lambda(W_{\mathrm{MAP}})\}^{-1})\, dW$$
$$\approx \int \mathcal{N}(y_* \mid f(x_*; W_{\mathrm{MAP}}) + g^{\top}(W - W_{\mathrm{MAP}}), \sigma_y^2)\, \mathcal{N}(W \mid W_{\mathrm{MAP}}, \{\Lambda(W_{\mathrm{MAP}})\}^{-1})\, dW \quad \text{(first-order Taylor expansion)}$$
$$= \mathcal{N}(y_* \mid f(x_*; W_{\mathrm{MAP}}), \sigma^2(x_*)),$$
$$\sigma^2(x_*) = \sigma_y^2 + g^{\top} \{\Lambda(W_{\mathrm{MAP}})\}^{-1} g.$$
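The Laplace recipe above can be sketched on a toy model. The example below is an illustration only, not code from the slides: it assumes a model that is linear in its parameters, $f(x; w) = w^{\top}\phi(x)$ with the hypothetical features $\phi(x) = (1, x)$, so that the Hessian $H$ of the squared-error function is $\Phi^{\top}\Phi$, the MAP estimate has a closed form, and $g = \phi(x_*)$. The function name `laplace_predict` and all numeric values are made up for the example.

```python
import numpy as np

def laplace_predict(Phi, y, phi_star, sigma_y=0.5, sigma_w=1.0):
    """Laplace approximation for the assumed model f(x; w) = w . phi(x)."""
    K = Phi.shape[1]
    # Hessian H of the error function E(w) = 0.5 * sum_n (y_n - f(x_n; w))^2
    H = Phi.T @ Phi
    # Precision of q(W): Lambda = I / sigma_w^2 + H / sigma_y^2
    Lam = np.eye(K) / sigma_w**2 + H / sigma_y**2
    # For a model linear in w, the MAP estimate solves Lam w = Phi^T y / sigma_y^2
    w_map = np.linalg.solve(Lam, Phi.T @ y / sigma_y**2)
    # g = gradient of f(x*; w) with respect to w, evaluated at w_map
    g = phi_star
    mean = phi_star @ w_map
    # sigma^2(x*) = sigma_y^2 + g^T Lambda^{-1} g
    var = sigma_y**2 + g @ np.linalg.solve(Lam, g)
    return mean, var

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=20)
Phi = np.stack([np.ones_like(x), x], axis=1)   # hypothetical features (1, x)
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=20)
mean, var = laplace_predict(Phi, y, np.array([1.0, 0.3]))
```

For a real NN, the MAP estimate would come from gradient ascent and $g$ and $H$ from automatic differentiation; the structure of the predictive variance stays the same.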
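Learning with Hamiltonian Monte Carlo (HMC)
HMC can be applied whenever the log posterior (the potential energy in the Hamiltonian) is differentiable with respect to the variables to be sampled. Given enough computation time, samples from the true posterior are obtained in theory (a general property of MCMC). As a result, uncertainty can be represented by a collection of samples.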
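Learning with Hamiltonian Monte Carlo (HMC)
Inference of the weight parameters
Using the unnormalized posterior, the corresponding potential energy is
$$\mathcal{U}(W) = -\{\log p(Y \mid X, W) + \log p(W)\}.$$
Differentiating this shows that it is equivalent to the derivative of the cost function introduced earlier.
⟹ Gradients can be computed by backpropagation.
[Challenges of MCMC-based approximate inference]
• There is no way to know whether the number of samples is sufficient.
• Tuning the MCMC parameters is difficult (e.g. the step size and number of steps in HMC).
• Learning is slow.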
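One HMC transition can be sketched as follows. This is a generic textbook leapfrog scheme with a Metropolis accept/reject step, not code taken from the slides; the function name `hmc_step`, the toy potential $\mathcal{U}(w) = \tfrac{1}{2} w^{\top} w$ (a standard Gaussian target), and the step size and step count are all assumptions chosen for illustration.

```python
import numpy as np

def hmc_step(w, U, grad_U, eps, n_leapfrog, rng):
    """One HMC transition: leapfrog integration plus Metropolis correction."""
    p = rng.normal(size=w.shape)            # fresh momentum p ~ N(0, I)
    w_new, p_new = w.copy(), p.copy()
    p_new -= 0.5 * eps * grad_U(w_new)      # initial half step for momentum
    for i in range(n_leapfrog):
        w_new += eps * p_new                # full step for position
        if i < n_leapfrog - 1:
            p_new -= eps * grad_U(w_new)    # full step for momentum
    p_new -= 0.5 * eps * grad_U(w_new)      # final half step for momentum
    # Hamiltonian = potential energy + kinetic energy
    h_old = U(w) + 0.5 * p @ p
    h_new = U(w_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < h_old - h_new:
        return w_new                        # accept the proposal
    return w                                # reject: stay at the current point

# Toy target (assumption): U(w) = 0.5 * w.w, i.e. a standard Gaussian posterior
rng = np.random.default_rng(0)
U = lambda w: 0.5 * w @ w
grad_U = lambda w: w
w = np.zeros(2)
samples = []
for _ in range(2000):
    w = hmc_step(w, U, grad_U, eps=0.2, n_leapfrog=8, rng=rng)
    samples.append(w)
samples = np.array(samples)
```

For a Bayesian NN, `U` would be the negative unnormalized log posterior and `grad_U` would come from backpropagation; `eps` and `n_leapfrog` are exactly the tuning parameters the slide flags as hard to set.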
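Learning with Hamiltonian Monte Carlo (HMC)
Inference of the hyperparameters
By placing priors on the hyperparameters $\sigma_w$ and $\sigma_y$, they can be inferred jointly with $W$.
Introduce the precision parameter $\gamma_w = \sigma_w^{-2}$ and define its prior as a Gamma distribution:
$$p(\gamma_w) = \mathrm{Gam}(\gamma_w \mid a_w, b_w) \quad (a_w, b_w \text{ are fixed positive values}).$$
Similarly, for $\gamma_y = \sigma_y^{-2}$ define
$$p(\gamma_y) = \mathrm{Gam}(\gamma_y \mid a_y, b_y) \quad (a_y, b_y \text{ are fixed positive values}).$$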
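Learning with Hamiltonian Monte Carlo (HMC)
Inference of the hyperparameters
Rewriting the model (the joint distribution) gives
$$p(Y, W, \gamma_w, \gamma_y \mid X) = p(\gamma_w)\, p(\gamma_y)\, p(W \mid \gamma_w) \prod_{n=1}^{N} p(y_n \mid x_n, W, \gamma_y).$$
(Graphical model: nodes $x_n$, $y_n$, $W$, $\gamma_y$, $\gamma_w$ and the Gamma hyperparameters; plate over $n = 1, \dots, N$.)
The posterior to be inferred is $p(W, \gamma_w, \gamma_y \mid X, Y)$.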
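Learning with Hamiltonian Monte Carlo (HMC)
Inference of the hyperparameters
Use Gibbs sampling to sample $W, \gamma_w, \gamma_y$.
• Sampling $W$
  As before, sample with HMC:
  $$W \sim p(W \mid Y, X, \gamma_w, \gamma_y).$$
• Sampling $\gamma_w$
  $$p(\gamma_w \mid Y, X, W, \gamma_y) \propto p(W \mid \gamma_w)\, p(\gamma_w).$$
  $p(W \mid \gamma_w)$ is Gaussian and $p(\gamma_w)$ is a Gamma distribution (the conjugate prior for the Gaussian precision), so $p(\gamma_w \mid Y, X, W, \gamma_y)$ is a Gamma distribution. Hence
  $$\gamma_w \sim \mathrm{Gam}(\hat{a}_w, \hat{b}_w), \qquad \hat{a}_w = a_w + \frac{K_w}{2}, \qquad \hat{b}_w = b_w + \frac{1}{2} \sum_{w \in W} w^2,$$
  where $K_w$ is the total number of weight parameters.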
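Learning with Hamiltonian Monte Carlo (HMC)
Inference of the hyperparameters
• Sampling $\gamma_y$
  $$p(\gamma_y \mid Y, X, W, \gamma_w) \propto p(\gamma_y) \prod_{n=1}^{N} p(y_n \mid x_n, W, \gamma_y).$$
  $\prod_{n=1}^{N} p(y_n \mid x_n, W, \gamma_y)$ is a product of Gaussians and therefore Gaussian, and $p(\gamma_y)$ is a Gamma distribution, so $p(\gamma_y \mid Y, X, W, \gamma_w)$ is a Gamma distribution. Hence
  $$\gamma_y \sim \mathrm{Gam}(\hat{a}_y, \hat{b}_y), \qquad \hat{a}_y = a_y + \frac{N}{2}, \qquad \hat{b}_y = b_y + \frac{1}{2} \sum_{n=1}^{N} \{y_n - f(x_n; W)\}^2.$$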
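The two Gamma conditionals used in the Gibbs sweep can be sketched directly. This is a minimal illustration under the assumption that $\mathrm{Gam}(a, b)$ uses the rate parameterization (mean $a/b$), so NumPy's `scale` argument is $1/b$; the function names and the toy weights are hypothetical.

```python
import numpy as np

def sample_gamma_w(W, a_w, b_w, rng):
    """Draw gamma_w ~ Gam(a_w + K_w / 2, b_w + 0.5 * sum of w^2)."""
    a_hat = a_w + W.size / 2.0
    b_hat = b_w + 0.5 * np.sum(W**2)
    # NumPy's gamma takes shape and scale; scale = 1 / rate
    return rng.gamma(shape=a_hat, scale=1.0 / b_hat)

def sample_gamma_y(y, f_pred, a_y, b_y, rng):
    """Draw gamma_y ~ Gam(a_y + N / 2, b_y + 0.5 * sum of squared residuals)."""
    a_hat = a_y + y.size / 2.0
    b_hat = b_y + 0.5 * np.sum((y - f_pred)**2)
    return rng.gamma(shape=a_hat, scale=1.0 / b_hat)

rng = np.random.default_rng(0)
W = 0.5 * rng.normal(size=5000)          # toy weights with standard deviation 0.5
gamma_w = sample_gamma_w(W, a_w=1.0, b_w=1.0, rng=rng)
```

With weights of standard deviation 0.5, the sampled precision should land near $0.5^{-2} = 4$. In a full sampler these draws would alternate with an HMC update of $W$.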
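Learning with Hamiltonian Monte Carlo (HMC)
Inference of the hyperparameters
The Gamma distribution $\mathrm{Gam}(a, b)$ has mean $a/b$ and variance $a/b^2$. Therefore, the larger $\hat{b}_y$ is, that is, the worse $f(x_n; W)$ predicts $y_n$, the larger the learned variance of the observations becomes.
Here a single precision parameter $\gamma_w$ was shared across all weight parameters, but it is also possible to use a separate precision parameter for each layer of the NN, $(\gamma_w^{(1)}, \dots, \gamma_w^{(L)})$.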
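Speeding up approximate Bayesian inference

Speeding up approximate Bayesian inference
[Drawbacks of Bayesian neural networks]
Marginalizing over the parameters is computationally expensive.
  ⟹ They have rarely been used as prediction tools.
Deep learning also needs a large amount of training data.
  ⟹ Methods that assume batch learning are computationally inefficient.
[How can these drawbacks be addressed?]
• Replace the exact integration by approximate inference to make the computation more efficient.
• Introduce minibatch learning.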
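Learning with stochastic gradient Langevin dynamics
[Challenge]
Learning with MCMC is computationally inefficient on large-scale data.
[Solution]
Combine a computationally efficient minibatch-based learning method (e.g. stochastic gradient descent) with an MCMC method that can estimate uncertainty (e.g. Metropolis-Hastings, HMC).
⟹ Stochastic Markov chain Monte Carlo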
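Learning with stochastic gradient Langevin dynamics
[Learning]
Consider learning with stochastic gradient Langevin dynamics, which combines stochastic gradient descent with Langevin dynamics.
Write the parameter update as $W_{\mathrm{new}} = W_{\mathrm{old}} + \Delta W$.
In stochastic gradient descent, the update can be written as
$$\Delta W = \frac{\alpha_t}{2} \nabla_W \log p(W \mid X_S, Y_S) = \frac{\alpha_t}{2} \left\{ \frac{N}{M} \sum_{n \in S} \nabla_W \log p(y_n \mid x_n, W) + \nabla_W \log p(W) \right\},$$
where $M$ is the size of the subsample $S$. In addition, to fit the framework of the Robbins-Monro algorithm, the learning rate $\alpha_t$ at step $t$ is chosen to satisfy
$$\sum_{t=1}^{\infty} \alpha_t = \infty, \qquad \sum_{t=1}^{\infty} \alpha_t^2 < \infty.$$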
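Learning with stochastic gradient Langevin dynamics
[Learning]
On the other hand, Langevin dynamics is a batch learning algorithm that needs only 1 step to obtain a sample. With potential energy $\mathcal{U} = -\log p(W \mid X, Y)$, step size $\epsilon = \sqrt{\alpha_t}$ and momentum vector $p$, the parameter update is
$$\Delta W = -\frac{\epsilon^2}{2} \nabla_W \mathcal{U} + \epsilon p = \frac{\alpha_t}{2} \nabla_W \log p(W \mid X, Y) + \sqrt{\alpha_t}\, p$$
$$= \frac{\alpha_t}{2} \left\{ \sum_{n=1}^{N} \nabla_W \log p(y_n \mid x_n, W) + \nabla_W \log p(W) \right\} + \sqrt{\alpha_t}\, p, \qquad p \sim \mathcal{N}(0, I).$$
By making $\alpha_t$ small, the acceptance rate of the Metropolis-Hastings step can be brought arbitrarily close to 1.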
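Learning with stochastic gradient Langevin dynamics
[Learning]
Combining the two (stochastic gradient descent and Langevin dynamics) gives the update
$$\Delta W = \frac{\alpha_t}{2} \left\{ \frac{N}{M} \sum_{n \in S} \nabla_W \log p(y_n \mid x_n, W) + \nabla_W \log p(W) \right\} + \sqrt{\alpha_t}\, p, \qquad p \sim \mathcal{N}(0, I).$$
The learning rate satisfies the same conditions as above.
<When $t$ is small (early stage of learning)>
  The advantages of SGD are used to explore the posterior space efficiently.
<As $t$ grows>
  Langevin dynamics yields approximate samples from the true posterior.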
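The combined update can be sketched on a toy problem. The example assumes a model that is not in the slides: scalar data $y_n \sim \mathcal{N}(w, 1)$ with prior $w \sim \mathcal{N}(0, \sigma_w^2)$, for which the gradients are available in closed form. The step-size schedule $\alpha_t = a (b + t)^{-\kappa}$ with $0.5 < \kappa \le 1$ is one common choice that satisfies the Robbins-Monro conditions; the names `sgld`, `a`, `b`, `decay` and all numeric values are hypothetical.

```python
import numpy as np

def sgld(y, n_steps=5000, M=10, a=1e-3, b=10.0, decay=0.55, sigma_w=10.0, seed=0):
    """SGLD for the toy model y_n ~ N(w, 1), w ~ N(0, sigma_w^2)."""
    rng = np.random.default_rng(seed)
    N = y.size
    w, samples = 0.0, []
    for t in range(1, n_steps + 1):
        alpha_t = a * (b + t) ** (-decay)             # decaying learning rate
        idx = rng.choice(N, size=M, replace=False)    # minibatch S of size M
        # (N / M) * sum over the minibatch of d/dw log N(y_n | w, 1)
        grad_lik = (N / M) * np.sum(y[idx] - w)
        grad_prior = -w / sigma_w**2                  # d/dw log N(w | 0, sigma_w^2)
        noise = np.sqrt(alpha_t) * rng.normal()       # sqrt(alpha_t) * p, p ~ N(0, 1)
        w = w + 0.5 * alpha_t * (grad_lik + grad_prior) + noise
        samples.append(w)
    return np.array(samples)

rng = np.random.default_rng(1)
y = 2.0 + rng.normal(size=100)
samples = sgld(y)
posterior_mean_estimate = samples[len(samples) // 2:].mean()
```

The second half of the chain is used as approximate posterior samples; early iterations behave like SGD and mainly move `w` toward the high-density region.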
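Stochastic variational inference
So far we have seen the combination of stochastic gradient methods and MCMC.
Next, we combine variational inference with stochastic gradient descent.
  ⟹ Stochastic variational inference
With $\xi$ denoting the set of variational parameters, the goal is to find an approximate distribution $q(W; \xi)$ such that
$$q(W; \xi) \approx p(W \mid X, Y).$$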
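Stochastic variational inference
Minibatches are introduced for efficiency. The full ELBO
$$\mathcal{L}(\xi) = \sum_{n=1}^{N} \int q(W; \xi) \log p(y_n \mid f(x_n; W))\, dW - D_{\mathrm{KL}}[q(W; \xi) \,\|\, p(W)]$$
becomes, after minibatching,
$$\mathcal{L}_S(\xi) = \frac{N}{M} \sum_{n \in S} \int q(W; \xi) \log p(y_n \mid f(x_n; W))\, dW - D_{\mathrm{KL}}[q(W; \xi) \,\|\, p(W)].$$
The minibatch quantity $\mathcal{L}_S$ is an unbiased estimator of $\mathcal{L}$:
$$\mathbb{E}_S[\mathcal{L}_S(\xi)] = \mathcal{L}(\xi).$$
Therefore, instead of maximizing $\mathcal{L}(\xi)$, we can maximize $\mathcal{L}_S(\xi)$ to approximate the posterior of the parameters efficiently.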
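The unbiasedness claim can be checked in one line. The derivation below assumes, as is usual but not stated on the slide, that the minibatch $S$ is drawn uniformly at random among all subsets of size $M$, so each index $n$ is included with probability $M/N$. The symbol $\ell_n$ is shorthand introduced here for the per-datum expected log-likelihood.

```latex
\mathbb{E}_{S}\!\left[\frac{N}{M}\sum_{n\in S} \ell_n(\xi)\right]
= \frac{N}{M}\sum_{n=1}^{N} \Pr(n\in S)\,\ell_n(\xi)
= \frac{N}{M}\cdot\frac{M}{N}\sum_{n=1}^{N}\ell_n(\xi)
= \sum_{n=1}^{N}\ell_n(\xi),
\qquad
\ell_n(\xi) = \int q(W;\xi)\,\log p(y_n \mid f(x_n;W))\,dW .
```

The KL term does not depend on $S$, so adding it to both sides gives $\mathbb{E}_S[\mathcal{L}_S(\xi)] = \mathcal{L}(\xi)$.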
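Stochastic variational inference
In the following slides, the approximate distribution is assumed to be a product of independent Gaussians,
$$q(W; \xi) = \prod_{i,j,l} \mathcal{N}\!\left(w_{i,j}^{(l)} \,\middle|\, \mu_{i,j}^{(l)}, {\sigma_{i,j}^{(l)}}^{2}\right),$$
and we consider maximizing the ELBO with a gradient method.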
Monte Carlo approximation of gradients
When maximizing the ELBO of a neural network, the parameters $W$ in the ELBO cannot be integrated out analytically.
  ⟹ Maximize $\mathcal{L}_S(\xi)$ with a gradient method.
To use a gradient method, we need the gradient of $\mathcal{L}_S(\xi)$ with respect to the variational parameters $\xi$.
In $D_{\mathrm{KL}}[q(W; \xi) \,\|\, p(W)]$ both distributions are Gaussian, so its gradient can be computed analytically. The expected log-likelihood $\int q(W; \xi) \log p(y_n \mid f(x_n; W))\, dW$, on the other hand, cannot be integrated analytically.
Approximate the integral (the log-likelihood term) by Monte Carlo and obtain an estimate of the gradient.
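Monte Carlo approximation of gradients
[Goal]
For a parameter $w \in \mathbb{R}$, a function $f(w)$ and a distribution $q(w; \xi)$, estimate the gradient
$$I(\xi) = \nabla_\xi \int f(w)\, q(w; \xi)\, dw.$$
[Methods]
Score function estimator, reparameterization gradient, generalized reparameterization gradient, implicit differentiation, and others.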
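Monte Carlo approximation of gradients
Score function estimator
Rewrite $I(\xi)$ as follows:
$$I(\xi) = \nabla_\xi \int f(w)\, q(w; \xi)\, dw = \int f(w)\, \nabla_\xi q(w; \xi)\, dw = \int f(w)\, q(w; \xi)\, \nabla_\xi \log q(w; \xi)\, dw = \mathbb{E}_{q(w; \xi)}\!\left[ f(w)\, \nabla_\xi \log q(w; \xi) \right].$$
Therefore, drawing several samples of $w$ from $q(w; \xi)$ and evaluating the derivative at each gives an unbiased estimator of $I(\xi)$.
[Condition for use] The derivative of $\log q(w; \xi)$ can be computed.
[Challenge] In practice the estimator has very high variance.
[Remedy] Combine it with a variance reduction technique such as control variates.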
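A minimal sketch of the score function estimator, assuming a Gaussian $q(w; \xi) = \mathcal{N}(w \mid \mu, \sigma^2)$ and estimating only the gradient with respect to $\mu$, for which $\nabla_\mu \log q = (w - \mu)/\sigma^2$. The test function $f(w) = w^2$ is a made-up example whose exact gradient is $\partial(\mu^2 + \sigma^2)/\partial\mu = 2\mu$, which makes the estimate easy to check; the function name is hypothetical.

```python
import numpy as np

def score_function_grad_mu(f, mu, sigma, n_samples, rng):
    """Estimate d/dmu E_q[f(w)] for q = N(mu, sigma^2) via E_q[f(w) * score]."""
    w = rng.normal(mu, sigma, size=n_samples)
    score = (w - mu) / sigma**2        # d/dmu log N(w | mu, sigma^2)
    return np.mean(f(w) * score)

rng = np.random.default_rng(0)
f = lambda w: w**2                     # assumed test function; exact gradient is 2 * mu
sf_estimate = score_function_grad_mu(f, mu=1.5, sigma=1.0, n_samples=200_000, rng=rng)
```

Only evaluations of `f` are needed, not its derivative, which is what makes the estimator widely applicable; the price is the large number of samples required to tame the variance.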
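Monte Carlo approximation of gradients
Reparameterization gradient
Instead of sampling $w$ from $q(w; \xi)$, sample $\epsilon$ from a distribution $q(\epsilon)$ that does not depend on $\xi$ and apply a transformation $w = g(\xi; \epsilon)$, which samples $w$ indirectly.
This gives the following unbiased estimator of the gradient:
$$\mathbb{E}_{q(\epsilon)}\!\left[ f'(g(\xi; \epsilon))\, \nabla_\xi g(\xi; \epsilon) \right] = I(\xi).$$
[Example] $\xi = \{\hat{\mu}, \hat{\sigma}^2\}$, $q(w; \xi) = \mathcal{N}(w \mid \hat{\mu}, \hat{\sigma}^2)$
With $\tilde{\epsilon} \sim \mathcal{N}(0, 1) = q(\epsilon)$ and $\tilde{w} = g(\xi; \tilde{\epsilon}) = \hat{\mu} + \hat{\sigma} \tilde{\epsilon}$, the value $\tilde{w}$ is a sample from $\mathcal{N}(\hat{\mu}, \hat{\sigma}^2)$. The derivatives with respect to the variational parameters are then as follows, which gives an unbiased gradient estimator for each variational parameter:
$$\frac{\partial}{\partial \hat{\mu}} \int f(w)\, q(w; \xi)\, dw = \int f'(w)\, q(w; \xi)\, dw \quad \therefore\ I(\hat{\mu}) = \mathbb{E}_{q(w; \xi)}[f'(w)],$$
$$\frac{\partial}{\partial \hat{\sigma}} \int f(w)\, q(w; \xi)\, dw = \int f'(w)\, \frac{w - \hat{\mu}}{\hat{\sigma}}\, q(w; \xi)\, dw \quad \therefore\ I(\hat{\sigma}) = \mathbb{E}_{q(w; \xi)}\!\left[ f'(w)\, \frac{w - \hat{\mu}}{\hat{\sigma}} \right].$$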
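The Gaussian example can be sketched in a few lines. As an assumption for checking purposes, the test function is again $f(w) = w^2$ with $f'(w) = 2w$, so that $\mathbb{E}_q[f(w)] = \hat{\mu}^2 + \hat{\sigma}^2$ and the exact gradients are $2\hat{\mu}$ and $2\hat{\sigma}$. The function name `reparam_grads` is hypothetical.

```python
import numpy as np

def reparam_grads(f_prime, mu, sigma, n_samples, rng):
    """Reparameterization gradients of E_q[f(w)] for q = N(mu, sigma^2)."""
    eps = rng.normal(size=n_samples)       # eps ~ N(0, 1), independent of xi
    w = mu + sigma * eps                   # w = g(xi; eps)
    grad_mu = np.mean(f_prime(w))          # dg/dmu = 1
    grad_sigma = np.mean(f_prime(w) * eps) # dg/dsigma = eps = (w - mu) / sigma
    return grad_mu, grad_sigma

rng = np.random.default_rng(0)
f_prime = lambda w: 2.0 * w                # derivative of the assumed f(w) = w^2
grad_mu, grad_sigma = reparam_grads(f_prime, mu=1.5, sigma=1.0, n_samples=10_000, rng=rng)
```

Compared with the score function approach, far fewer samples are typically needed for the same accuracy, at the cost of requiring $f'$ and a differentiable transformation $g$.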
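Monte Carlo approximation of gradients
Generalizing the reparameterization gradient
[Advantage of the reparameterization gradient]
  The variance of the gradient is kept smaller than with the score function estimator.
[Challenge of the reparameterization gradient]
  A change of variables $g$ is required (it cannot be applied to every distribution).
[Remedy example: generalized reparameterization gradient]
  Relaxes the constraints on $g$ so that the method applies to many more kinds of distributions.
  The noise distribution is allowed to retain a dependence on the variational parameters, as in $q(\epsilon; \xi)$.
[Remedy example: implicit differentiation]
  [Conditions for use]
  • Finding $g$ is difficult, but the inverse transformation $g^{-1}$ is easy to obtain.
  • The distribution is over continuous values.
  The gradient of the expectation is obtained by differentiating $\epsilon = g^{-1}(w; \xi)$ with respect to $\xi$.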
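Monte Carlo approximation of gradients
Generalizing the reparameterization gradient
[Remedy example: continuous relaxation]
  A way to apply the reparameterization gradient to discrete probability distributions.
  [Example] The categorical distribution (discrete) coincides with the Gumbel-softmax distribution (continuous) in the limit where the temperature parameter goes to 0.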
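A Gumbel-softmax sample can be sketched as follows. This follows the standard construction (add Gumbel noise to the logits, divide by a temperature, apply softmax) and is not code from the slides; the function name, the probabilities and the temperatures are made up for the example.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau, rng):
    """Draw one relaxed one-hot sample; tau is the temperature."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))           # standard Gumbel noise
    z = (logits + gumbel) / tau
    z = z - z.max()                        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()                     # softmax

rng = np.random.default_rng(0)
probs = np.array([0.2, 0.3, 0.5])          # assumed categorical probabilities
logits = np.log(probs)
soft = gumbel_softmax_sample(logits, tau=0.5, rng=rng)

# The argmax of a relaxed sample follows the categorical distribution
counts = np.zeros(3)
for _ in range(5000):
    counts[np.argmax(gumbel_softmax_sample(logits, tau=0.1, rng=rng))] += 1
freq = counts / counts.sum()
```

As `tau` decreases the samples concentrate near one-hot vectors, while larger `tau` gives smoother samples with lower-variance gradients.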
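Variational inference with approximated gradients
We now use the reparameterization gradient to maximize the ELBO of a Bayesian neural network.
① Draw a minibatch $S$ at random from the data set $\mathcal{D}$.
② Draw $M$ noise samples (one per minibatch element):
$$\tilde{\epsilon}_i \sim \mathcal{N}(0, I).$$
③ Compute the gradient with respect to the variational parameters:
$$\mathcal{L}_S(\xi) = \frac{N}{M} \sum_{n \in S} \int q(W; \xi) \log p(y_n \mid f(x_n; W))\, dW - D_{\mathrm{KL}}[q(W; \xi) \,\|\, p(W)]$$
$$= \frac{N}{M} \sum_{n \in S} \int p(\epsilon) \log p(y_n \mid f(x_n; g(\xi; \epsilon)))\, d\epsilon - D_{\mathrm{KL}}[q(W; \xi) \,\|\, p(W)]$$
$$\approx \mathcal{L}_{S,\epsilon}(\xi) \quad \left(\because\ \mathbb{E}_{S,\epsilon}[\mathcal{L}_{S,\epsilon}(\xi)] = \mathcal{L}(\xi)\right)$$
$$= \frac{N}{M} \sum_{n \in S} \log p(y_n \mid f(x_n; g(\xi; \tilde{\epsilon}_n))) - D_{\mathrm{KL}}[q(W; \xi) \,\|\, p(W)],$$
$$\nabla_\xi \mathcal{L}_S(\xi) \approx \nabla_\xi \mathcal{L}_{S,\epsilon}(\xi) = \frac{N}{M} \sum_{n \in S} \nabla_\xi \log p(y_n \mid f(x_n; g(\xi; \tilde{\epsilon}_n))) - \nabla_\xi D_{\mathrm{KL}}[q(W; \xi) \,\|\, p(W)].$$
④ Update the variational parameters in the direction that increases the ELBO:
$$\xi \leftarrow \xi + \alpha \nabla_\xi \mathcal{L}_{S,\epsilon}(\xi).$$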
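Steps ① to ④ can be sketched on a one-parameter toy model. The example assumes a model that is not in the slides: $y_n = w x_n + \text{noise}$ with known $\sigma_y$, prior $w \sim \mathcal{N}(0, \sigma_w^2)$ and $q(w) = \mathcal{N}(\mu, \sigma^2)$. Two further assumptions are made for convenience: $\sigma = e^{\rho}$ is optimized through $\rho$ to keep it positive, and the gradients are written by hand instead of using automatic differentiation. All names and numeric values are hypothetical.

```python
import numpy as np

def fit_vi(x, y, sigma_y=0.5, sigma_w=1.0, M=10, lr=5e-4, n_steps=8000, seed=0):
    """Minibatch ELBO ascent with reparameterization gradients (toy linear model)."""
    rng = np.random.default_rng(seed)
    N = x.size
    mu, rho = 0.0, np.log(0.5)                      # q(w) = N(mu, exp(rho)^2)
    for _ in range(n_steps):
        idx = rng.choice(N, size=M, replace=False)  # step 1: minibatch S
        eps = rng.normal(size=M)                    # step 2: one noise draw per datum
        sigma = np.exp(rho)
        w = mu + sigma * eps                        # w_n = g(xi; eps_n)
        # d/dw log N(y_n | w x_n, sigma_y^2), evaluated at each w_n
        dll_dw = (y[idx] - w * x[idx]) * x[idx] / sigma_y**2
        # step 3: likelihood gradient minus the analytic KL gradient, where
        # KL[N(mu, s^2) || N(0, sw^2)] = log(sw/s) + (s^2 + mu^2)/(2 sw^2) - 1/2
        g_mu = (N / M) * np.sum(dll_dw) - mu / sigma_w**2
        g_sigma = (N / M) * np.sum(dll_dw * eps) - (sigma / sigma_w**2 - 1.0 / sigma)
        # step 4: gradient ascent on the ELBO (chain rule: d sigma / d rho = sigma)
        mu += lr * g_mu
        rho += lr * g_sigma * sigma
    return mu, np.exp(rho)

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 * rng.normal(size=100)
mu_q, sigma_q = fit_vi(x, y)

# Exact posterior of this conjugate toy model, for comparison
post_prec = 1.0 + np.sum(x**2) / 0.5**2
post_mean = (np.sum(x * y) / 0.5**2) / post_prec
```

Because this toy model is conjugate, the fitted `mu_q` and `sigma_q` can be compared against the exact posterior mean and standard deviation; for a real NN no such reference exists and the likelihood gradient would come from backpropagation.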
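Learning with expectation propagation
In the forward pass, probabilities are propagated through the neural network to evaluate the marginal likelihood. In the backward pass, expectation propagation is used to compute the gradient of the marginal likelihood in order to learn the parameters.
⟹ Probabilistic backpropagation
Probabilistic backpropagation can process data sequentially, so it scales to learning with large amounts of data. The precision parameter of the observations and the precision parameter governing the prior on the weights can also be inferred approximately.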
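Learning with expectation propagation
[Learning with expectation propagation]
‣ Model
‣ Approximate distribution
‣ Initialization and introduction of the prior factors
‣ Introduction of the likelihood factors
‣ Distribution of the activations
‣ Gradient-based learning
‣ Summary of probabilistic backpropagation
‣ Related methods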
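Learning with expectation propagation
Model
[Setup]
With $y_n \in \mathbb{R}$, define the likelihood as
$$p(Y \mid X, W, \gamma_y) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid f(x_n; W), \gamma_y^{-1}), \qquad p(\gamma_y) = \mathrm{Gam}(\gamma_y \mid \alpha_0^{\gamma_y}, \beta_0^{\gamma_y}).$$
The activation function of $f(x_n; W)$ is the rectified linear unit (ReLU).
The parameters $W$ follow independent Gaussian distributions:
$$p(W \mid \gamma_w) = \prod_{l=1}^{L} \prod_{i=1}^{H_l} \prod_{j=1}^{H_{l-1}} \mathcal{N}(w_{i,j}^{(l)} \mid 0, \gamma_w^{-1}), \qquad p(\gamma_w) = \mathrm{Gam}(\gamma_w \mid \alpha_0^{\gamma_w}, \beta_0^{\gamma_w}).$$
[Goal]
Approximately infer the posterior
$$p(W, \gamma_y, \gamma_w \mid \mathcal{D}) \propto p(Y \mid X, W, \gamma_y)\, p(W \mid \gamma_w)\, p(\gamma_y)\, p(\gamma_w).$$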
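Learning with expectation propagation
Approximate distribution
Probabilistic backpropagation is based on assumed density filtering.
The approximate distribution of the parameters is set to
$$q(W, \gamma_y, \gamma_w) = \mathrm{Gam}(\gamma_y \mid \alpha^{\gamma_y}, \beta^{\gamma_y})\, \mathrm{Gam}(\gamma_w \mid \alpha^{\gamma_w}, \beta^{\gamma_w}) \prod_{l=1}^{L} \prod_{i=1}^{H_l} \prod_{j=1}^{H_{l-1}} \mathcal{N}(w_{i,j}^{(l)} \mid m_{i,j}^{(l)}, v_{i,j}^{(l)}) = q(\gamma_y)\, q(\gamma_w)\, q(W).$$
This distribution is updated sequentially by the moment matching of assumed density filtering.
Assumed density filtering:
$$q_{i+1}(\theta) \approx r_{i+1}(\theta) = \frac{1}{Z_{i+1}} f_{i+1}(\theta)\, q_i(\theta), \qquad f_i(\theta): \text{factor}.$$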
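Learning with expectation propagation
Initialization and introduction of the prior factors
[Initialization]
So that the approximate distribution is uninformative, initialize with $m_{i,j}^{(l)} = 0$, $v_{i,j}^{(l)} = \infty$, $\alpha^{\gamma_y} = 1$, $\beta^{\gamma_y} = 0$, $\alpha^{\gamma_w} = 1$, $\beta^{\gamma_w} = 0$.
[Introducing the prior factors]
The approximate distribution is updated by adding the factors of the target posterior one at a time.
The prior factors of this model are
$$p(\gamma_y), \quad p(\gamma_w), \quad \{p(w_{i,j}^{(l)} \mid \gamma_w)\}_{i,j,l}.$$
Posterior: $p(W, \gamma_y, \gamma_w \mid \mathcal{D}) \propto p(Y \mid X, W, \gamma_y)\, p(W \mid \gamma_w)\, p(\gamma_y)\, p(\gamma_w)$
Approximate distribution: $q(W, \gamma_y, \gamma_w) = q(\gamma_y)\, q(\gamma_w)\, q(W)$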
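Learning with expectation propagation
Initialization and introduction of the prior factors
[Introducing the prior factors]
• Adding the factors $p(\gamma_w)$ and $p(\gamma_y)$
Assumed density filtering gives
$$q^{\mathrm{new}}(\gamma_y)\, q^{\mathrm{new}}(\gamma_w)\, q^{\mathrm{new}}(W) \approx r = \frac{1}{Z} f^{\mathrm{new}}(\gamma_y, \gamma_w, W)\, q(\gamma_y)\, q(\gamma_w)\, q(W).$$
The approximate distributions $q(\gamma_y), q(\gamma_w)$ have the same form as the priors $p(\gamma_y), p(\gamma_w)$, so the update for these factors is
$$q^{\mathrm{new}}(\gamma_y)\, q^{\mathrm{new}}(\gamma_w)\, q^{\mathrm{new}}(W) \approx p(\gamma_y)\, p(\gamma_w)\, q(W),$$
$$\alpha_{\mathrm{new}}^{\gamma_y} = \alpha_0^{\gamma_y}, \quad \beta_{\mathrm{new}}^{\gamma_y} = \beta_0^{\gamma_y}, \quad \alpha_{\mathrm{new}}^{\gamma_w} = \alpha_0^{\gamma_w}, \quad \beta_{\mathrm{new}}^{\gamma_w} = \beta_0^{\gamma_w}.$$
In other words, $q(\gamma_y) \leftarrow p(\gamma_y)$ and $q(\gamma_w) \leftarrow p(\gamma_w)$.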
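Learning with expectation propagation
Initialization and introduction of the prior factors
[Introducing the prior factors]
• Adding the factor $p(w_{i,j}^{(l)} \mid \gamma_w)$
$$q^{\mathrm{new}}(\gamma_y)\, q^{\mathrm{new}}(\gamma_w)\, q^{\mathrm{new}}(W) \approx \frac{1}{Z} p(w_{i,j}^{(l)} \mid \gamma_w)\, q(\gamma_y)\, q(\gamma_w)\, q(W)$$
$$\Leftrightarrow\ q^{\mathrm{new}}(\gamma_w)\, q^{\mathrm{new}}(W) \approx \frac{1}{Z} p(w_{i,j}^{(l)} \mid \gamma_w)\, q(\gamma_w)\, q(W).$$
From here on the indices $i, j, l$ are omitted.
The distributions that get updated are $q(W)$ and $q(\gamma_w)$, each as follows:
$$q^{\mathrm{new}}(W) \approx \frac{1}{Z_0}\, p(w \mid \gamma_w)\, q(\gamma_w)\; q(W),$$
$$q^{\mathrm{new}}(\gamma_w) \approx \frac{1}{Z_0}\, p(w \mid \gamma_w)\, q(W)\; q(\gamma_w).$$
In each line, the product of the new factor and the other approximate distribution is treated as the factor. Note that neither update uses the other's newly updated distribution, so the order of the two updates does not matter.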
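Learning with expectation propagation
Initialization and introduction of the prior factors
[Introducing the prior factors]
• Adding the factor $p(w_{i,j}^{(l)} \mid \gamma_w)$: update of $q(W)$
$$q^{\mathrm{new}}(W) \approx \frac{1}{Z_0}\, p(w \mid \gamma_w)\, q(\gamma_w)\, q(W).$$
Since $q(W)$ is Gaussian, moment matching as in the earlier Gaussian example updates the approximate distribution as
$$m^{\mathrm{new}} = m + v \frac{\partial}{\partial m} \log Z_0, \qquad v^{\mathrm{new}} = v - v^2 \left\{ \left( \frac{\partial}{\partial m} \log Z_0 \right)^{2} - 2 \frac{\partial}{\partial v} \log Z_0 \right\},$$
$$Z_0 = Z(\alpha^{\gamma_w}, \beta^{\gamma_w}) = \int p(w \mid \gamma_w)\, q(W)\, q(\gamma_w)\, dw\, d\gamma_w = \int \mathcal{N}(w \mid 0, \gamma_w^{-1})\, \mathcal{N}(w \mid m, v)\, \mathrm{Gam}(\gamma_w \mid \alpha^{\gamma_w}, \beta^{\gamma_w})\, dw\, d\gamma_w.$$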
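Learning with expectation propagation
Initialization and introduction of the prior factors
[Introducing the prior factors]
• Adding the factor $p(w_{i,j}^{(l)} \mid \gamma_w)$: update of $q(\gamma_w)$
$$q^{\mathrm{new}}(\gamma_w) \approx \frac{1}{Z_0}\, p(w \mid \gamma_w)\, q(W)\, q(\gamma_w).$$
Since $q(\gamma_w)$ is a Gamma distribution, moment matching as in the earlier Gamma example updates the approximate distribution as
$$\alpha_{\mathrm{new}}^{\gamma_w} = \left\{ Z_0 Z_2 Z_1^{-2}\, \frac{\alpha^{\gamma_w} + 1}{\alpha^{\gamma_w}} - 1 \right\}^{-1}, \qquad \beta_{\mathrm{new}}^{\gamma_w} = \left\{ Z_2 Z_1^{-1}\, \frac{\alpha^{\gamma_w} + 1}{\beta^{\gamma_w}} - Z_1 Z_0^{-1}\, \frac{\alpha^{\gamma_w}}{\beta^{\gamma_w}} \right\}^{-1},$$
where $Z_1 = Z(\alpha^{\gamma_w} + 1, \beta^{\gamma_w})$ and $Z_2 = Z(\alpha^{\gamma_w} + 2, \beta^{\gamma_w})$.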
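Learning with expectation propagation
Initialization and introduction of the prior factors
[Introducing the prior factors]
The normalization constant $Z(\alpha^{\gamma_w}, \beta^{\gamma_w})$ cannot be computed exactly, so the Student's t distribution that appears along the way is approximated by a Gaussian with the same mean and variance:
$$Z(\alpha^{\gamma_w}, \beta^{\gamma_w}) = \int \mathcal{N}(w \mid 0, \gamma_w^{-1})\, q(W, \gamma_y, \gamma_w)\, dW\, d\gamma_y\, d\gamma_w$$
$$= \int \mathcal{N}(w \mid 0, \gamma_w^{-1})\, \mathcal{N}(w \mid m, v)\, \mathrm{Gam}(\gamma_w \mid \alpha^{\gamma_w}, \beta^{\gamma_w})\, dw\, d\gamma_w$$
$$= \int \mathrm{St}(w \mid 0, \alpha^{\gamma_w}/\beta^{\gamma_w}, 2\alpha^{\gamma_w})\, \mathcal{N}(w \mid m, v)\, dw$$
$$\approx \int \mathcal{N}(w \mid 0, \beta^{\gamma_w}/(\alpha^{\gamma_w} - 1))\, \mathcal{N}(w \mid m, v)\, dw \quad \text{(t distribution approximated by a Gaussian with equal mean and variance)}$$
$$= \mathcal{N}(m \mid 0, \beta^{\gamma_w}/(\alpha^{\gamma_w} - 1) + v).$$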
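Learning with expectation propagation
Introduction of the likelihood factors
After every prior factor has been added, the factors of $p(Y \mid X, W, \gamma_y)$ are added one at a time:
$$q^{\mathrm{new}}(\gamma_y)\, q^{\mathrm{new}}(\gamma_w)\, q^{\mathrm{new}}(W) \approx \frac{1}{Z} p(y_i \mid x_i, W, \gamma_y)\, q(\gamma_y)\, q(\gamma_w)\, q(W)$$
$$\Leftrightarrow\ q^{\mathrm{new}}(\gamma_y)\, q^{\mathrm{new}}(W) \approx \frac{1}{Z} p(y_i \mid x_i, W, \gamma_y)\, q(\gamma_y)\, q(W).$$
$q(W)$ is Gaussian and $q(\gamma_y)$ is a Gamma distribution, so the updates are carried out in the same way as before:
$$q^{\mathrm{new}}(W) \approx \frac{1}{Z_0}\, p(y_i \mid x_i, W, \gamma_y)\, q(\gamma_y)\, q(W),$$
$$q^{\mathrm{new}}(\gamma_y) \approx \frac{1}{Z_0}\, p(y_i \mid x_i, W, \gamma_y)\, q(W)\, q(\gamma_y).$$
⟹ The goal is to compute the normalization constant for the newly added likelihood factor $p(y_i \mid x_i, W, \gamma_y)$, which is the only part of the update that differs from the case of adding $p(w_{i,j}^{(l)} \mid \gamma_w)$.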
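Learning with expectation propagation
Introduction of the likelihood factors
The normalization constant after adding the $i$-th likelihood factor is computed approximately as follows:
$$Z(\alpha^{\gamma_y}, \beta^{\gamma_y}) = \int \mathcal{N}(y_i \mid f(x_i; W), \gamma_y^{-1})\, q(W, \gamma_y, \gamma_w)\, dW\, d\gamma_y\, d\gamma_w$$
$$= \int \mathcal{N}(y_i \mid f(x_i; W), \gamma_y^{-1})\, q(W)\, q(\gamma_y)\, dW\, d\gamma_y$$
$$\approx \int \mathcal{N}(y_i \mid z^{(L)}, \gamma_y^{-1})\, \mathcal{N}(z^{(L)} \mid m_{z^{(L)}}, v_{z^{(L)}})\, \mathrm{Gam}(\gamma_y \mid \alpha^{\gamma_y}, \beta^{\gamma_y})\, dz^{(L)}\, d\gamma_y$$
$$= \int \mathrm{St}(y_i \mid z^{(L)}, \alpha^{\gamma_y}/\beta^{\gamma_y}, 2\alpha^{\gamma_y})\, \mathcal{N}(z^{(L)} \mid m_{z^{(L)}}, v_{z^{(L)}})\, dz^{(L)}$$
$$\approx \int \mathcal{N}(y_i \mid z^{(L)}, \beta^{\gamma_y}/(\alpha^{\gamma_y} - 1))\, \mathcal{N}(z^{(L)} \mid m_{z^{(L)}}, v_{z^{(L)}})\, dz^{(L)}$$
$$= \mathcal{N}(y_i \mid m_{z^{(L)}}, \beta^{\gamma_y}/(\alpha^{\gamma_y} - 1) + v_{z^{(L)}}).$$
Third line: the hidden units $z^{(l)} \in \mathbb{R}^{H_l}$ of layer $l$ are assumed to follow a Gaussian with mean $m_{z^{(l)}}$ and variance $v_{z^{(l)}}$ (details on the next slide).
Fifth line: the t distribution is approximated by a Gaussian with equal mean and variance.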
Learning with expectation propagation
Introduction of the likelihood factors
The mean $m_{z^{(L)}}$ and variance $v_{z^{(L)}}$ of $\mathcal{N}(z^{(L)} \mid m_{z^{(L)}}, v_{z^{(L)}})$ are obtained approximately by a recursive computation.
[How to compute]
Assume that the hidden units $z^{(l)} \in \mathbb{R}^{H_l}$ of layer $l$ have mean $m_{z^{(l)}}$ and variance $v_{z^{(l)}}$. Write the vector obtained after multiplying by the weight matrix $W^{(l)} \in \mathbb{R}^{H_l \times H_{l-1}}$ of layer $l$ (the activation) as
$$a^{(l)} = W^{(l)} z^{(l-1)} / \sqrt{H_{l-1}}.$$
The mean and variance of $a^{(l)}$ are then
$$m_{a^{(l)}} = M^{(l)} m_{z^{(l-1)}} / \sqrt{H_{l-1}},$$
$$v_{a^{(l)}} = \left\{ (M^{(l)} \odot M^{(l)})\, v_{z^{(l-1)}} + V^{(l)} (m_{z^{(l-1)}} \odot m_{z^{(l-1)}}) + V^{(l)} v_{z^{(l-1)}} \right\} / H_{l-1},$$
where $M^{(l)}, V^{(l)} \in \mathbb{R}^{H_l \times H_{l-1}}$ hold the means $m_{i,j}^{(l)}$ and variances $v_{i,j}^{(l)}$ of the parameters of layer $l$, and $\odot$ is the Hadamard product.
From the mean $m_{z^{(l-1)}}$ and variance $v_{z^{(l-1)}}$ of the hidden units of layer $l - 1$, we obtain the mean $m_{a^{(l)}}$ and variance $v_{a^{(l)}}$ of the activations of layer $l$.
If the mean $m_{z^{(l)}}$ and variance $v_{z^{(l)}}$ of the hidden units of layer $l$ can be obtained from the mean $m_{a^{(l)}}$ and variance $v_{a^{(l)}}$ of its activations, the whole computation can be carried out recursively.
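The moment propagation through one linear layer can be sketched as follows. The function implements the two formulas for $m_{a^{(l)}}$ and $v_{a^{(l)}}$ under the stated independence assumption between weights and hidden units; the function name, the layer sizes and the random test values are made up for the example.

```python
import numpy as np

def linear_moments(M, V, m_z, v_z):
    """Mean and variance of a = W z / sqrt(H_prev) for independent Gaussians.

    M, V: element-wise means and variances of W, shape (H, H_prev).
    m_z, v_z: element-wise means and variances of z, shape (H_prev,).
    """
    H_prev = m_z.size
    m_a = M @ m_z / np.sqrt(H_prev)
    # Var(w * z) = m_w^2 v_z + v_w m_z^2 + v_w v_z for independent w and z
    v_a = ((M * M) @ v_z + V @ (m_z * m_z) + V @ v_z) / H_prev
    return m_a, v_a

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 4))                 # hypothetical layer: 4 inputs, 3 outputs
V = rng.uniform(0.1, 0.5, size=(3, 4))
m_z = rng.normal(size=4)
v_z = rng.uniform(0.1, 0.5, size=4)
m_a, v_a = linear_moments(M, V, m_z, v_z)
```

The mean and variance are exact for independent weights and inputs; the approximation enters when $a^{(l)}$ is subsequently treated as Gaussian.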
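Learning with expectation propagation
Distribution of the activations
We compute the distribution $p(a^{(l)} \mid W^{(l)}, z^{(l-1)})$ of the activation $a^{(l)}$. By the central limit theorem, when the number of hidden units $H_{l-1}$ is large, $a^{(l)}$ approximately follows a Gaussian:
$$p(a^{(l)} \mid W^{(l)}, z^{(l-1)}) \approx q(a^{(l)}) = \mathcal{N}(a^{(l)} \mid m_{a^{(l)}}, v_{a^{(l)}}).$$
When a Gaussian variable passes through a ReLU, the result is a mixture of two distributions (right panel of the figure on the slide).
① Samples that come from negative inputs form a point mass with mean $\mu_p = 0$ and variance $\sigma_p = 0$.
② Samples that come from non-negative inputs follow a truncated Gaussian, cut off below 0.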
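Learning with expectation propagation
Distribution of the activations
[General formulas for the mean and variance of a mixture]
For a mixture with $K$ components and mixing weights $\pi_k > 0$, $\sum_{k=1}^{K} \pi_k = 1$, the mean and variance are in general
$$\mathbb{E}[x_{\mathrm{mix}}] = \sum_{k=1}^{K} \pi_k \mu_k, \qquad \mathbb{V}[x_{\mathrm{mix}}] = \sum_{k=1}^{K} \pi_k (\mu_k^2 + \sigma_k^2) - \mathbb{E}[x_{\mathrm{mix}}]^2.$$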
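Learning with expectation propagation
Distribution of the activations
[Applying this to the mixture for the activations]
Let $\pi_p$ and $\pi_t$ be the mixing weights of the point mass and the truncated Gaussian, so that $\pi_p + \pi_t = 1$.
With $\bar{\mu} = -\mu/\sigma$,
$$\pi_p = \int_{-\infty}^{0} \mathcal{N}(x \mid \mu, \sigma^2)\, dx = \Phi(-\mu/\sigma) = \Phi(\bar{\mu}).$$
The weight of the truncated Gaussian is therefore
$$\pi_t = 1 - \pi_p = \Phi(-\bar{\mu}).$$
From [S. Kotz], the mean $\mu_t$ and variance $\sigma_t^2$ of the truncated Gaussian are
$$\mu_t = \mu + \sigma \frac{\mathcal{N}(\bar{\mu} \mid 0, 1)}{\Phi(-\bar{\mu})}, \qquad \sigma_t^2 = \sigma^2 \left\{ 1 + \bar{\mu} \frac{\mathcal{N}(\bar{\mu} \mid 0, 1)}{\Phi(-\bar{\mu})} - \left( \frac{\mathcal{N}(\bar{\mu} \mid 0, 1)}{\Phi(-\bar{\mu})} \right)^{2} \right\}.$$
Substituting into the general formulas for $\mathbb{E}[x_{\mathrm{mix}}]$ and $\mathbb{V}[x_{\mathrm{mix}}]$ gives the mean and variance of $z$.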
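The ReLU moment computation can be sketched with the standard library only. The function combines the truncated-Gaussian moments with the mixture formulas, using the fact that the point mass at 0 contributes nothing to either sum; the function names are hypothetical.

```python
import math

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def relu_moments(mu, sigma):
    """Mean and variance of max(0, a) for a ~ N(mu, sigma^2)."""
    mu_bar = -mu / sigma
    pi_t = std_normal_cdf(-mu_bar)            # weight of the truncated Gaussian
    r = std_normal_pdf(mu_bar) / pi_t         # N(mu_bar | 0, 1) / Phi(-mu_bar)
    mu_t = mu + sigma * r                     # mean of the truncated Gaussian
    var_t = sigma**2 * (1.0 + mu_bar * r - r**2)
    # Mixture formulas; the point mass (mu_p = 0, sigma_p = 0) adds nothing
    mean = pi_t * mu_t
    var = pi_t * (mu_t**2 + var_t) - mean**2
    return mean, var

mean01, var01 = relu_moments(0.0, 1.0)
```

For a standard normal input the result can be checked by hand: the mean of $\max(0, a)$ is $1/\sqrt{2\pi}$ and its second moment is $1/2$. Applied element-wise to $m_{a^{(l)}}$ and $\sqrt{v_{a^{(l)}}}$, this gives $m_{z^{(l)}}$ and $v_{z^{(l)}}$.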
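Learning with expectation propagation
Distribution of the activations
In other words, the mean and variance of the hidden units of layer $l$ can be computed from the mean and variance of the activations of layer $l$.
The mean $m_{z^{(L)}}$ and variance $v_{z^{(L)}}$ of $\mathcal{N}(z^{(L)} \mid m_{z^{(L)}}, v_{z^{(L)}})$ are thus obtained approximately by a recursive computation.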
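Learning with expectation propagation
Gradient-based learning
$z^{(0)}$ is treated as having mean $x_i$ and variance $0$ (the initial values $m_{z^{(0)}}$, $v_{z^{(0)}}$ of the recursion).
The preceding slides described the sequence that takes the output $z^{(l-1)}$ of layer $l - 1$ through the activation $a^{(l)}$ and yields the mean and variance of the output $z^{(l)}$ of layer $l$ (approximated as Gaussian by the central limit theorem). Applying this approximation recursively, the distribution of the final layer $z^{(L)}$ can be approximated by the Gaussian $\mathcal{N}(z^{(L)} \mid m_{z^{(L)}}, v_{z^{(L)}})$.
This gives an approximate expression for the normalization constant:
$$Z(\alpha^{\gamma_y}, \beta^{\gamma_y}) \approx \mathcal{N}(y_i \mid m_{z^{(L)}}, \beta^{\gamma_y}/(\alpha^{\gamma_y} - 1) + v_{z^{(L)}}).$$
Once the normalization constant is available, the gradients are obtained by differentiating it with respect to the parameters.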
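Learning with expectation propagation
Summary of probabilistic backpropagation
Model definition: $p(W, \gamma_y, \gamma_w \mid \mathcal{D}) \propto p(Y \mid X, W, \gamma_y)\, p(W \mid \gamma_w)\, p(\gamma_y)\, p(\gamma_w)$
Approximate distribution: $q(W, \gamma_y, \gamma_w) = q(\gamma_y)\, q(\gamma_w)\, q(W)$
Initialization of the approximate distribution: $q_0(\gamma_y), q_0(\gamma_w), q_0(W)$
Introducing the prior factors (part 1):
  Add the factor $p(\gamma_y)$: $q(\gamma_y) \leftarrow p(\gamma_y)$
  Add the factor $p(\gamma_w)$: $q(\gamma_w) \leftarrow p(\gamma_w)$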
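Learning with expectation propagation
Summary of probabilistic backpropagation
Introducing the prior factors (part 2):
  for l = 1 to L do
    for j = 1 to H_{l-1} do
      for i = 1 to H_l do
        Add the factor $p(w_{i,j}^{(l)} \mid \gamma_w)$:
          ⋅ update $q(W)$
          ⋅ update $q(\gamma_w)$
Introducing the likelihood factors, for each $p(y_i \mid x_i, W, \gamma_y)$ where $i \in S$:
  Forward pass: recursively compute the means and variances of the hidden units and activations
  Introduce the factor $p(y_i \mid x_i, W, \gamma_y)$: update $q(W)$ and $q(\gamma_y)$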
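Learning with expectation propagation
Related methods
Deterministic variational inference is a method similar to probabilistic backpropagation.
[Drawback of variational inference]
Evaluating the ELBO requires the expectation of the log-likelihood, which is approximated with Monte Carlo.
  ⟹ Low stability.
[Deterministic variational inference]
Stability is improved by carrying out the approximate computation of the expectation deterministically.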