October 19, 2020

Transcript

1. Paper introduction: Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Takahiro Kawashima (M2, Shouno Lab, The University of Electro-Communications), October 19, 2020
2. Contents
  1. About the paper
  2. Bayesian inference and variational inference
  3. From Stein's Identity to the Kernelized Stein Discrepancy
  4. Stein Variational Gradient Descent
  5. Experiments

4. Paper information
Title: Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Authors: Qiang Liu, Dilin Wang (affiliated with Dartmouth College at the time of publication; now at UT Austin and Facebook, respectively)
Conference: NeurIPS 2016
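5. Paper overview
Proposes Stein Variational Gradient Descent (SVGD), a kernel-based and particle-based variational inference method.
- Recursively minimizes the KL divergence between the true distribution and the approximate distribution, based on the (Kernelized) Stein Discrepancy
- In the end, everything reduces to how to solve the KL divergence minimization
- Derives the update rule of (functional) gradient descent on the KL divergence through a careful use of kernel methods
- Needs no mean-field approximation, so unlike ordinary variational inference it assumes no independence in the posterior
- With a single particle, it yields the MAP solution
- Robust to multimodality in principle (footnote 1)
Footnote 1: This is not stated explicitly in the paper.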

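7. Bayesian inference review: Bayes' theorem
This is prerequisite knowledge for the paper, reviewed here (it is outside the scope of the paper itself).
Bayes' theorem
$D = \{D_k\}$: dataset, $x \in \mathcal{X} \subset \mathbb{R}^d$: parameter vector.
Given a prior $p_0(x)$, the posterior $p(x)$ can be written as
$p(x) := p(x \mid D) = \bar{p}(x)/Z, \quad \bar{p}(x) = p_0(x)\,p(D \mid x), \quad Z = p(D) = \int \bar{p}(x)\,dx.$
The goal of Bayesian inference is to obtain the posterior $p(x)$, or $\bar{p}(x)$, one way or another.
▷ How should a posterior that is generally hard to handle be approximated?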
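As a small illustration of the roles of $\bar{p}(x)$ and $Z$, the sketch below normalizes an unnormalized posterior on a grid. The model is an assumption made only for this example (prior $N(0,1)$, likelihood $N(D_k \mid x, 1)$, three made-up observations); for it, the conjugate Gaussian result gives a posterior mean of $3.5/4 = 0.875$, which the grid estimate should reproduce.

```python
import math

def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

data = [1.0, 2.0, 0.5]  # hypothetical observations D_k

def p_bar(x):
    """Unnormalized posterior: prior N(0, 1) times likelihood prod_k N(D_k | x, 1)."""
    return normal_pdf(x, 0.0, 1.0) * math.prod(normal_pdf(d, x, 1.0) for d in data)

# Trapezoidal rule on a grid wide enough that the tails are negligible.
lo, hi, steps = -8.0, 8.0, 16000
dx = (hi - lo) / steps
grid = [lo + i * dx for i in range(steps + 1)]
weights = [0.5 if i in (0, steps) else 1.0 for i in range(steps + 1)]

Z = dx * sum(w * p_bar(x) for w, x in zip(weights, grid))
posterior_mean = dx * sum(w * x * p_bar(x) for w, x in zip(weights, grid)) / Z
```

Grid normalization like this only works in very low dimension, which is why approximate inference is needed in general.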
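8. Bayesian inference review: variational inference
Variational inference
$p(x)$: unknown distribution, $q(x)$: distribution with a relatively tractable functional form.
Optimize a suitable measure of similarity between probability distributions so that $q$ approaches $p$.
The most common variational inference framework:
  1. Assume suitable independence among the elements of the parameter vector $x$ in the posterior (mean-field approximation), e.g. $x_1, x_2 \perp\!\!\!\perp x_3 \mid D$ with $x = [x_1, x_2, x_3]^\top$.
  2. Update $q$ so as to maximize (a lower bound on) the log marginal likelihood $\log Z$.
- The larger the expectation of $\log Z$ under the true data distribution, the smaller the KL divergence between the true data distribution and the model (footnote 2)
Footnote 2: See, e.g., reference [?]. This follows immediately from the definitions.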
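9. Bayesian inference review: characteristics of variational inference
Characteristics of the usual variational inference framework:
- Converges faster than MCMC
- Gets trapped in local optima (depending on the optimization method)
- Approximating a multimodal $p(x)$ with a unimodal $q(x)$ captures only one mode
- More efficient and more accurate variational inference requires a deep understanding of the chosen model
  - Between which random variables of the posterior should independence be assumed?
▷ We want a variational inference method that can be used generally, independent of the model.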

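11. Stein's Identity
Assumptions and notation
- The dataset $D = \{D_k\}$ is i.i.d. (independent and identically distributed)
- The posterior $p(x)$ is a continuous and differentiable function on $\mathcal{X}$
- $\phi(x) = [\phi_1(x), \dots, \phi_d(x)]^\top \in \mathbb{R}^d$: a vector-valued function satisfying a regularity condition (see the next slide)
Stein's Identity
Define the Stein operator $\mathcal{A}_p$ as follows (definition as in, e.g., reference [?]):
$\mathcal{A}_p \phi(x) = \nabla_x \log p(x) \cdot \phi(x) + \nabla_x \cdot \phi(x) \in \mathbb{R}.$
Then the following Stein's Identity holds:
$\mathbb{E}_{x \sim p}[\mathcal{A}_p \phi(x)] = 0.$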
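12. Supplement: regularity condition for Stein's Identity
Regularity condition on $\phi(x)$
Stein's Identity holds if $\phi$ satisfies either of the following boundary conditions ($\partial\mathcal{X}$ is the boundary of $\mathcal{X}$):
  1. $p(x)\phi(x) = 0, \ \forall x \in \partial\mathcal{X}$ ($\mathcal{X}$ is compact)
  2. $\lim_{r \to \infty} \oint_{B_r} p(x)\phi(x) \cdot n(x)\,dS = 0$ ($\mathcal{X} = \mathbb{R}^d$)
Here $B_r$ is the hypersphere of radius $r$ and $n(x)$ is the unit normal vector of $B_r$.
Gauss's divergence theorem, combined with the boundary condition, gives
$\int_{\mathcal{X}} \nabla_x \cdot (p(x)\phi(x))\,dx = \oint_{\partial\mathcal{X}} p(x)\phi(x) \cdot n(x)\,dS = 0.$
This is used to show that the regularity condition implies Stein's Identity.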
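13. Supplement: proof of Stein's Identity
Proof. By the definition of the Stein operator $\mathcal{A}_p$,
$\mathcal{A}_p \phi(x) = \nabla_x \log p(x) \cdot \phi(x) + \nabla_x \cdot \phi(x) = \dfrac{\nabla_x p(x) \cdot \phi(x) + p(x)\,\nabla_x \cdot \phi(x)}{p(x)} = \dfrac{\nabla_x \cdot (p(x)\phi(x))}{p(x)}.$
Therefore
$\mathbb{E}_{x \sim p}[\mathcal{A}_p \phi(x)] = \int_{\mathcal{X}} \dfrac{\nabla_x \cdot (p(x)\phi(x))}{p(x)}\,p(x)\,dx = \int_{\mathcal{X}} \nabla_x \cdot (p(x)\phi(x))\,dx = 0.$ ◻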
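Stein's Identity can be checked numerically in one dimension. The sketch below assumes $p = N(0,1)$ (score $-x$) and the test function $\phi(x) = \sin x$, both chosen only for illustration, and evaluates the expectation with a trapezoidal rule instead of sampling so that the result is deterministic.

```python
import math

def normal_pdf(x, mean=0.0):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def expect(f, mean=0.0, half_width=12.0, steps=24000):
    """Trapezoidal approximation of E_{x ~ N(mean, 1)}[f(x)]."""
    start = mean - half_width
    dx = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = start + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f(x) * normal_pdf(x, mean)
    return total * dx

def stein_operator(x):
    """A_p phi(x) for p = N(0, 1) and phi(x) = sin(x): score * phi + phi'."""
    return -x * math.sin(x) + math.cos(x)

stein_value = expect(stein_operator)  # expectation taken under p itself
```

The value of `stein_value` is zero up to quadrature and rounding error, as the identity predicts.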
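14. Stein Discrepancy
We want to use Stein's Identity to define a "discrepancy" between two distributions. First, define the following Stein Discrepancy.
Stein Discrepancy
$p(x), q(x)$: smooth probability distributions on $\mathcal{X}$; $\mathcal{F}$: a suitable set of functions.
The Stein Discrepancy is then defined as $\max_{\phi \in \mathcal{F}} \mathbb{E}_{x \sim q}[\mathcal{A}_p \phi(x)]$.
Interpretation of the Stein Discrepancy: the expected difference between the score functions of $p$ and $q$, weighted by $\phi$ (footnote 3).
This holds because $\mathbb{E}_{x \sim q}[\mathcal{A}_q \phi(x)] = 0$, so
$\mathbb{E}_{x \sim q}[\mathcal{A}_p \phi(x)] = \mathbb{E}_{x \sim q}[\mathcal{A}_p \phi(x) - \mathcal{A}_q \phi(x)] = \mathbb{E}_{x \sim q}[(\nabla_x \log p(x) - \nabla_x \log q(x)) \cdot \phi(x)].$
Footnote 3: See Lemma 2.3 of reference [?].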
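The score-difference interpretation can be verified numerically. The sketch below assumes $p = N(0,1)$, $q = N(1,1)$ and $\phi(x) = \sin x$ (illustrative choices, not from the slides). For this pair $s_p(x) - s_q(x) = -1$, so both sides should equal $-\mathbb{E}_q[\sin x] = -\sin(1)\,e^{-1/2}$.

```python
import math

def expect_q(f, mean=1.0, half_width=12.0, steps=24000):
    """Trapezoidal approximation of E_{x ~ N(mean, 1)}[f(x)] with q = N(1, 1) by default."""
    start = mean - half_width
    dx = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = start + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)
        total += weight * f(x) * density
    return total * dx

def score_p(x):
    return -x          # score of p = N(0, 1)

def score_q(x):
    return -(x - 1.0)  # score of q = N(1, 1)

# Left-hand side: E_q[A_p phi] with phi = sin, phi' = cos.
lhs = expect_q(lambda x: score_p(x) * math.sin(x) + math.cos(x))
# Right-hand side: E_q[(s_p - s_q) * phi].
rhs = expect_q(lambda x: (score_p(x) - score_q(x)) * math.sin(x))
closed_form = -math.sin(1.0) * math.exp(-0.5)
```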
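15. Kernelized Stein Discrepancy
In prior work, the function set $\mathcal{F}$ has been constrained by, for example, an upper bound on the Lipschitz norm (footnote 4).
The Kernelized Stein Discrepancy (KSD) restricts $\mathcal{F}$ to a reproducing kernel Hilbert space, which makes the Stein Discrepancy easier to handle.
Kernelized Stein Discrepancy (KSD) [4, 5]
$\mathcal{H}$: the reproducing kernel Hilbert space determined by a positive definite kernel $k(x, x')$
$\mathcal{H}^d$: the function space of $f = [f_1, \dots, f_d]^\top$ with $f_i \in \mathcal{H}$
The Kernelized Stein Discrepancy $D(q,p)$ is then defined as
$D(q,p) = \max_{\phi \in \mathcal{H}^d} \{\mathbb{E}_{x \sim q}[\mathcal{A}_p \phi(x)] \ \text{s.t.} \ \|\phi\|_{\mathcal{H}^d} \le 1\}.$
Footnote 4: See, e.g., reference [?].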
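16. Supplement: the Hilbert space generated by a kernel function
Define the linear space $\mathcal{H}_0$ determined by a positive definite kernel $k(x, x')$ as
$\mathcal{H}_0 = \{ f(x) = \sum_{i=1}^{m} \alpha_i k(x, x_i) \mid \alpha_i \in \mathbb{R},\ m \in \mathbb{N},\ x_i \in \mathcal{X} \}.$
For elements $f = \sum_i \alpha_i k(\cdot, x_i)$ and $g = \sum_j \beta_j k(\cdot, x_j)$ of $\mathcal{H}_0$, $\langle f, g \rangle_{\mathcal{H}_0} = \sum_{i,j} \alpha_i \beta_j k(x_i, x_j)$ satisfies the definition of an inner product.
Write $\tilde{\mathcal{H}}_0$ for the set of Cauchy sequences in the inner product space $\mathcal{H}_0$:
$\tilde{\mathcal{H}}_0 = \{ \{f_n\} \subset \mathcal{H}_0 \mid \|f_n - f_m\| \to 0 \ (n, m \to \infty) \}.$
The space obtained by quotienting $\tilde{\mathcal{H}}_0$ by the equivalence relation $\{f_n\} \sim \{g_n\} \Leftrightarrow \|f_n - g_n\| \to 0 \ (n \to \infty)$ can be identified with a Hilbert space $\mathcal{H}$ (footnotes 5, 6).
Footnote 5: Every Cauchy sequence in $\mathcal{H}$ has its limit in $\mathcal{H}$.
Footnote 6: See [6, 7] for a detailed discussion.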
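17. Supplement: the reproducing property of the Hilbert space
For the inner product of $\mathcal{H}_0$ we have
$\langle f, k(x, \cdot) \rangle_{\mathcal{H}_0} = \sum_i \alpha_i k(x_i, x) = f(x),$
and this is called the reproducing property of $\mathcal{H}_0$.
Using the reproducing property of $\mathcal{H}_0$, one can also show the reproducing property of the Hilbert space $\mathcal{H}$ determined by the positive definite kernel $k(\cdot, \cdot)$.
▷ $\mathcal{H}$ is called a reproducing kernel Hilbert space (RKHS).
The reproducing property is very important for the theoretical grounding of kernel methods (footnote 7).
Footnote 7: As shown later, it is also used to optimize the KSD.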
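The reproducing property on $\mathcal{H}_0$ can be illustrated with finite kernel expansions. In the sketch below the RBF kernel, the coefficients, and the centers are hypothetical values chosen for the example; `inner_product` implements $\sum_{i,j} \alpha_i \beta_j k(x_i, x_j)$, and taking $g = k(\cdot, x)$ (one coefficient equal to 1) recovers $f(x)$.

```python
import math

def rbf(x, y, h=1.0):
    """RBF kernel k(x, y) = exp(-(x - y)^2 / h) on the real line."""
    return math.exp(-(x - y) ** 2 / h)

def evaluate(alphas, centers, x):
    """f(x) for f = sum_i alpha_i k(., x_i)."""
    return sum(a * rbf(x, c) for a, c in zip(alphas, centers))

def inner_product(alphas, centers, betas, points):
    """<f, g> for f = sum_i alpha_i k(., x_i) and g = sum_j beta_j k(., y_j)."""
    return sum(a * b * rbf(c, p)
               for a, c in zip(alphas, centers)
               for b, p in zip(betas, points))

alphas = [0.5, -1.0, 2.0]    # hypothetical coefficients
centers = [-1.0, 0.0, 1.5]   # hypothetical centers x_i
x = 0.3

lhs = inner_product(alphas, centers, [1.0], [x])  # <f, k(x, .)>
rhs = evaluate(alphas, centers, x)                # f(x)
```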
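18. Optimal solution of the Kernelized Stein Discrepancy
Optimal solution of the Kernelized Stein Discrepancy [?]
The solution of the maximization problem over $\phi \in \mathcal{H}^d$ in the KSD $D(q,p)$ is given by
$\phi(\cdot) = \phi^*_{q,p}(\cdot) / \|\phi^*_{q,p}\|_{\mathcal{H}^d}, \quad \text{where} \quad \phi^*_{q,p}(\cdot) = \mathbb{E}_{x \sim q}[\nabla_x \log p(x)\,k(x, \cdot) + \nabla_x k(x, \cdot)],$
and in that case $D(q,p) = \|\phi^*_{q,p}\|_{\mathcal{H}^d}$.
The proof starts on the next slide.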
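19. Derivation of the optimal solution of the Kernelized Stein Discrepancy (1)
$D(q,p) = \max_{\phi \in \mathcal{H}^d} \{\mathbb{E}_{x \sim q}[\mathcal{A}_p \phi(x)] \ \text{s.t.} \ \|\phi\|_{\mathcal{H}^d} \le 1\}$
Proof. For brevity, write the score function as $s_p(x) = \nabla_x \log p(x)$. Then
$\mathbb{E}_{x \sim q}[\mathcal{A}_p \phi(x)] = \mathbb{E}_{x \sim q}[\mathcal{A}_p \phi(x) - \mathcal{A}_q \phi(x)]$ (by Stein's Identity)
$= \mathbb{E}_{x \sim q}[(s_p(x) - s_q(x)) \cdot \phi(x)]$
$= \sum_{l=1}^{d} \mathbb{E}_{x \sim q}[(s_p^l(x) - s_q^l(x)) \langle \phi_l, k(x, \cdot) \rangle_{\mathcal{H}}]$ (by the reproducing property)
$= \sum_{l=1}^{d} \langle \phi_l, \mathbb{E}_{x \sim q}[(s_p^l(x) - s_q^l(x))\,k(x, \cdot)] \rangle_{\mathcal{H}}.$
By the properties of the inner product, each term in the sum is maximized by
$\phi_l(\cdot) = c\,\mathbb{E}_{x \sim q}[(s_p^l(x) - s_q^l(x))\,k(x, \cdot)],$
where $c$ is a normalization constant.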
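20. Derivation of the optimal solution of the Kernelized Stein Discrepancy (2)
$\phi_l(\cdot) = c\,\mathbb{E}_{x \sim q}[(s_p^l(x) - s_q^l(x))\,k(x, \cdot)]$
Rewrite this as
$\phi_l(\cdot) = c\,\mathbb{E}_{x \sim q}[s_p^l(x)\,k(x, \cdot) + \nabla_{x_l} k(x, \cdot) - (s_q^l(x)\,k(x, \cdot) + \nabla_{x_l} k(x, \cdot))].$
The expectation of the term in parentheses (underlined on the slide) vanishes, by the tower property of expectations under the joint distribution and by Stein's Identity:
$\mathbb{E}_{x \sim q}[s_q^l(x)\,k(x, \cdot) + \nabla_{x_l} k(x, \cdot)] = \mathbb{E}_{q(x_{/l})}\big[\mathbb{E}_{q(x_l \mid x_{/l})}[s_q^l(x)\,k(x, \cdot) + \nabla_{x_l} k(x, \cdot)]\big] = \mathbb{E}_{q(x_{/l})}[0] = 0.$
Therefore, writing $\phi_l(\cdot)$ in vector form with $s(x) = s_p(x)$,
$\phi(\cdot) = c\,\mathbb{E}_{x \sim q}[s(x)\,k(x, \cdot) + \nabla_x k(x, \cdot)] = c\,\phi^*_{q,p}(\cdot),$
and the norm constraint $\|\phi\|_{\mathcal{H}^d} \le 1$ gives $c = 1 / \|\phi^*_{q,p}\|_{\mathcal{H}^d}$.
Next we show $D(q,p) = \|\phi^*_{q,p}\|_{\mathcal{H}^d}$.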
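21. Derivation of the optimal solution of the Kernelized Stein Discrepancy (3)
Let $x, x' \sim q$ be independent random variables. Then
$\|\phi^*_{q,p}\|^2_{\mathcal{H}^d} = \langle \mathbb{E}_{x \sim q}[s(x)k(x,\cdot) + \nabla_x k(x,\cdot)],\ \mathbb{E}_{x' \sim q}[s(x')k(x',\cdot) + \nabla_{x'} k(x',\cdot)] \rangle$
$= \mathbb{E}_{x,x' \sim q}[\langle s(x)k(x,\cdot) + \nabla_x k(x,\cdot),\ s(x')k(x',\cdot) + \nabla_{x'} k(x',\cdot) \rangle]$
$= \mathbb{E}_{x,x' \sim q}[s(x)^\top \langle k(x,\cdot), k(x',\cdot) \rangle s(x') + s(x)^\top \langle k(x,\cdot), \nabla_{x'} k(x',\cdot) \rangle + \langle \nabla_x k(x,\cdot), k(x',\cdot) \rangle^\top s(x') + \langle \nabla_x k(x,\cdot), \nabla_{x'} k(x',\cdot) \rangle]$
$= \mathbb{E}_{x,x' \sim q}[s(x)^\top k(x,x') s(x') + s(x)^\top \nabla_{x'} k(x,x') + \nabla_x k(x,x')^\top s(x') + \mathrm{Tr}(\nabla_{x,x'} k(x,x'))].$
On the other hand, since $\phi(\cdot) = \phi^*_{q,p}(\cdot) / \|\phi^*_{q,p}\|_{\mathcal{H}^d}$,
$\mathbb{E}_{x \sim q}[\mathcal{A}_p \phi(x)] = \mathbb{E}_{x \sim q}[s(x)^\top \phi^*_{q,p}(x) + \nabla_x \cdot \phi^*_{q,p}(x)] / \|\phi^*_{q,p}\|_{\mathcal{H}^d}$
$= \mathbb{E}_{x,x' \sim q}[s(x)^\top k(x,x') s(x') + s(x)^\top \nabla_{x'} k(x,x') + \nabla_x k(x,x')^\top s(x') + \mathrm{Tr}(\nabla_{x,x'} k(x,x'))] / \|\phi^*_{q,p}\|_{\mathcal{H}^d}$
$= \|\phi^*_{q,p}\|^2_{\mathcal{H}^d} / \|\phi^*_{q,p}\|_{\mathcal{H}^d} = \|\phi^*_{q,p}\|_{\mathcal{H}^d} = D(q,p).$ ◻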
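22. Summary and properties of the Kernelized Stein Discrepancy
Summary of the KSD
$p(x), q(x)$: smooth probability distributions on $\mathcal{X}$
$k(\cdot, \cdot) : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$: positive definite kernel
The KSD $D(q,p)$ is then given by
$D(q,p) = \|\phi^*_{q,p}\|_{\mathcal{H}^d}, \quad \phi^*_{q,p}(\cdot) = \mathbb{E}_{x \sim q}[\nabla_x \log p(x)\,k(x, \cdot) + \nabla_x k(x, \cdot)].$
Properties of the KSD
- $D(q,p) = 0$ only when $p = q$
- It depends only on $\nabla_x \log p(x)$, not on $p(x)$ itself
▷ The normalization constant $Z$ does not need to be known.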
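The closed form for $\|\phi^*_{q,p}\|^2_{\mathcal{H}^d}$ suggests a sample-based estimate of $D(q,p)^2$: average the four-term integrand over all pairs of samples from $q$ (a V-statistic). The sketch below is a one-dimensional illustration under assumptions that are not from the slides: an RBF kernel with fixed bandwidth $h = 1$, target $p = N(0,1)$, and deterministic quantile points standing in for samples of $q$.

```python
import math
from statistics import NormalDist

def stein_kernel(x, y, score, h=1.0):
    """Integrand of the squared KSD for k(x, y) = exp(-(x - y)^2 / h), scalar x and y."""
    d = x - y
    k = math.exp(-d * d / h)
    dk_dx = -2.0 * d / h * k
    dk_dy = 2.0 * d / h * k
    d2k_dxdy = (2.0 / h - 4.0 * d * d / (h * h)) * k
    return score(x) * k * score(y) + score(x) * dk_dy + dk_dx * score(y) + d2k_dxdy

def ksd_squared(samples, score):
    """V-statistic estimate of D(q, p)^2 from samples of q; needs only the score of p."""
    n = len(samples)
    return sum(stein_kernel(x, y, score) for x in samples for y in samples) / (n * n)

def quantile_samples(mean, n=200):
    """Deterministic stand-in for n draws from N(mean, 1)."""
    dist = NormalDist(mean, 1.0)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

def score_p(x):
    return -x  # grad log p for p = N(0, 1); no normalization constant involved

ksd_match = ksd_squared(quantile_samples(0.0), score_p)  # q = p
ksd_shift = ksd_squared(quantile_samples(1.5), score_p)  # q = N(1.5, 1)
```

With these choices `ksd_match` stays close to zero while `ksd_shift` is clearly positive, and only the score of $p$ enters the computation.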

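24. KL divergence and variational inference
KL divergence
$p(x), q(x)$: smooth probability distributions on $\mathcal{X}$
The KL divergence $\mathrm{KL}(q \| p)$ is given by

$\mathrm{KL}(q \| p) = \mathbb{E}_{x \sim q}[\log q(x) - \log p(x)] = \mathbb{E}_{x \sim q}[\log q(x) - \log \bar{p}(x) + \log Z].$
The KL divergence expresses the "discrepancy" between the probability distributions $p$ and $q$.
▷ Variational inference: bring the variational posterior $q$ close to the true posterior $p$.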
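25. Transformation of random variables
Transformation of random variables
$T : \mathcal{X} \to \mathcal{X}$: a one-to-one, smooth map applied to the random variable
$q(x)$: a tractable probability distribution on $\mathcal{X}$
The random variable $z = T(x)$, obtained by transforming $x \sim q(x)$ with $T$, follows
$q_{[T]}(z) = q(T^{-1}(z))\,|\det(\nabla_z T^{-1}(z))|.$
Strategy of the proposed method
Recursively apply changes of variables to particles $\{x_i\}$ generated from an initial distribution $q_0$, moving them closer to the posterior $p$.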
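The change-of-variables formula can be checked on a case with a known answer. The sketch below assumes $q = N(0,1)$ and a hypothetical affine map $T(x) = ax + b$, for which $z = T(x)$ is known to follow $N(b, a^2)$.

```python
import math

def normal_pdf(x, mean=0.0, std=1.0):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

a, b = 2.0, 3.0  # hypothetical affine map T(x) = a * x + b

def q_transformed(z):
    """q_[T](z) = q(T^{-1}(z)) * |det(grad_z T^{-1}(z))| for q = N(0, 1)."""
    x = (z - b) / a      # T^{-1}(z)
    jacobian = 1.0 / a   # derivative of T^{-1} with respect to z
    return normal_pdf(x) * abs(jacobian)

check_points = [-1.0, 3.0, 5.5]
differences = [abs(q_transformed(z) - normal_pdf(z, b, abs(a))) for z in check_points]
```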
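26. KL divergence and the Stein operator
Theorem 3.1
For $x \sim q(x)$, consider the transformation $T(x) = x + \epsilon \phi(x)$.
Then the distribution $q_{[T]}(z)$ followed by $z = T(x)$ satisfies
$\nabla_\epsilon \mathrm{KL}(q_{[T]} \| p) \big|_{\epsilon = 0} = -\mathbb{E}_{x \sim q}[\mathcal{A}_p \phi(x)].$
Lemma 3.2
In Theorem 3.1, assume $\phi \in \{\phi \in \mathcal{H}^d \mid \|\phi\|_{\mathcal{H}^d} \le D(q,p)\}$.
Then the negative gradient in Theorem 3.1 is largest in the direction of $\phi^*_{q,p}(\cdot)$.
Consequently, it suffices to build the change of variables on the KSD.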
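Theorem 3.1 can be sanity-checked in a case where every quantity has a closed form. The sketch below assumes $p = N(0,1)$, $q = N(\mu, \sigma^2)$ with made-up values, and $\phi(x) = x$, so that $T(x) = (1+\epsilon)x$ keeps $q_{[T]}$ Gaussian. A central finite difference of the KL divergence in $\epsilon$ is compared with $-\mathbb{E}_{x \sim q}[\mathcal{A}_p \phi(x)] = \mu^2 + \sigma^2 - 1$.

```python
import math

mu, sigma = 1.0, 0.5  # hypothetical q = N(mu, sigma^2); the target is p = N(0, 1)

def kl_to_standard_normal(m, s):
    """Closed-form KL(N(m, s^2) || N(0, 1))."""
    return 0.5 * (s * s + m * m - 1.0) - math.log(s)

def kl_after_transform(eps):
    """KL(q_[T] || p) for T(x) = x + eps * phi(x) with phi(x) = x."""
    # T scales x by (1 + eps), so q_[T] = N((1 + eps) * mu, ((1 + eps) * sigma)^2).
    return kl_to_standard_normal((1.0 + eps) * mu, (1.0 + eps) * sigma)

eps = 1e-5
finite_difference = (kl_after_transform(eps) - kl_after_transform(-eps)) / (2.0 * eps)

# A_p phi(x) = s_p(x) * phi(x) + phi'(x) = -x^2 + 1, hence E_q[A_p phi] = 1 - (mu^2 + sigma^2).
stein_side = -(1.0 - (mu * mu + sigma * sigma))
```

Both quantities come out as 0.25 for these values, up to finite-difference error.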
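27. Proof of Theorem 3.1 and Lemma 3.2 (1)
Proof of Theorem 3.1
$\nabla_\epsilon \mathrm{KL}(q_{[T]} \| p) = \nabla_\epsilon \mathrm{KL}(q \| p_{[T^{-1}]}) = -\mathbb{E}_{x \sim q}[\nabla_\epsilon \log p_{[T^{-1}]}(x)].$
Noting that $p_{[T^{-1}]}(x) = p(T(x))\,|\det(\nabla_x T(x))|$, we have
$\nabla_\epsilon \log p_{[T^{-1}]}(x) = \nabla_\epsilon \log p(T(x)) + \nabla_\epsilon \log |\det(\nabla_x T(x))|.$
For the second term on the right-hand side, the derivative formula for log det (reference [8], Eq. (43)) gives
$\nabla_\epsilon \log |\det(\nabla_x T(x))| = \mathrm{Tr}((\nabla_x T(x))^{-1}\,\nabla_\epsilon \nabla_x T(x)).$
The first term on the right-hand side is
$\nabla_\epsilon \log p(T(x)) = \dfrac{\nabla_\epsilon p(T(x))}{p(T(x))} = \dfrac{\nabla_{T(x)} p(T(x))}{p(T(x))} \cdot \nabla_\epsilon T(x) = \dfrac{\nabla_z p(z)}{p(z)} \Big|_{z = T(x)} \cdot \nabla_\epsilon T(x) = s_p(T(x)) \cdot \nabla_\epsilon T(x).$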
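28. Proof of Theorem 3.1 and Lemma 3.2 (2)
$\nabla_\epsilon \log p_{[T^{-1}]}(x) = s_p(T(x)) \cdot \nabla_\epsilon T(x) + \mathrm{Tr}((\nabla_x T(x))^{-1}\,\nabla_\epsilon \nabla_x T(x))$
Setting $T(x) = x + \epsilon \phi(x)$ and $\epsilon = 0$ gives
$T(x) = x, \quad \nabla_\epsilon T(x) = \phi(x), \quad \nabla_x T(x) = I, \quad \nabla_\epsilon \nabla_x T(x) = \nabla_x \phi(x).$
Substituting these and using $\mathrm{Tr}(\nabla_x \phi(x)) = \nabla_x \cdot \phi(x)$,
$\nabla_\epsilon \mathrm{KL}(q_{[T]} \| p) \big|_{\epsilon = 0} = -\mathbb{E}_{x \sim q}[\nabla_\epsilon \log p_{[T^{-1}]}(x)] \big|_{\epsilon = 0} = -\mathbb{E}_{x \sim q}[s_p(x) \cdot \phi(x) + \nabla_x \cdot \phi(x)] = -\mathbb{E}_{x \sim q}[\mathcal{A}_p \phi(x)].$ ◻
Lemma 3.2 follows immediately from Theorem 3.1 and the optimal solution of the KSD. ◻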

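Stein Variational Gradient Descent
  1. Sample particles $\{x_i^0\}_{i=1}^n \subset \mathcal{X}$ from the initial distribution $q_0(x)$.
  2. Starting from $m = 0$, repeat the following steps a suitable number of times:
    2.1 Choose the learning rate $\epsilon_m$.
    2.2 Compute the following, then set $m \leftarrow m + 1$:
      $x_i^{m+1} \leftarrow x_i^m + \epsilon_m\,\hat{\phi}^*_{q,p}(x_i^m)$
      $\hat{\phi}^*_{q,p}(x) = \dfrac{1}{n} \sum_{j=1}^{n} \big[\nabla_{x_j^m} \log p(x_j^m)\,k(x, x_j^m) + \nabla_{x_j^m} k(x, x_j^m)\big]$
For any kernel satisfying $\nabla_x k(x, x) = 0$, a single particle ($n = 1$) is updated along the gradient of $\log p(x)$.
▷ It moves toward the MAP solution.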
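The update rule can be sketched for scalar particles in a few lines. This is a minimal illustration under assumptions that go beyond the slide: an RBF kernel $k(x, x') = \exp(-(x - x')^2/h)$ with a median-heuristic bandwidth, a fixed step size instead of an adaptive one, a deterministic initial particle set, and the toy target $p = N(0,1)$. The function names are hypothetical, not taken from the authors' code.

```python
import math
from statistics import median

def median_bandwidth(particles):
    """h = med^2 / log n, with med the median pairwise distance (needs n >= 2)."""
    pairwise = [abs(a - b) for i, a in enumerate(particles) for b in particles[i + 1:]]
    med = median(pairwise)
    return max(med * med / math.log(len(particles)), 1e-12)

def svgd_step(particles, score, step_size):
    """One update x_i <- x_i + eps * phi_hat(x_i) for scalar particles."""
    h = median_bandwidth(particles)
    n = len(particles)
    updated = []
    for x in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-(x - xj) ** 2 / h)
            grad_k = 2.0 * (x - xj) / h * k  # derivative of k(x, xj) with respect to xj
            phi += score(xj) * k + grad_k    # attraction along the score plus repulsion
        updated.append(x + step_size * phi / n)
    return updated

def score(x):
    return -x  # grad log p for the assumed target p = N(0, 1)

particles = [4.0 + 2.0 * i / 19 for i in range(20)]  # deterministic start far from the target
for _ in range(1000):
    particles = svgd_step(particles, score, 0.1)

mean = sum(particles) / len(particles)
variance = sum((x - mean) ** 2 for x in particles) / len(particles)
```

The first term in `phi` pulls particles toward high-density regions and the kernel-gradient term pushes them apart; with `n = 1` the repulsion vanishes and the step is plain gradient ascent on $\log p$.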

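31. Experiment 1: setup
Goal: approximate a one-dimensional Gaussian mixture.
Choice of hyperparameters
- Kernel function $k(\cdot, \cdot)$: RBF kernel
  - The RBF kernel hyperparameter is set at every iteration by the median heuristic
  - For $k(x, x') = \exp(-\|x - x'\|_2^2 / h)$, set $h = \mathrm{med}^2 / \log n$, where med is the median of the pairwise distances between the current particles $\{x_i^m\}_{i=1}^n$
  - This makes $\sum_j k(x_i^m, x_j^m) \approx 1$ or so, which keeps the updates from becoming erratic
- Learning rate $\epsilon_m$: set by AdaGrad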
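The reasoning behind the median heuristic is that a pair of particles at the median distance receives kernel value $\exp(-\mathrm{med}^2/h) = 1/n$. The sketch below checks this on a made-up set of scalar particles; the helper name is hypothetical.

```python
import math
from statistics import median

def median_heuristic_bandwidth(particles):
    """Return (h, med) with h = med^2 / log n and med the median pairwise distance."""
    n = len(particles)
    pairwise = [abs(a - b) for i, a in enumerate(particles) for b in particles[i + 1:]]
    med = median(pairwise)
    return med * med / math.log(n), med

particles = [-1.3, -0.4, 0.0, 0.2, 0.9, 1.7, 2.5]  # hypothetical particle positions
h, med = median_heuristic_bandwidth(particles)

# Kernel value for two particles that are exactly the median distance apart.
kernel_at_median = math.exp(-med * med / h)
```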
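32. Experiment 1: one-dimensional Gaussian mixture (1)
Number of particles $n = 100$
Red dotted line: target distribution $p(x)$; green solid line: approximate distribution $q(x)$
$p(x) = \frac{1}{3} N(-2, 1) + \frac{2}{3} N(2, 1), \quad q_0(x) = N(10, 1)$
It works even when the target $p(x)$ and the initial distribution $q_0(x)$ do not overlap.
▷ A major difference from resampling-based approximation methods such as SIR.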
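SVGD needs only the score $\nabla_x \log p(x)$ of the target. For the mixture above it can be written as a responsibility-weighted sum of the component scores. The sketch below is an illustrative implementation (function names are hypothetical) together with a finite-difference check, including at $x = 10$ where the initial particles start.

```python
import math

WEIGHTS = (1.0 / 3.0, 2.0 / 3.0)
MEANS = (-2.0, 2.0)

def component_terms(x):
    """Weighted unit-variance Gaussian densities w_c * N(x; mean_c, 1)."""
    norm = math.sqrt(2.0 * math.pi)
    return [w * math.exp(-0.5 * (x - m) ** 2) / norm for w, m in zip(WEIGHTS, MEANS)]

def log_p(x):
    """Log density of the mixture."""
    return math.log(sum(component_terms(x)))

def score(x):
    """d/dx log p(x): responsibilities times the component scores (mean_c - x)."""
    terms = component_terms(x)
    total = sum(terms)
    return sum((t / total) * (m - x) for t, m in zip(terms, MEANS))

def finite_difference(x, step=1e-5):
    """Central difference of log_p, used only to check score()."""
    return (log_p(x + step) - log_p(x - step)) / (2.0 * step)

errors = [abs(finite_difference(x) - score(x)) for x in (-3.0, 0.3, 10.0)]
```

At $x = 10$ the score is about $-8$, dominated by the component at 2, so the particles are first pulled toward that mode. For particles much farther from both modes, a log-sum-exp formulation would be safer against underflow.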
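33. Experiment 1: one-dimensional Gaussian mixture (2)
Compare the accuracy of expectation estimates against the Monte Carlo method while varying the number of particles.
$p(x), q_0(x)$: same as before
Vertical axis: $\log_{10}$ MSE averaged over 20 runs ($\omega \sim N(0, 1)$, $b \sim \mathrm{Uniform}([0, 2\pi])$)
Horizontal axis: number of particles (number of samples)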
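34. Experiment 2: setup
Goal: obtain the posterior of Bayesian logistic regression.
Uses the Covertype dataset (581,012 data points, 54 dimensions).
Choice of hyperparameters
- Kernel function $k(\cdot, \cdot)$: RBF kernel
  - The RBF kernel hyperparameter follows the prior work used for comparison
  - RBF kernel hyperparameter: $h = 0.002 \times \mathrm{median}(\{x_i^m\}_{i=1}^n)^2$
- Learning rate: $\epsilon_m = a / (m + 1)^{0.55}$, with $a$ chosen on a validation set
- Training data 80%, test data 20%
- Mini-batch size: 50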
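The learning-rate schedule $\epsilon_m = a/(m+1)^{0.55}$ decays polynomially. Because the exponent lies in $(0.5, 1]$, the step sizes sum to infinity while their squares have a finite sum, which is the usual Robbins-Monro condition for stochastic approximation. A minimal sketch, with a made-up value of $a$:

```python
def step_size(m, a, power=0.55):
    """Learning rate eps_m = a / (m + 1)^power at iteration m (0-based)."""
    return a / (m + 1) ** power

a = 0.1  # hypothetical value; on the slide a is chosen on a validation set
schedule = [step_size(m, a) for m in range(5)]
```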
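35. Experiment 2: Bayesian logistic regression
Left figure: learning curves with $n = 100$ particles (samples)
Right figure: number of particles versus test accuracy at $m = 3000$ (particle-based methods)
Each test accuracy is an average over 50 runs.
SVGD appears to perform best.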
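36. Experiment 3: setup
Goal: parameter estimation for Bayesian neural networks.
Compared against Probabilistic Back Propagation on 10 datasets in total.
Choice of hyperparameters
- Kernel function: RBF kernel with the median heuristic
- Learning rate: AdaGrad with momentum
- Number of hidden layers: 1 (50 units)
  - 100 hidden units for the Protein data only
- Activation function: ReLU
- Training data 90%, test data 10%
- Mini-batch size: 100
  - 1000 for the Year data only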
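37. Experiment 3: Bayesian neural networks
RMSE and log-likelihood (SVGD uses $n = 20$ particles)
Each test accuracy is an average over 20 runs, with some exceptions.
▷ Protein uses 5 runs and Year uses 1 run.
SVGD is strong in both computation time and performance.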
38. References i
[1] Watanabe, S., ベイズ統計の理論と方法 (Theory and Methods of Bayesian Statistics), Corona Publishing (2012).
[2] Gong, W., Li, Y., and Hernández-Lobato, J. M., "Sliced Kernelized Stein Discrepancy," arXiv:2006.16531 (2020).
[3] Gorham, J., and Mackey, L., "Measuring Sample Quality with Stein's Method," in Advances in Neural Information Processing Systems (2015).
[4] Liu, Q., Lee, J., and Jordan, M., "A Kernelized Stein Discrepancy for Goodness-of-fit Tests," in International Conference on Machine Learning (2016).
39. References ii
[5] Chwialkowski, K., Strathmann, H., and Gretton, A., "A Kernel Test of Goodness of Fit," in International Conference on Machine Learning (2016).
[6] Fukumizu, K., カーネル法入門: 正定値カーネルによるデータ解析 (Introduction to Kernel Methods: Data Analysis with Positive Definite Kernels), Asakura Shoten (2010).
[7] Kanamori, T., 統計的学習理論 (Statistical Learning Theory), Kodansha (2015).
[8] Petersen, K., and Pedersen, M. S., "The Matrix Cookbook," version of November 15, 2012.