catla
March 27, 2020

Bayesian Deep Learning (6.3)

Bayesian Deep Learning (ベイズ深層学習), Section 6.3: Structure Learning of Generative Networks

Transcript

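4. Indian buffet process
The Indian buffet process is a probabilistic model that generates binary matrices with an infinite number of columns.

[Background]
In deep learning it is hard to decide on a network structure that performs well.
⟹ With the Indian buffet process, the structure of a directed graph (its adjacency matrix) can be estimated from data. The width and depth of the network can then be learned jointly within the framework of Bayesian inference. Applied more broadly, the Indian buffet process can also determine quantities such as the dimensionality of latent variables automatically.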
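5. Indian buffet process [Generating an infinite matrix]
First consider an $N \times H$ binary matrix $M$ and construct a generative process for $M$ in the limit $H \to \infty$. Each element $m_{n,h} \in \{0,1\}$ is generated from a Bernoulli distribution $\mathrm{Bern}(\pi_h)$. If, in addition, each parameter $\pi_h$ is generated from a beta distribution $\mathrm{Beta}(\alpha\beta/H, \beta)$ with hyperparameters $\alpha > 0$, $\beta > 0$, the distribution of $M$ can be written as on the next slide.

$$p(\pi_h) = \mathrm{Beta}\!\left(\tfrac{\alpha\beta}{H}, \beta\right) = \frac{\Gamma\!\left(\frac{\alpha\beta}{H} + \beta\right)}{\Gamma\!\left(\frac{\alpha\beta}{H}\right)\Gamma(\beta)} \, \pi_h^{\frac{\alpha\beta}{H} - 1} (1 - \pi_h)^{\beta - 1}$$

$$p(m_{n,h} \mid \pi_h) = \mathrm{Bern}(\pi_h) = \pi_h^{m_{n,h}} (1 - \pi_h)^{1 - m_{n,h}}$$

(Graphical model over $\alpha$, $\beta$, $\pi_h$ and $m_{n,h}$, with $n = 1, 2, \dots, N$ and $h = 1, 2, \dots, H$.)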
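The finite model above can be simulated directly. The following is a minimal sketch in plain Python, assuming illustrative values for $N$, $H$, $\alpha$ and $\beta$; the function name `sample_finite_matrix` is hypothetical and not taken from the slides.

```python
import random

def sample_finite_matrix(N, H, alpha, beta, rng):
    """Sample an N x H binary matrix from the finite beta-Bernoulli model."""
    # One inclusion probability per column: pi_h ~ Beta(alpha * beta / H, beta).
    pi = [rng.betavariate(alpha * beta / H, beta) for _ in range(H)]
    # Each entry m[n][h] ~ Bern(pi_h), independently across rows.
    return [[1 if rng.random() < pi[h] else 0 for h in range(H)]
            for _ in range(N)]

# Illustrative values only (assumptions, not from the slides).
rng = random.Random(0)
M = sample_finite_matrix(N=5, H=20, alpha=2.0, beta=1.0, rng=rng)
for row in M:
    print("".join(str(v) for v in row))
```

As $H$ grows, each $\pi_h$ concentrates near zero, so most columns come out empty while a few stay active.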
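6. Indian buffet process [Generating an infinite matrix]

$$\begin{aligned}
p(M) &= \prod_{h=1}^{H} \int p(\pi_h) \left\{ \prod_{n=1}^{N} p(m_{n,h} \mid \pi_h) \right\} d\pi_h \\
&= \prod_{h=1}^{H} \int \frac{\Gamma\!\left(\frac{\alpha\beta}{H} + \beta\right)}{\Gamma\!\left(\frac{\alpha\beta}{H}\right)\Gamma(\beta)} \, \pi_h^{\frac{\alpha\beta}{H} - 1} (1 - \pi_h)^{\beta - 1} \left\{ \prod_{n=1}^{N} \pi_h^{m_{n,h}} (1 - \pi_h)^{1 - m_{n,h}} \right\} d\pi_h \\
&= \prod_{h=1}^{H} \frac{\Gamma\!\left(\frac{\alpha\beta}{H} + \beta\right)}{\Gamma\!\left(\frac{\alpha\beta}{H}\right)\Gamma(\beta)} \int \pi_h^{N_h + \frac{\alpha\beta}{H} - 1} (1 - \pi_h)^{N - N_h + \beta - 1} \, d\pi_h \qquad \left(\because\ N_h = \sum_{n=1}^{N} m_{n,h}\right) \\
&= \prod_{h=1}^{H} \frac{\Gamma\!\left(\frac{\alpha\beta}{H} + \beta\right)}{\Gamma\!\left(\frac{\alpha\beta}{H}\right)\Gamma(\beta)} \cdot \frac{\Gamma\!\left(N_h + \frac{\alpha\beta}{H}\right)\Gamma(N - N_h + \beta)}{\Gamma\!\left(\frac{\alpha\beta}{H} + \beta + N\right)}
\end{aligned}$$

Taking $H \to \infty$ gives $p(M) \to 0$, so the generation probability of every binary matrix becomes 0.
($\because\ \mathrm{Beta}(x, y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x + y)} < 1$ where $x > 1$, $y > 1$.)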
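The vanishing probability can be checked numerically. This sketch evaluates the closed form of $\log p(M)$ above for a fixed pattern of non-empty columns while $H$ grows. The helper name `log_p_matrix` and the example column sums are assumptions made for illustration.

```python
from math import lgamma

def log_p_matrix(col_counts, N, H, alpha, beta):
    """log p(M) for an N x H matrix whose non-empty columns have the sums
    listed in col_counts; the remaining H - len(col_counts) columns are zero."""
    a = alpha * beta / H

    def log_col(Nh):
        # log of one factor of the product over h, using log-gamma for stability.
        return (lgamma(a + beta) - lgamma(a) - lgamma(beta)
                + lgamma(Nh + a) + lgamma(N - Nh + beta)
                - lgamma(a + beta + N))

    n_empty = H - len(col_counts)
    return sum(log_col(Nh) for Nh in col_counts) + n_empty * log_col(0)

# Two non-empty columns with sums 2 and 1 (illustrative), N = 3 rows.
vals = [log_p_matrix([2, 1], N=3, H=H, alpha=2.0, beta=1.0)
        for H in (10, 100, 1000, 10000)]
for H, v in zip((10, 100, 1000, 10000), vals):
    print(H, v)
```

The log-probability keeps decreasing as $H$ increases, which is the behaviour the slide describes.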
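7. Indian buffet process [Generating an infinite matrix]
To keep the generation probability from becoming 0, let $[M]$ denote the equivalence class of matrices that become identical when the columns of $M$ are reordered (left-ordered form, lof).

Example: when
$$M = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
the class $[M]$ contains
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},\ \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},\ \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix},\ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$

The distribution $p([M])$ over $[M]$ in the limit $H \to \infty$ is computed in reference [1] and can be written as on the next slide.

[1] "Infinite Latent Feature Models and the Indian Buffet Process", 2018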
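One way to test whether two matrices fall in the same class is to bring both to a canonical column order. The sketch below assumes the usual convention for the left-ordered form, in which columns are sorted in decreasing order of the binary number read from the first row downward. The function name `left_ordered_form` is hypothetical.

```python
def left_ordered_form(M):
    """Reorder the columns of a binary matrix (list of rows) into a canonical order."""
    cols = list(zip(*M))
    # Tuples compare lexicographically, which matches comparing the columns
    # as binary numbers with the first row as the most significant bit.
    cols.sort(reverse=True)
    return [list(row) for row in zip(*cols)]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
members = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[0, 1, 0], [1, 0, 0], [0, 0, 1]],
    [[0, 0, 1], [0, 1, 0], [1, 0, 0]],
    [[1, 0, 0], [0, 0, 1], [0, 1, 0]],
]
canonical = [left_ordered_form(m) for m in members]
print(all(c == identity for c in canonical))
```

All four example matrices reduce to the same canonical matrix, so they belong to one equivalence class.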
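8. Indian buffet process [Generating an infinite matrix]

$$\begin{aligned}
p([M]) = \sum_{M \in [M]} p(M) &= \frac{H!}{\prod_{i \geq 1} H_i!} \prod_{h=1}^{H} \frac{\Gamma\!\left(\frac{\alpha\beta}{H} + \beta\right)}{\Gamma\!\left(\frac{\alpha\beta}{H}\right)\Gamma(\beta)} \cdot \frac{\Gamma\!\left(N_h + \frac{\alpha\beta}{H}\right)\Gamma(N - N_h + \beta)}{\Gamma\!\left(\frac{\alpha\beta}{H} + \beta + N\right)} \\
&\to \frac{(\alpha\beta)^{H_+}}{\prod_{i \geq 1} H_i!} \exp(-\bar{H}_+) \prod_{h=1}^{H_+} \frac{\Gamma(N_h)\,\Gamma(N - N_h + \beta)}{\Gamma(\beta + N)}
\end{aligned}$$

Here $H_i$ is the number of binary columns of type $i$ in $M$ (identical column vectors share the same $i$); dividing by $\prod_i H_i!$ cancels the duplicates produced by permuting identical binary columns. $H_+$ is the number of columns $h$ with $N_h > 0$, and $\bar{H}_+$ is the expected value of $H_+$:

$$\bar{H}_+ = \alpha \sum_{n=1}^{N} \frac{\beta}{n + \beta - 1}$$

This distribution does not change when the rows of $M$ are swapped, so it is exchangeable.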

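12. Indian buffet process [Generating an infinite matrix]
A binary matrix with infinitely many columns can be generated by the following Indian buffet process procedure.

Procedure of the Indian buffet process (IBP)
1. The first customer to arrive at the restaurant takes dishes according to the Poisson distribution
$$\mathrm{Poi}(\alpha) = \frac{\alpha^x}{x!} e^{-\alpha}.$$
2. The $n$-th customer takes each existing dish $h$ with probability $\frac{N_h}{n + \beta - 1}$, and finally takes new dishes according to $\mathrm{Poi}\!\left(\frac{\alpha\beta}{n + \beta - 1}\right)$.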
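The procedure above can be sketched in plain Python. Because the standard library has no Poisson sampler, the helper `sample_poisson` uses Knuth's multiplication method. Both function names and the parameter values are assumptions for illustration, not part of the slides.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw from Poi(lam) with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def sample_ibp(N, alpha, beta, rng):
    """Return an N x H_plus binary matrix (list of rows) drawn by the IBP procedure."""
    counts = []   # counts[h] = N_h, how many earlier customers took dish h
    choices = []  # choices[n - 1] = set of dish indices taken by customer n
    for n in range(1, N + 1):
        denom = n + beta - 1
        # Existing dishes: taken with probability N_h / (n + beta - 1).
        taken = {h for h, Nh in enumerate(counts) if rng.random() < Nh / denom}
        for h in taken:
            counts[h] += 1
        # New dishes: Poi(alpha * beta / (n + beta - 1)); for n = 1 this is Poi(alpha).
        n_new = sample_poisson(alpha * beta / denom, rng)
        taken.update(range(len(counts), len(counts) + n_new))
        counts.extend([1] * n_new)
        choices.append(taken)
    H_plus = len(counts)
    return [[1 if h in taken else 0 for h in range(H_plus)] for taken in choices]

M = sample_ibp(N=6, alpha=2.0, beta=1.0, rng=random.Random(0))
for row in M:
    print(row)
```

Only columns that at least one customer has chosen are stored, so the returned matrix has the $H_+$ non-empty columns and the infinitely many empty ones stay implicit.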
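13. Indian buffet process [Generating an infinite matrix]
Properties of the generated binary matrix $M \in \{0,1\}^{N \times \infty}$:
• The number of dishes per customer follows $\mathrm{Poi}(\alpha)$.
• The expected total number of dishes taken is $N\alpha$.
• The expected number of distinct dishes taken by the customers is $\bar{H}_+ = \alpha \sum_{n=1}^{N} \frac{\beta}{n + \beta - 1}$.
• $\lim_{\beta \to 0} \bar{H}_+ = \alpha$: everyone chooses the same dishes.
• $\lim_{\beta \to \infty} \bar{H}_+ = N\alpha$: customers no longer choose the same dishes as each other.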
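The two limits of $\bar{H}_+$ are easy to confirm numerically. The sketch below evaluates the sum for a very small and a very large $\beta$. The function name and the values $N = 10$, $\alpha = 2$ are assumptions for illustration.

```python
def expected_num_dishes(N, alpha, beta):
    """Expected number of distinct dishes: alpha * sum_n beta / (n + beta - 1)."""
    return alpha * sum(beta / (n + beta - 1) for n in range(1, N + 1))

N, alpha = 10, 2.0
small = expected_num_dishes(N, alpha, beta=1e-9)   # approaches alpha
large = expected_num_dishes(N, alpha, beta=1e9)    # approaches N * alpha
middle = expected_num_dishes(N, alpha, beta=1.0)   # alpha times the N-th harmonic number
print(small, middle, large)
```

With $\beta = 1$ the sum reduces to $\alpha \sum_{n=1}^{N} 1/n$, which lies between the two extremes.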
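14. Indian buffet process [Generating an infinite matrix]
(Figure: customers $n = 1, \dots, 5$. The first customer takes $\mathrm{Poi}(\alpha)$ dishes. Customers $n = 2, 3, 4, 5$ take each existing dish with probability $\frac{N_h}{n + \beta - 1}$ and then $\mathrm{Poi}\!\left(\frac{\alpha\beta}{n + \beta - 1}\right)$ new dishes. Source: Wikipedia, Poisson distribution.)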
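15. Indian buffet process [Generating an infinite matrix]
(Figure: example matrix with rows $n$ and columns $h = 1$, $h = 2$, $h = 3$; each existing dish is taken with probability $\frac{N_h}{n + \beta - 1}$.)
Since $N_2 = 3$, the entry becomes 1 with probability $\frac{3}{4 + \beta - 1}$.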
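16. Indian buffet process [Gibbs sampling]
Modeling the generative process of the data $X$ with infinite-dimensional latent variables $\theta$ and a binary matrix $M$ gives

$$p(X, M, \theta) = p(X \mid M, \theta)\, p(M)\, p(\theta).$$

Assume that $\theta$ can be integrated out analytically, as in $p(X \mid M) = \int p(X \mid M, \theta)\, p(\theta)\, d\theta$. Then each $m_{n,h}$ can be sampled from the posterior $p(M \mid X)$ by Gibbs sampling as follows:

$$p(m_{n,h} = 1 \mid M_{\setminus (n,h)}, X) \propto p(X \mid M)\, p(m_{n,h} = 1 \mid M_{\setminus (n,h)})$$

The probability that $m_{n,h} = 1$ is sampled from $p(m_{n,h} \mid M_{\setminus (n,h)})$ corresponds, in the Indian buffet process, to the last, $n$-th customer taking the $h$-th dish after $n - 1$ customers have taken their dishes.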
17. Indian buffet process [Gibbs sampling]
Therefore, for a column $h$ with $N_{\setminus n,h} = \sum_{n' \neq n} m_{n',h} > 0$, the value $m_{n,h}$ can be Gibbs sampled by computing the probability $\frac{N_{\setminus n,h}}{n + \beta - 1}$ (the probability that the $n$-th customer takes the $h$-th dish) and the likelihood $p(X \mid M)$:

$$p(m_{n,h} = 1 \mid M_{\setminus (n,h)}, X) \propto p(X \mid M)\, p(m_{n,h} = 1 \mid M_{\setminus (n,h)})$$

Likewise, new binary columns, that is, columns with $N_{\setminus n,h} = 0$, are generated by computing the probability of the number of new columns $H_{\mathrm{new}}$ from $\mathrm{Poi}\!\left(\frac{\alpha\beta}{N + \beta - 1}\right)$ (the probability that the $n$-th customer takes new dishes) and the likelihood $p(X \mid M)$. Because exchangeability lets customer $n$ be treated as the last of the $N$ customers to arrive, $n = N$ in these expressions.

The number of newly added columns ranges over a countably infinite set, so strictly speaking $p(X \mid M)$ would have to be evaluated infinitely many times. In practice the computation is approximated by truncating to a finite number of candidates (for example, $H_{\mathrm{new}} \leq 10$).
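The two Gibbs updates can be sketched as follows. This is only an outline under stated assumptions: the log-likelihood values (`loglik_on`, `loglik_off`) and the callable `loglik_new` stand in for a model-specific $\log p(X \mid M)$ that the slides do not specify, and row $n$ is treated as the last of $N$ customers, so the denominator is $N + \beta - 1$. All names are hypothetical.

```python
import math

def prob_existing(N_minus, N, beta, loglik_on, loglik_off):
    """Posterior probability that m[n][h] = 1 for a column with N_minus > 0."""
    prior_on = N_minus / (N + beta - 1)
    log_on = math.log(prior_on) + loglik_on          # log p(X|M, m=1) + log prior
    log_off = math.log(1.0 - prior_on) + loglik_off  # log p(X|M, m=0) + log prior
    top = max(log_on, log_off)                       # subtract max for stability
    w_on, w_off = math.exp(log_on - top), math.exp(log_off - top)
    return w_on / (w_on + w_off)

def new_column_probs(N, alpha, beta, loglik_new, max_new=10):
    """Normalized posterior over H_new = 0..max_new (truncated approximation)."""
    lam = alpha * beta / (N + beta - 1)
    logw = [k * math.log(lam) - lam - math.lgamma(k + 1) + loglik_new(k)
            for k in range(max_new + 1)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]

# With a flat likelihood the posterior equals the IBP prior.
p_on = prob_existing(N_minus=3, N=5, beta=1.0, loglik_on=0.0, loglik_off=0.0)
p_new = new_column_probs(N=5, alpha=2.0, beta=1.0, loglik_new=lambda k: 0.0)
print(p_on, p_new[:3])
```

Truncating at `max_new=10` mirrors the $H_{\mathrm{new}} \leq 10$ example. The discarded tail of the Poisson distribution is small whenever $\frac{\alpha\beta}{N + \beta - 1}$ is small.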
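20. Infinite neural network model
Following reference [2], a DNN is constructed from a generative model called the nonlinear Gaussian belief network.

[Nonlinear Gaussian belief network]
Consider a network with $L$ layers, with the following notation.
$H_l$: the number of units in layer $l$.
$z_h^{(l)}$: the $h$-th unit in layer $l$.
$M^{(l)} \in \mathbb{R}^{H_{l-1} \times H_l}$: adjacency matrix (a mask on the fully connected layer). An element $m_{h,h'}^{(l)} = 1$ means that there is an arrow from $z_{h'}^{(l)}$ to $z_h^{(l-1)}$.
$W^{(l)} \in \mathbb{R}^{H_{l-1} \times H_l}$: weight parameters of layer $l$.
$b \in \mathbb{R}^{H_l}$: bias parameters of layer $l$.
$a^{(l)}$: activations of layer $l$.

[2] "Learning the Structure of Deep Sparse Graphical Models", 2010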
21. Infinite neural network model [Nonlinear Gaussian belief network]
The activations can be written as

$$a^{(l)} = \left(W^{(l+1)} \odot M^{(l+1)}\right) z^{(l+1)} + b^{(l)}.$$

In addition, noise from a Gaussian distribution is added to each activation $a_h^{(l)}$:

$$\tilde{a}_h^{(l)} = a_h^{(l)} + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \nu_h^{(l)})$$

The hidden unit $z_h^{(l)}$ is obtained by the transformation

$$z_h^{(l)} = \phi(\tilde{a}_h^{(l)}), \qquad \phi(\cdot) = \mathrm{Tanh}(\cdot).$$
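22. Infinite neural network model [Nonlinear Gaussian belief network]
The distribution of $z_h^{(l)}$ then follows from a change of variables for random variables:

$$p(z_h^{(l)} \mid a_h^{(l)}, \nu_h^{(l)}) = \mathcal{N}\!\left(\phi^{-1}(z_h^{(l)}) \mid a_h^{(l)}, \nu_h^{(l)}\right) \left| \frac{\partial \tilde{a}_h^{(l)}}{\partial z_h^{(l)}} \right| = \frac{\mathcal{N}\!\left(\phi^{-1}(z_h^{(l)}) \mid a_h^{(l)}, \nu_h^{(l)}\right)}{\phi'\!\left(\phi^{-1}(z_h^{(l)})\right)}, \qquad \phi'(a) = \frac{d}{da}\phi(a)$$

$W^{(l)}$ and $b^{(l)}$ are given Gaussian priors and $\nu_h^{(l)}$ is given a gamma prior (conjugate priors).
• $\nu_h^{(l)}$ small → the unit takes extreme values.
• $\nu_h^{(l)}$ large → the unit takes nearly deterministic values.

Furthermore, set $z_h^{(0)} = x_h \in (-1, 1)$.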
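The change of variables can be checked numerically for a single unit. The sketch below makes one assumption that the slides do not state explicitly: $\nu$ is read as a precision (inverse variance), which matches the gamma prior and the two bullet points above. The function names are hypothetical.

```python
import math
import random

def sample_unit(a, nu, rng):
    """Draw z = tanh(a + eps) with eps ~ N(0, 1/nu) (nu assumed to be a precision)."""
    a_tilde = a + rng.gauss(0.0, nu ** -0.5)
    return math.tanh(a_tilde)

def density_z(z, a, nu):
    """p(z | a, nu) = N(atanh(z) | a, 1/nu) / phi'(atanh(z)), with phi = tanh."""
    a_tilde = math.atanh(z)
    normal = math.sqrt(nu / (2.0 * math.pi)) * math.exp(-0.5 * nu * (a_tilde - a) ** 2)
    return normal / (1.0 - z * z)  # tanh'(a_tilde) = 1 - tanh(a_tilde)^2 = 1 - z^2

# Midpoint-rule check that the density integrates to about 1 on (-1, 1).
steps = 20000
width = 2.0 / steps
total = sum(density_z(-1.0 + (i + 0.5) * width, a=0.3, nu=2.0) * width
            for i in range(steps))
z = sample_unit(a=0.3, nu=2.0, rng=random.Random(0))
print(total, z)
```

Dividing by $1 - z^2$ is the Jacobian term $\phi'(\phi^{-1}(z))$ for $\phi = \tanh$. Without it the density would not integrate to one.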
23. Infinite neural network model [Nonlinear Gaussian belief network]
The joint distribution of the whole model can be written as

$$p(X, Z, M, W, b, \nu) = p(W)\, p(b)\, p(\nu)\, p(M) \prod_{n=1}^{N} p(x_n \mid z_n, M, W, b, \nu)\, p(z_n \mid M, W, b, \nu).$$
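24. Infinite neural network model [Cascading Indian buffet process]
The joint distribution above is now extended so that the number of layers and the number of hidden units are free to vary.

First consider $M^{(1)}$. Its number of rows is fixed at $H_0$, the dimensionality of $x$, so the rows are treated as the customers of an Indian buffet process. Next consider $M^{(2)}$. Its number of rows must be fixed at $H_1$, the number of columns of the $M^{(1)}$ obtained from the Indian buffet process, and it too can be sampled with an Indian buffet process. Repeating this step and sampling $M^{(1)}, M^{(2)}, M^{(3)}, \dots$ in turn constructs the network.

(Figure: $x \in \mathbb{R}^{H_0}$, $M^{(1)} \in \mathbb{R}^{H_0 \times H_1}$, $M^{(2)} \in \mathbb{R}^{H_1 \times H_2}$, $M^{(3)} \in \mathbb{R}^{H_2 \times H_3}$, each stage sampled by an IBP.)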
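The cascading construction can be sketched by chaining IBP draws, each taking the previous column count as its number of customers. Everything here is an illustrative assumption: the function names, the parameter values, and in particular the `max_layers` cap, which is only a safeguard for the sketch and not part of the slides.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method for Poi(lam); adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def sample_ibp(N, alpha, beta, rng):
    """One IBP draw: an N x H_plus binary matrix as a list of rows."""
    counts, choices = [], []
    for n in range(1, N + 1):
        denom = n + beta - 1
        taken = {h for h, Nh in enumerate(counts) if rng.random() < Nh / denom}
        for h in taken:
            counts[h] += 1
        n_new = sample_poisson(alpha * beta / denom, rng)
        taken.update(range(len(counts), len(counts) + n_new))
        counts.extend([1] * n_new)
        choices.append(taken)
    return [[1 if h in t else 0 for h in range(len(counts))] for t in choices]

def sample_cascade(H0, alpha, beta, rng, max_layers=10):
    """Sample M(1), M(2), ...; the columns of one mask are the rows of the next."""
    masks, n_rows = [], H0
    while n_rows > 0 and len(masks) < max_layers:
        M = sample_ibp(n_rows, alpha, beta, rng)
        masks.append(M)
        n_rows = len(M[0])  # H_l: units in the next layer (0 stops the cascade)
    return masks

masks = sample_cascade(H0=4, alpha=1.0, beta=1.0, rng=random.Random(0))
print([(len(M), len(M[0])) for M in masks])
```

The printed shapes show the layer widths $H_0, H_1, \dots$ chosen by the sampler. The cascade stops on its own when a draw produces no columns.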

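26. Infinite neural network model [Cascading Indian buffet process]
Q. How is the joint distribution inferred?
A. It can be inferred approximately with MCMC.

(Concrete example) Split the variables into three blocks, the set of hidden units $Z$, the set of binary matrices $M$, and the set of parameters $\{W, b, \nu\}$, and sample them alternately based on Gibbs sampling.

$$Z \sim p(Z \mid X, M, W, b, \nu), \qquad M \sim p(M \mid X, Z, W, b, \nu), \qquad W, b, \nu \sim p(W, b, \nu \mid X, Z, M)$$

Within each block:

$$z_n \sim p(z_n \mid x_n, M, W, b, \nu)$$
$$m_{n,h} \sim p(m_{n,h} \mid X, Z, W, b, \nu, M_{\setminus (n,h)})$$
$$W \sim p(W \mid X, Z, M, b, \nu)$$
$$b \sim p(b \mid X, Z, M, W, \nu)$$
$$\nu \sim p(\nu \mid X, Z, M, W, b)$$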
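27. Infinite neural network model [Cascading Indian buffet process]
When a sample of $M$ is given, the network is fixed, so the problem reduces to the inference problem for an ordinary generative network covered earlier.

When a sample of $Z$ is given, sampling $\{W, b, \nu\}$ is easy because the priors on $\{W, b, \nu\}$ were chosen to be conjugate.

Sampling $Z$ and $M$, on the other hand, is more involved. It can be carried out with the Metropolis-Hastings method.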