
Bayesian Deep Learning (6.3)

catla
March 27, 2020


Bayesian Deep Learning, Section 6.3: Structure Learning of Generative Networks



Transcript

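  1. Indian buffet process [Generating an infinite matrix]

    First, consider an $N \times H$ binary matrix $M$ and construct a generative process for $M$ in the limit $H \to \infty$. Each element $m_{n,h} \in \{0,1\}$ is generated from a Bernoulli distribution $\mathrm{Bern}(\pi_h)$. If, in addition, the parameter $\pi_h$ is generated from a beta distribution $\mathrm{Beta}(\alpha\beta/H, \beta)$ with hyperparameters $\alpha > 0$, $\beta > 0$, then the distribution of the matrix $M$ can be written as on the next slide.

    $$p(\pi_h) = \mathrm{Beta}(\alpha\beta/H, \beta) = \frac{\Gamma\left(\frac{\alpha\beta}{H} + \beta\right)}{\Gamma\left(\frac{\alpha\beta}{H}\right)\Gamma(\beta)} \, \pi_h^{\frac{\alpha\beta}{H} - 1} (1 - \pi_h)^{\beta - 1}$$

    $$p(m_{n,h} \mid \pi_h) = \mathrm{Bern}(\pi_h) = \pi_h^{m_{n,h}} (1 - \pi_h)^{1 - m_{n,h}}$$

    (Graphical model: $\alpha, \beta \to \pi_h \to m_{n,h}$, with plates over $n = 1, 2, \ldots, N$ and $h = 1, 2, \ldots, H$.)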
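The finite beta-Bernoulli model on this slide can be simulated directly. The following is a minimal sketch using only the Python standard library; the function name `sample_finite_matrix` and the parameter values are illustrative assumptions, not part of the original slides.

```python
import random

def sample_finite_matrix(N, H, alpha, beta, rng):
    """Draw an N x H binary matrix from the finite beta-Bernoulli model."""
    # One inclusion probability per column: pi_h ~ Beta(alpha * beta / H, beta).
    pi = [rng.betavariate(alpha * beta / H, beta) for _ in range(H)]
    # Each entry m[n][h] ~ Bern(pi_h), independently given pi_h.
    return [[1 if rng.random() < pi[h] else 0 for h in range(H)]
            for _ in range(N)]

rng = random.Random(0)
M = sample_finite_matrix(N=5, H=50, alpha=2.0, beta=1.0, rng=rng)
print([sum(row) for row in M])  # number of active columns in each row
```

Because the first beta parameter shrinks as $H$ grows, most columns end up empty, which is what makes the $H \to \infty$ limit meaningful.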
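  2. Indian buffet process [Generating an infinite matrix]

    $$\begin{aligned}
    p(M) &= \prod_{h=1}^{H} \int p(\pi_h) \left\{ \prod_{n=1}^{N} p(m_{n,h} \mid \pi_h) \right\} d\pi_h \\
    &= \prod_{h=1}^{H} \int \frac{\Gamma\left(\frac{\alpha\beta}{H} + \beta\right)}{\Gamma\left(\frac{\alpha\beta}{H}\right)\Gamma(\beta)} \, \pi_h^{\frac{\alpha\beta}{H} - 1} (1 - \pi_h)^{\beta - 1} \left\{ \prod_{n=1}^{N} \pi_h^{m_{n,h}} (1 - \pi_h)^{1 - m_{n,h}} \right\} d\pi_h \\
    &= \prod_{h=1}^{H} \frac{\Gamma\left(\frac{\alpha\beta}{H} + \beta\right)}{\Gamma\left(\frac{\alpha\beta}{H}\right)\Gamma(\beta)} \int \pi_h^{N_h + \frac{\alpha\beta}{H} - 1} (1 - \pi_h)^{N - N_h + \beta - 1} d\pi_h \qquad \left(\because N_h = \sum_{n=1}^{N} m_{n,h}\right) \\
    &= \prod_{h=1}^{H} \frac{\Gamma\left(\frac{\alpha\beta}{H} + \beta\right)}{\Gamma\left(\frac{\alpha\beta}{H}\right)\Gamma(\beta)} \cdot \frac{\Gamma\left(N_h + \frac{\alpha\beta}{H}\right)\Gamma(N - N_h + \beta)}{\Gamma\left(\frac{\alpha\beta}{H} + \beta + N\right)}
    \end{aligned}$$

    Taking $H \to \infty$ gives $p(M) \to 0$, so the generation probability of every binary matrix becomes 0 ($\because$ the beta function satisfies $B(x, y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x + y)} < 1$ for $x > 1$, $y > 1$).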
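The collapse of $p(M)$ can be checked numerically from the closed form above. This sketch evaluates $\log p(M)$ with `math.lgamma` for a fixed pattern of nonzero columns padded with empty columns; the function name `log_p_matrix` and the example counts are assumptions made for illustration.

```python
from math import lgamma

def log_p_matrix(col_counts, N, H, alpha, beta):
    """log p(M) for an N x H matrix whose nonzero columns have counts N_h."""
    a = alpha * beta / H
    # Pad with empty columns (N_h = 0) up to H columns in total.
    counts = list(col_counts) + [0] * (H - len(col_counts))
    total = 0.0
    for N_h in counts:
        total += (lgamma(a + beta) - lgamma(a) - lgamma(beta)
                  + lgamma(N_h + a) + lgamma(N - N_h + beta)
                  - lgamma(a + beta + N))
    return total

values = []
for H in (10, 100, 1000, 10000):
    values.append(log_p_matrix([2, 1], N=3, H=H, alpha=2.0, beta=1.0))
    print(H, values[-1])
```

For this example the log-probability keeps decreasing as $H$ grows, which motivates working with equivalence classes of matrices instead of individual matrices.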
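  3. Indian buffet process [Generating an infinite matrix]

    To keep the generation probability from becoming 0, let $[M]$ denote the equivalence class of matrices that become identical once the columns of $M$ are reordered (left-ordered form, lof).

    Example: when
    $$M = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
    the class $[M]$ contains, among others,
    $$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},\ \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},\ \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix},\ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$

    Following [1], the distribution $p([M])$ in the limit $H \to \infty$ can be written as on the next slide.

    [1] "Infinite Latent Feature Models and the Indian Buffet Process", 2018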
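  4. Indian buffet process [Generating an infinite matrix]

    $$p([M]) = \sum_{M \in [M]} p(M) = \frac{H!}{\prod_{i \ge 1} H_i!} \prod_{h=1}^{H} \frac{\Gamma\left(\frac{\alpha\beta}{H} + \beta\right)}{\Gamma\left(\frac{\alpha\beta}{H}\right)\Gamma(\beta)} \cdot \frac{\Gamma\left(N_h + \frac{\alpha\beta}{H}\right)\Gamma(N - N_h + \beta)}{\Gamma\left(\frac{\alpha\beta}{H} + \beta + N\right)}$$

    $$\to \frac{(\alpha\beta)^{H_+}}{\prod_{i \ge 1} H_i!} \exp(-\bar{H}_+) \prod_{h=1}^{H_+} \frac{\Gamma(N_h)\,\Gamma(N - N_h + \beta)}{\Gamma(\beta + N)}$$

    Here $H_i$ is the number of binary columns of type $i$ in $M$ (columns with identical column vectors share the same $i$); dividing by $H_i!$ cancels the duplicates that arise from reordering identical binary columns. $H_+$ is the number of columns $h$ with $N_h > 0$, and
    $$\bar{H}_+ = \alpha \sum_{n=1}^{N} \frac{\beta}{n + \beta - 1}$$
    is the expected value of $H_+$.

    This distribution does not change when the rows of $M$ are exchanged, so it has the property of exchangeability.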
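The limiting formula for $p([M])$ is straightforward to evaluate, and doing so makes the exchangeability claim easy to check. The sketch below computes $\log p([M])$ for a matrix given as a list of rows; the name `log_p_class` and the test matrices are illustrative assumptions.

```python
from collections import Counter
from math import lgamma, log

def log_p_class(M, alpha, beta):
    """Limiting log p([M]) as H -> infinity for a binary matrix M (list of rows)."""
    N = len(M)
    # Only columns with N_h > 0 enter the limiting formula.
    columns = [tuple(col) for col in zip(*M) if any(col)]
    H_plus = len(columns)
    H_bar = alpha * sum(beta / ((n - 1) + beta) for n in range(1, N + 1))
    result = H_plus * log(alpha * beta) - H_bar
    # Divide by H_i! for every group of identical columns.
    result -= sum(lgamma(c + 1) for c in Counter(columns).values())
    for col in columns:
        N_h = sum(col)
        result += lgamma(N_h) + lgamma(N - N_h + beta) - lgamma(beta + N)
    return result

M = [[1, 0, 1], [1, 1, 0], [0, 1, 0]]
rows_swapped = [M[2], M[0], M[1]]
cols_swapped = [[row[2], row[0], row[1]] for row in M]
print(log_p_class(M, 2.0, 1.0))
print(log_p_class(rows_swapped, 2.0, 1.0))
print(log_p_class(cols_swapped, 2.0, 1.0))
```

All three calls should agree up to floating-point rounding, since neither row exchanges nor column reorderings change the column counts $N_h$ or the multiplicities $H_i$.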
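  5. Indian buffet process [Generating an infinite matrix]

    Properties of the generated binary matrix $M \in \{0,1\}^{N \times \infty}$:
    - The number of dishes per customer follows $\mathrm{Poi}(\alpha)$.
    - The expected total number of dishes taken is $N\alpha$.
    - The expected number of distinct dishes taken by the customers is $\bar{H}_+ = \alpha \sum_{n=1}^{N} \frac{\beta}{n + \beta - 1}$.
    - $\lim_{\beta \to 0} \bar{H}_+ = \alpha$: everyone chooses the same dishes.
    - $\lim_{\beta \to \infty} \bar{H}_+ = N\alpha$: customers no longer choose the same dishes as one another.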
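The two limits of $\bar{H}_+$ listed above can be confirmed with a few lines of arithmetic. This is a small sketch; the function name `expected_dishes` and the chosen values of $\alpha$, $\beta$ and $N$ are assumptions for illustration.

```python
def expected_dishes(alpha, beta, N):
    """H_bar_plus = alpha * sum_{n=1}^{N} beta / (n + beta - 1)."""
    # (n - 1) + beta is the same denominator, written to avoid rounding
    # error when beta is tiny.
    return alpha * sum(beta / ((n - 1) + beta) for n in range(1, N + 1))

alpha, N = 2.0, 5
print(expected_dishes(alpha, 1e-9, N))  # close to alpha
print(expected_dishes(alpha, 1.0, N))   # alpha times the N-th harmonic number
print(expected_dishes(alpha, 1e9, N))   # close to N * alpha
```

With $\beta = 1$ the sum reduces to the harmonic number $\sum_{n=1}^{N} 1/n$, which is the familiar one-parameter Indian buffet process.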
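  6. Indian buffet process [Generating an infinite matrix]

    (Figure: customers arriving in sequence. The first customer takes $\mathrm{Poi}(\alpha)$ dishes. Customers $n = 2, 3, 4, 5$ each take an existing dish $h$ with probability $\frac{N_h}{n + \beta - 1}$ and $\mathrm{Poi}\left(\frac{\alpha\beta}{n + \beta - 1}\right)$ new dishes. Source: Wikipedia, "Poisson distribution".)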
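  7. Indian buffet process [Generating an infinite matrix]

    (Figure: binary matrix with columns $h = 1$, $h = 2$, $h = 3$; an existing dish is taken with probability $\frac{N_h}{n + \beta - 1}$.)

    Since $N_2 = 3$, the entry becomes 1 with probability $\frac{3}{4 + \beta - 1}$.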
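The sequential process illustrated on the last two slides can be sketched as a short sampler. Everything here uses the Python standard library only; the helper names `poisson` and `sample_ibp` are assumptions, and the Poisson draw uses Knuth's multiplication method, which is adequate for the small rates involved.

```python
import math
import random

def poisson(rate, rng):
    """Poisson draw via Knuth's multiplication method (fine for small rates)."""
    threshold, k, p = math.exp(-rate), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def sample_ibp(N, alpha, beta, rng):
    """Return an N x H_plus binary matrix (list of rows) from the two-parameter IBP."""
    columns = []  # columns[h] holds the entries of dish h for customers so far
    for n in range(1, N + 1):
        denom = n + beta - 1
        for col in columns:
            # Existing dish h is taken with probability N_h / (n + beta - 1).
            col.append(1 if rng.random() < sum(col) / denom else 0)
        # The customer also takes Poi(alpha * beta / (n + beta - 1)) new dishes.
        for _ in range(poisson(alpha * beta / denom, rng)):
            columns.append([0] * (n - 1) + [1])
    return [[col[n] for col in columns] for n in range(N)]

rng = random.Random(1)
M = sample_ibp(N=6, alpha=2.0, beta=1.0, rng=rng)
for row in M:
    print(row)
```

For the first customer the rate reduces to $\alpha\beta/\beta = \alpha$, matching the $\mathrm{Poi}(\alpha)$ in the figure. Every column is created by a customer who takes it, so no column is empty.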
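  8. Indian buffet process [Gibbs sampling]

    Modeling the data $X$ with infinite-dimensional latent variables $\theta$ and a binary matrix $M$ gives the following generative model:
    $$p(X, M, \theta) = p(X \mid M, \theta)\, p(M)\, p(\theta)$$

    Assuming that $\theta$ can be integrated out analytically, as in $p(X \mid M) = \int p(X \mid M, \theta)\, p(\theta)\, d\theta$, each $m_{n,h}$ can be sampled from the posterior $p(M \mid X)$ by Gibbs sampling as follows:
    $$p(m_{n,h} = 1 \mid M_{\setminus(n,h)}, X) \propto p(X \mid M)\, p(m_{n,h} = 1 \mid M_{\setminus(n,h)})$$

    The probability of drawing $m_{n,h} = 1$ from $p(m_{n,h} \mid M_{\setminus(n,h)})$ corresponds, in the Indian buffet process, to the situation where the other $N - 1$ customers have taken their dishes and customer $n$, treated as the last to arrive (which exchangeability permits), then takes the $h$-th dish.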
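  9. Indian buffet process [Gibbs sampling]

    $$p(m_{n,h} = 1 \mid M_{\setminus(n,h)}, X) \propto p(X \mid M)\, p(m_{n,h} = 1 \mid M_{\setminus(n,h)})$$

    Therefore, for a column $h$ with $N_{\setminus n,h} = \sum_{n' \ne n} m_{n',h} > 0$, the value $m_{n,h}$ can be sampled by Gibbs sampling by computing the prior probability $\frac{N_{\setminus n,h}}{N + \beta - 1}$ and the likelihood $p(X \mid M)$.

    Similarly, for new binary columns, that is, columns with $N_{\setminus n,h} = 0$, the probability of the number of newly generated columns $H_{\mathrm{new}}$ is computed from the prior $\mathrm{Poi}\left(\frac{\alpha\beta}{N + \beta - 1}\right)$ and the likelihood $p(X \mid M)$. Because the number of columns that could be added is countably infinite, $p(X \mid M)$ would strictly have to be evaluated infinitely many times; in practice the computation is approximated by truncating at a finite number of candidates (for example, $H_{\mathrm{new}} \le 10$).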
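The two Gibbs updates described here can be sketched as follows. This is not the deck's implementation: the likelihood values are passed in as plain numbers or as a hypothetical callable `loglik_for_k`, because the slides do not fix a particular $p(X \mid M)$. The sketch also assumes customer $n$ is treated as the last of $N$ customers, so both priors use $N + \beta - 1$.

```python
import math
import random

def prob_existing(count_others, N, beta, loglik_one, loglik_zero):
    """Posterior P(m_{n,h} = 1 | rest, X) for a column with count_others > 0."""
    prior_one = count_others / (N + beta - 1)
    w1 = math.log(prior_one) + loglik_one          # prior times likelihood, m = 1
    w0 = math.log(1.0 - prior_one) + loglik_zero   # prior times likelihood, m = 0
    top = max(w1, w0)
    e1, e0 = math.exp(w1 - top), math.exp(w0 - top)
    return e1 / (e1 + e0)

def new_column_probs(N, alpha, beta, loglik_for_k, k_max=10):
    """Normalized P(H_new = k | X) for k = 0..k_max (truncated at k_max)."""
    rate = alpha * beta / (N + beta - 1)
    logw = [k * math.log(rate) - rate - math.lgamma(k + 1) + loglik_for_k(k)
            for k in range(k_max + 1)]
    top = max(logw)
    weights = [math.exp(v - top) for v in logw]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
# Hypothetical flat likelihood, so the posterior equals the truncated prior.
probs = new_column_probs(N=10, alpha=2.0, beta=1.0, loglik_for_k=lambda k: 0.0)
H_new = rng.choices(range(len(probs)), weights=probs)[0]
print(prob_existing(3, 10, 1.0, 0.0, 0.0), H_new)
```

Working in log space and subtracting the maximum before exponentiating keeps the normalization stable when the likelihood terms differ by many orders of magnitude.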
  11. Indian buffet process [Gibbs sampling]

    (Same text as slide 9, with two annotations added.) $\frac{N_{\setminus n,h}}{N + \beta - 1}$ is the probability that the $n$-th customer takes the $h$-th dish, and $\mathrm{Poi}\left(\frac{\alpha\beta}{N + \beta - 1}\right)$ gives the probability that the $n$-th customer takes new dishes.

    $$p(m_{n,h} = 1 \mid M_{\setminus(n,h)}, X) \propto p(X \mid M)\, p(m_{n,h} = 1 \mid M_{\setminus(n,h)})$$
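  12. Infinite neural network model

    Following [2], a DNN is constructed from a generative model called the nonlinear Gaussian belief network.

    [Nonlinear Gaussian belief network]
    Consider a network with $L$ layers, and define:
    - $H_l$: the number of units in layer $l$.
    - $z_h^{(l)}$: the $h$-th unit of layer $l$.
    - $M^{(l)} \in \mathbb{R}^{H_{l-1} \times H_l}$: the adjacency matrix (that is, a mask on the fully connected layer). The element $m_{h,h'}^{(l)} = 1$ means that there is an arrow from $z_{h'}^{(l)}$ to $z_h^{(l-1)}$.
    - $W^{(l)} \in \mathbb{R}^{H_{l-1} \times H_l}$: the weight parameters of layer $l$.
    - $b^{(l)} \in \mathbb{R}^{H_l}$: the bias parameters of layer $l$.
    - $a^{(l)}$: the activation of layer $l$.

    [2] "Learning the Structure of Deep Sparse Graphical Models", 2010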
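  13. Infinite neural network model

    [Nonlinear Gaussian belief network]
    The activation can be written as
    $$a^{(l)} = \left(W^{(l+1)} \odot M^{(l+1)}\right) z^{(l+1)} + b^{(l)}$$

    In addition, noise from a Gaussian distribution is added to each activation $a_h^{(l)}$:
    $$\tilde{a}_h^{(l)} = a_h^{(l)} + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \nu_h^{(l)})$$

    The hidden unit $z_h^{(l)}$ is then obtained by the transformation
    $$z_h^{(l)} = \phi(\tilde{a}_h^{(l)}), \qquad \phi(\cdot) = \mathrm{Tanh}(\cdot)$$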
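One top-down sampling step of this layer can be sketched in plain Python. Two things are assumptions here rather than statements from the slides: the function name `sample_layer`, and the reading of $\nu_h^{(l)}$ as a precision (so the noise standard deviation is $\nu^{-1/2}$), chosen because the slides later give it a gamma prior and say large $\nu$ yields nearly deterministic units.

```python
import math
import random

def sample_layer(z_above, W, M, b, nu, rng):
    """Sample z^(l) given z^(l+1).

    W and M are H_l x H_{l+1} (lists of rows); b and nu have length H_l.
    """
    z = []
    for h in range(len(b)):
        # a_h = sum_k (W * M)[h][k] * z_above[k] + b[h]; M masks the connections.
        a = b[h] + sum(W[h][k] * M[h][k] * z_above[k]
                       for k in range(len(z_above)))
        # ASSUMPTION: nu[h] is a precision, so the noise std is nu[h] ** -0.5.
        a_tilde = a + rng.gauss(0.0, nu[h] ** -0.5)
        z.append(math.tanh(a_tilde))
    return z

rng = random.Random(0)
W = [[1.0, 2.0], [0.3, -1.0], [0.0, 0.5]]
M = [[1, 0], [0, 0], [1, 1]]
z = sample_layer([0.5, -0.2], W, M, b=[0.1, 0.2, -0.3], nu=[1e12] * 3, rng=rng)
print(z)
```

In this example the second unit has no unmasked parents, so with a very large precision its value is essentially $\tanh$ of its bias.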
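  14. Infinite neural network model

    [Nonlinear Gaussian belief network]
    The distribution of $z_h^{(l)}$ then follows from a change of variables for random variables:
    $$p(z_h^{(l)} \mid a_h^{(l)}, \nu_h^{(l)}) = \mathcal{N}\left(\phi^{-1}(z_h^{(l)}) \mid a_h^{(l)}, \nu_h^{(l)}\right) \frac{\partial \tilde{a}_h^{(l)}}{\partial z_h^{(l)}} = \frac{\mathcal{N}\left(\phi^{-1}(z_h^{(l)}) \mid a_h^{(l)}, \nu_h^{(l)}\right)}{\phi'\left(\phi^{-1}(z_h^{(l)})\right)}, \qquad \phi'(a) = \frac{d}{da}\phi(a)$$

    $W^{(l)}$ and $b^{(l)}$ are given Gaussian priors, and $\nu_h^{(l)}$ is given a gamma prior (conjugate priors).
    - When $\nu_h^{(l)}$ is small, the unit takes extreme values.
    - When $\nu_h^{(l)}$ is large, the unit takes nearly deterministic values.

    Finally, set $z_h^{(0)} = x_h \in (-1, 1)$.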
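The change-of-variables density can be sanity-checked numerically for $\phi = \tanh$, where $\phi^{-1}(z) = \operatorname{atanh}(z)$ and $\phi'(\phi^{-1}(z)) = 1 - z^2$. The sketch below again assumes that $\nu$ acts as a precision; the names `density_z` and `integrate` are illustrative, and the integral uses a simple midpoint rule.

```python
import math

def density_z(z, a, nu):
    """p(z | a, nu) for z = tanh(a_tilde), a_tilde ~ N(a, 1/nu) (nu as precision)."""
    a_tilde = math.atanh(z)
    normal = math.sqrt(nu / (2.0 * math.pi)) * math.exp(-0.5 * nu * (a_tilde - a) ** 2)
    # Divide by phi'(phi^{-1}(z)) = 1 - tanh(a_tilde)^2 = 1 - z^2.
    return normal / (1.0 - z * z)

def integrate(a, nu, steps=20000):
    """Midpoint-rule integral of the density over (-1, 1)."""
    width = 2.0 / steps
    return width * sum(density_z(-1.0 + (i + 0.5) * width, a, nu)
                       for i in range(steps))

print(integrate(a=0.3, nu=4.0))  # should be close to 1
```

The midpoint rule never evaluates the density exactly at $z = \pm 1$, where $\operatorname{atanh}$ is undefined. For much smaller $\nu$ the mass concentrates near the endpoints and a finer grid, or integration in $\tilde{a}$-space, would be needed.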
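  15. Infinite neural network model

    [Cascading Indian buffet process]
    The joint distribution from the previous slides is extended so that the number of layers and the number of hidden units are free to vary.

    First consider $M^{(1)}$. Its number of rows is fixed at $H_0$, the dimensionality of $x$, so the rows are regarded as the customers of an Indian buffet process. Next consider $M^{(2)}$: its number of rows must be fixed to $H_1$, the number of columns of the $M^{(1)}$ obtained from the Indian buffet process, and it can again be sampled with an Indian buffet process. Repeating this and sampling $M^{(1)}, M^{(2)}, M^{(3)}, \ldots$ in turn builds the network.

    (Figure: $x \in \mathbb{R}^{H_0}$ → IBP → $M^{(1)} \in \mathbb{R}^{H_0 \times H_1}$ → IBP → $M^{(2)} \in \mathbb{R}^{H_1 \times H_2}$ → IBP → $M^{(3)} \in \mathbb{R}^{H_2 \times H_3}$ → IBP → …)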
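The cascade described above amounts to feeding the column count of one IBP draw into the next as its customer count. The following is a minimal sketch under stated assumptions: the names `poisson`, `sample_ibp` and `sample_cascade` are illustrative, the same $\alpha$ and $\beta$ are used at every layer, and a `max_layers` cap is added purely as a safeguard for the example.

```python
import math
import random

def poisson(rate, rng):
    """Poisson draw via Knuth's multiplication method (fine for small rates)."""
    threshold, k, p = math.exp(-rate), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def sample_ibp(N, alpha, beta, rng):
    """N x H_plus binary matrix (list of rows) from the two-parameter IBP."""
    columns = []
    for n in range(1, N + 1):
        denom = n + beta - 1
        for col in columns:
            col.append(1 if rng.random() < sum(col) / denom else 0)
        for _ in range(poisson(alpha * beta / denom, rng)):
            columns.append([0] * (n - 1) + [1])
    return [[col[n] for col in columns] for n in range(N)]

def sample_cascade(H0, alpha, beta, rng, max_layers=20):
    """Sample masks M^(1), M^(2), ... until a layer has no units."""
    masks, rows = [], H0
    while len(masks) < max_layers:
        M = sample_ibp(rows, alpha, beta, rng)
        if not M[0]:          # H_l = 0: no units in this layer, so stop
            break
        masks.append(M)
        rows = len(M[0])      # columns of M^(l) are the customers of the next IBP
    return masks

rng = random.Random(3)
masks = sample_cascade(H0=4, alpha=1.0, beta=1.0, rng=rng)
print([(len(M), len(M[0])) for M in masks])  # (H_{l-1}, H_l) for each layer
```

The printed shapes chain together: the second entry of each pair equals the first entry of the next, which is exactly the constraint on row counts stated in the slide.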
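  16. Infinite neural network model

    [Cascading Indian buffet process]
    Q. How is the joint distribution inferred?
    A. It can be inferred approximately with MCMC.

    (Concrete example) Split the variables into three blocks, the set of hidden units $Z$, the set of binary matrices $M$, and the set of parameters $\{W, b, \nu\}$, and sample them alternately based on Gibbs sampling:
    $$Z \sim p(Z \mid X, M, W, b, \nu)$$
    $$M \sim p(M \mid X, Z, W, b, \nu)$$
    $$W, b, \nu \sim p(W, b, \nu \mid X, Z, M)$$

    Within each block, the individual variables are sampled as:
    $$z_n \sim p(z_n \mid x_n, M, W, b, \nu)$$
    $$m_{n,h} \sim p(m_{n,h} \mid X, Z, W, b, \nu, M_{\setminus(n,h)})$$
    $$W \sim p(W \mid X, Z, M, b, \nu)$$
    $$b \sim p(b \mid X, Z, M, W, \nu)$$
    $$\nu \sim p(\nu \mid X, Z, M, W, b)$$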
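The alternating scheme above has a simple control flow, sketched here as a generic loop. The three update functions are hypothetical placeholders supplied by the caller; the slides do not specify how each conditional is sampled, so the dummy updates below only demonstrate the order in which the blocks are visited.

```python
def blocked_gibbs(state, update_Z, update_M, update_params, n_iters):
    """Cycle through the three blocks; each update draws its block given the rest."""
    samples = []
    for _ in range(n_iters):
        state["Z"] = update_Z(state)            # Z ~ p(Z | X, M, W, b, nu)
        state["M"] = update_M(state)            # M ~ p(M | X, Z, W, b, nu)
        state["W"], state["b"], state["nu"] = update_params(state)
        samples.append(dict(state))             # shallow copy of the current state
    return samples

# Dummy updates (assumptions for illustration only): they count visits
# instead of sampling from real conditionals.
init = {"X": None, "Z": 0, "M": 0, "W": 0, "b": 0, "nu": 0}
samples = blocked_gibbs(
    init,
    update_Z=lambda s: s["Z"] + 1,
    update_M=lambda s: s["Z"],                  # sees the freshly updated Z
    update_params=lambda s: (s["M"], s["M"], s["M"]),
    n_iters=3,
)
print(samples[-1])
```

Each block is updated using the most recent values of the other blocks, which is what distinguishes Gibbs sampling from updating all blocks from a stale copy of the state.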