Slide 1

Autoencoding Variational Inference For Topic Models
Akash Srivastava and Charles Sutton
ICLR 2017 reading group. Presenter: @nzw0301

Slide 2

Overview
1. Latent Dirichlet Allocation (LDA) with Neural Variational Inference (NVI)
   • A reparameterization trick for the Dirichlet distribution
2. Proposal of a new model
3. Preventing the optimization from getting stuck in bad local optima

Slide 3

Background: overview of LDA and VAE

Slide 4

LDA: a probabilistic generative model of documents [Blei et al., 2003]
• Each document has a topic distribution θ
• Each topic has a word distribution p(w|β)
[Figure: example topics with their top words (e.g. research / problem / knowledge / scientist ... vs. machine learning / artificial intelligence / model / sample ...) generating the documents of a corpus]
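To make the generative story concrete, here is a minimal NumPy sketch of LDA's generative process; the number of topics, vocabulary size, document length, and hyperparameter values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 1000, 50                      # topics, vocabulary size, words per document

alpha = np.full(K, 0.1)                          # Dirichlet prior over per-document topic mixtures
beta = rng.dirichlet(np.full(V, 0.01), size=K)   # per-topic word distributions p(w | beta)

def generate_document():
    theta = rng.dirichlet(alpha)                 # this document's topic distribution
    words = []
    for _ in range(doc_len):
        z = rng.choice(K, p=theta)               # draw a topic for this word position
        words.append(rng.choice(V, p=beta[z]))   # draw a word from that topic's distribution
    return words

doc = generate_document()
```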

Slide 5

VAE: Encoder
• A generative model built from neural networks
• Encoder:
  • Maps the data to the parameters of a probability distribution
  • The latent variables are sampled from that distribution
• Decoder:
  • Generates data from the latent variables
• Reparameterization trick:
  • A device for including the sampling step in backpropagation
  • A sample is constructed from a standard-normal sample and the distribution's parameters

Slide 6

VAE: Decoder
• A generative model built from neural networks
• Encoder:
  • Maps the data to the parameters of a probability distribution
  • The latent variables are sampled from that distribution
• Decoder:
  • Generates data from the latent variables
• Reparameterization trick:
  • A device for including the sampling step in backpropagation
  • A sample is constructed from a standard-normal sample and the distribution's parameters

Slide 7

VAE: Reparameterization trick
• A generative model built from neural networks
• Encoder:
  • Maps the data to the parameters of a probability distribution
  • The latent variables are sampled from that distribution
• Decoder:
  • Generates data from the latent variables
• Reparameterization trick:
  • A device for including the sampling step in backpropagation
  • A sample is constructed from a standard-normal sample and the distribution's parameters
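A minimal NumPy sketch of the reparameterization trick described above, with illustrative values: the latent sample z is built from encoder outputs (μ, log σ²) and a standard-normal ε, so gradients can flow through μ and σ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder outputs for one data point (illustrative values)
mu = np.array([0.3, -1.2])
log_var = np.array([0.0, -0.5])

# Reparameterization: z = mu + sigma * eps with eps ~ N(0, I),
# so the randomness is isolated in eps and gradients can pass through mu and sigma.
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps
```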

Slide 8

VAE: Loss function

L(\Theta) = \sum_{d=1}^{D} \Big\{ -\frac{1}{2}\big( \mathrm{tr}(\Sigma_0) + \mu_0^\top \mu_0 - K - \log |\Sigma_0| \big) + \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)} \big[ \log p\big(x_d \mid f(\mu_0 + \Sigma_0^{1/2} \epsilon)\big) \big] \Big\}

(I) The first term: the KL divergence from the prior. (II) The second term: the log-likelihood. The whole expression is the Evidence Lower Bound (ELBO).
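A hedged NumPy sketch of the two ELBO terms for one document: term (I) is the closed-form KL against N(0, I) for a diagonal covariance, and term (II) is a one-sample Monte Carlo estimate; the decoder/likelihood function here is a placeholder, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4                                     # latent dimensionality (illustrative)

mu0 = rng.standard_normal(K)              # encoder outputs for one document x_d
log_var0 = 0.1 * rng.standard_normal(K)
var0 = np.exp(log_var0)                   # diagonal of Sigma_0

# (I) KL( N(mu0, Sigma_0) || N(0, I) ) in closed form for a diagonal covariance
kl = 0.5 * (var0.sum() + mu0 @ mu0 - K - log_var0.sum())

# (II) one-sample Monte Carlo estimate of E[ log p(x_d | f(mu0 + Sigma_0^{1/2} eps)) ]
eps = rng.standard_normal(K)
z = mu0 + np.sqrt(var0) * eps

def log_likelihood(z):
    # placeholder for the decoder + likelihood; in the paper this is a neural decoder
    return -0.5 * float(np.sum(z ** 2))

elbo = -kl + log_likelihood(z)            # this document's contribution to the ELBO
```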

Slide 9

The main part

Slide 10

Reparameterization trick for the Dirichlet distribution
• θ in LDA is sampled from a Dirichlet distribution
• The Dirichlet is not a scale-family distribution, so a reparameterized sample cannot be constructed (as it can for the Gaussian)
[Figure: the document's topic distribution θ in the LDA graphical model]

Slide 11

Reparameterization trick for the Dirichlet distribution
• θ in LDA is sampled from a Dirichlet distribution
• Not a scale-family distribution, so a reparameterized sample cannot be constructed directly
• Laplace approximation
  • Substitute: apply the softmax function to a sample from a Gaussian
  • Parameters of the approximated prior (these give the μ_1 and Σ_1 used in the loss on the next slides):

\mu_k = \log \alpha_k - \frac{1}{K} \sum_{i=1}^{K} \log \alpha_i
\qquad
\Sigma_{k,k} = \frac{1}{\alpha_k} \Big(1 - \frac{2}{K}\Big) + \frac{1}{K^2} \sum_{i=1}^{K} \frac{1}{\alpha_i}
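A small NumPy sketch of the Laplace approximation above: given the Dirichlet hyperparameters α, it returns the mean and diagonal covariance of the Gaussian whose softmax-transformed samples stand in for Dirichlet samples (the function name and the example α are my own, not from the slides).

```python
import numpy as np

def laplace_prior(alpha):
    """Gaussian (mu, diagonal Sigma) approximating Dirichlet(alpha) in the softmax basis."""
    alpha = np.asarray(alpha, dtype=float)
    K = alpha.shape[0]
    mu = np.log(alpha) - np.log(alpha).mean()
    sigma_diag = (1.0 / alpha) * (1.0 - 2.0 / K) + np.sum(1.0 / alpha) / K**2
    return mu, sigma_diag

# e.g. 50 topics with a symmetric hyperparameter (values are illustrative)
mu1, sigma1_diag = laplace_prior(np.full(50, 0.02))
```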

Slide 12

Network and loss function

[Figure: x → encoder → μ(x), Σ(x); KL{ N(z; μ(x), Σ(x)) || N(z; μ_1, Σ_1) }; ε ~ N(ε; 0, I); decoder f(z); loss(x, f(z)); the input x is a word-frequency vector and θ is produced from the reparameterized sample]

• σ: the softmax function
• β: the decoder's weights (unnormalized)
• σ(β): corresponds to a sample from the Dirichlet distribution over words (a topic's word distribution)

L(\Theta) = \sum_{d=1}^{D} \Big\{ -\frac{1}{2}\big( \mathrm{tr}(\Sigma_1^{-1}\Sigma_0) + (\mu_1 - \mu_0)^\top \Sigma_1^{-1} (\mu_1 - \mu_0) - K + \log \frac{|\Sigma_1|}{|\Sigma_0|} \big) + \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)} \big[ w_d^\top \log \big( \sigma(\beta)\, \sigma(\mu_0 + \Sigma_0^{1/2} \epsilon) \big) \big] \Big\}
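A minimal PyTorch-style sketch of the network the figure describes; the layer sizes, activations, and variable names are assumptions, not taken from the slides or the authors' TensorFlow code. The encoder produces μ(x) and a log-variance, θ is the softmax of the reparameterized sample, and the decoder mixes σ(β) with θ before the w_d^T log(·) term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps a bag-of-words vector x to the parameters (mu, log-variance) of q(z | x)."""
    def __init__(self, vocab_size=2000, hidden=100, num_topics=50):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
        )
        self.mu = nn.Linear(hidden, num_topics)
        self.log_var = nn.Linear(hidden, num_topics)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.log_var(h)

encoder = Encoder()
beta = nn.Parameter(torch.randn(50, 2000))   # unnormalized decoder weights

x = torch.rand(8, 2000)                      # batch of word-frequency vectors w_d
mu, log_var = encoder(x)
eps = torch.randn_like(mu)
theta = F.softmax(mu + torch.exp(0.5 * log_var) * eps, dim=-1)  # document-topic proportions
word_probs = theta @ F.softmax(beta, dim=-1)                    # sigma(beta) mixed with theta
recon = (x * torch.log(word_probs + 1e-10)).sum(-1)             # w_d^T log(...) term of the loss
```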

Slide 13

prodLDA: the proposed model
• A product of experts
• Apply the softmax function to the product of β and θ, i.e. replace σ(β)θ in the reconstruction term with σ(βθ), where θ = σ(μ_0 + Σ_0^{1/2} ε):

L(\Theta) = \sum_{d=1}^{D} \Big\{ -\frac{1}{2}\big( \mathrm{tr}(\Sigma_1^{-1}\Sigma_0) + (\mu_1 - \mu_0)^\top \Sigma_1^{-1} (\mu_1 - \mu_0) - K + \log \frac{|\Sigma_1|}{|\Sigma_0|} \big) + \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)} \big[ w_d^\top \log \sigma\big( \beta\, \sigma(\mu_0 + \Sigma_0^{1/2} \epsilon) \big) \big] \Big\}
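A short sketch contrasting the two decoders under illustrative shapes: the mixture-style decoder σ(β)θ from the previous slide versus the ProdLDA decoder σ(βθ), which applies the softmax after the product (a products-of-experts form).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_topics, vocab_size = 50, 2000                        # illustrative sizes
theta = F.softmax(torch.randn(8, num_topics), dim=-1)    # stand-in for sigma(mu_0 + Sigma_0^{1/2} eps)
beta = torch.randn(num_topics, vocab_size)               # unnormalized decoder weights

# Mixture-style decoder (previous slide): normalize beta per topic, then mix with theta
word_probs_mix = theta @ F.softmax(beta, dim=-1)

# ProdLDA decoder (products of experts): form the product theta @ beta first, then softmax
word_probs_prod = F.softmax(theta @ beta, dim=-1)
```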

Slide 14

Optimization and network tricks

Problem with NVI: it tends to fall into bad local optima in the early stages of training.
• Tune Adam's hyperparameters: set the learning rate η and the momentum coefficient β1 to relatively high values
• Use Batch Normalization and Dropout
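A hedged PyTorch sketch of the optimization setup described above; the concrete numbers are placeholders (the slide only says to raise η and β1), and the tiny model stands in for the actual inference network.

```python
import torch
import torch.nn as nn

# Stand-in for the inference network, with the tricks from the slide
model = nn.Sequential(
    nn.Linear(2000, 100), nn.Softplus(),
    nn.Linear(100, 50),
    nn.BatchNorm1d(50),   # batch normalization
    nn.Dropout(p=0.2),    # dropout (rate is a placeholder)
)

# Adam with a relatively high learning rate (eta) and first-moment weight (beta1),
# to reduce the chance of getting stuck in a bad local optimum early in training.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, betas=(0.99, 0.999))
```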

Slide 15

Experiments
1. Coherence and perplexity (described later)
2. Effect of changing the learning rate and the prior: a high learning rate and a Dirichlet prior work better
3. Whether to optimize on the test data: not necessary
4. Lists of p(w|β) (top topic words): omitted here

Slide 16

Coherence

(The table is quoted from the paper.)
• LDA VAE: the proposed inference method
• prodLDA: the proposed inference method + the proposed model
• LDA DMFVI: online mean-field variational inference
• NVDM: VAE-based document modeling
Table values: computed over 40 runs

Slide 17

Perplexity

(The table is quoted from the paper.)

Slide 18

Review: a few points that caught my attention
Q1. For fairness, Adam's learning rate should also have been tuned for NVDM. A1. Reflected in the paper.
Q2. Were the hyperparameters optimized? A2. The baselines' were; the proposed method used Bayesian optimization (BO).
Ratings: 6-7-6-5

Slide 19

Other notes
• Authors' implementation: TensorFlow
• A new model by the NVDM authors was accepted to ICML 2017