Topic Modeling of Short Texts: A Pseudo-Document View

Kento Nozawa
September 02, 2016

Transcript

  1. Title: Topic Modeling of Short Texts: A Pseudo-Document View

    Authors: Yuan Zuo, Junjie Wu, Hui Zhang, Hao Lin, Fei Wang, Ke Xu, Hui Xiong
    @ KDD 2016 reading group, NOZAWA Kento (University of Tsukuba M1 / AIST RA)
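  2. The problem with LDA: topic learning fails when documents are short
    • Cause: not enough word co-occurrence information is available
    • Remedy: devise a way to increase the co-occurrence information
    • Train on pseudo-documents obtained by clustering the documents

    [Figure: short texts (e.g. "machine learning, LDA, NLP", "machine learning, Bayesian inference, optimization", "just woke up", "can't sleep", "sleep pillow") are clustered into pseudo-documents.]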
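
To make the co-occurrence argument on this slide concrete, here is a minimal Python sketch. It counts distinct co-occurring word pairs before and after short texts are merged into pseudo-documents. The toy texts, the cluster assignments, and the function names are all hypothetical; in PTM the assignment is a latent variable that is inferred, not given.

```python
from itertools import combinations


def cooccurring_pairs(docs):
    """Return the set of distinct word pairs that appear together in some document."""
    pairs = set()
    for doc in docs:
        pairs.update(combinations(sorted(set(doc)), 2))
    return pairs


def merge_into_pseudo_documents(docs, assignments):
    """Concatenate the documents that share a (hypothetical) pseudo-document ID."""
    merged = {}
    for doc, pseudo_id in zip(docs, assignments):
        merged.setdefault(pseudo_id, []).extend(doc)
    return list(merged.values())


# Toy short texts; each has only two words, so each yields a single pair.
short_texts = [
    ["machine_learning", "lda"],
    ["lda", "nlp"],
    ["machine_learning", "optimization"],
    ["sleep", "pillow"],
    ["awake", "sleep"],
]
# Assumed hard clustering: first three texts in cluster 0, last two in cluster 1.
assignments = [0, 0, 0, 1, 1]

pseudo_docs = merge_into_pseudo_documents(short_texts, assignments)
pairs_before = cooccurring_pairs(short_texts)
pairs_after = cooccurring_pairs(pseudo_docs)
print(len(pairs_before), len(pairs_after))
```

Every pair observed in the short texts is still present after merging, and new pairs such as ("nlp", "optimization") appear only at the pseudo-document level.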
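  3. Graphical models: LDA [Blei+, 2003] and PTM (proposed method)
    • K: number of topics
    • D: number of documents
    • N: number of words in a document
    • P: number of pseudo-documents
    Size relation: K < P << D

    [Figure: plate diagrams. LDA: plates D, N, K with variables z, θ, w, α. PTM: plates D, N, K, P with variables l, θ, z, w, α.]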
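  4. Generative process of PTM
    • Pseudo-document: a hard cluster of documents
    • One topic distribution θ is defined per pseudo-document

    [Figure: plate diagram of PTM with plates D, N, K, P; l is the pseudo-document ID.]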
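
The generative process on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. The slide only names l, θ, z, w and α; the multinomial `psi` over pseudo-documents, the topic-word distributions `phi`, and the symmetric Dirichlet hyperparameters `beta` and `lam` are assumptions added here to make the sketch complete.

```python
import random


def sample_dirichlet(rng, dim, concentration):
    """Draw from a symmetric Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [x / total for x in draws]


def sample_index(rng, probs):
    """Draw one index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(num_docs, doc_len, K, P, V,
                    alpha=0.5, beta=0.5, lam=1.0, seed=0):
    """Generate (pseudo_doc_id, word_ids) pairs from a PTM-style process."""
    rng = random.Random(seed)
    phi = [sample_dirichlet(rng, V, beta) for _ in range(K)]     # topic-word (assumed)
    theta = [sample_dirichlet(rng, K, alpha) for _ in range(P)]  # one theta per pseudo-document
    psi = sample_dirichlet(rng, P, lam)                          # pseudo-document prior (assumed)
    corpus = []
    for _ in range(num_docs):
        l = sample_index(rng, psi)            # hard assignment: one pseudo-document per short text
        words = []
        for _ in range(doc_len):
            z = sample_index(rng, theta[l])   # topic comes from the pseudo-document's theta
            words.append(sample_index(rng, phi[z]))
        corpus.append((l, words))
    return corpus


corpus = generate_corpus(num_docs=5, doc_len=4, K=3, P=2, V=10)
```

The only structural difference from LDA in this sketch is that `theta` is indexed by the pseudo-document `l` instead of by the document itself.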
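  5. Inference by collapsed Gibbs sampling
    • Sample a pseudo-document $l$ for each document
    • Sample a topic $z$ for each word

    $$p(z_{s,i} = z \mid \text{rest}) \propto \left(N^{z}_{d_s} + \alpha\right) \frac{N^{w_{d_s,i}}_{z} + \beta}{N_{z} + V\beta} \quad \text{(LDA)}$$

    $$p(z_{s,i} = z \mid \text{rest}) \propto \left(N^{z}_{l_{d_s}} + \alpha\right) \frac{N^{w_{d_s,i}}_{z} + \beta}{N_{z} + V\beta} \quad \text{(PTM)}$$

    For short texts, $N^{z}_{d_s}$ is almost always 0 or a small value. In contrast, $N^{z}_{l_{d_s}}$ can draw on the counts over the entire corresponding pseudo-document.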
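
The difference between the two sampling formulas can be illustrated with a small Python sketch. The function name, the toy counts, and the hyperparameter values below are hypothetical. The only thing that changes between LDA and PTM is which topic counts are passed as the context: those of the short document, or those of its pseudo-document.

```python
def topic_conditional(context_counts, word_counts, topic_totals,
                      alpha, beta, vocab_size):
    """Normalized p(z | rest) for one token.

    context_counts[z]: tokens assigned to topic z in the context
                       (the document for LDA, the pseudo-document for PTM)
    word_counts[z]:    times the current word is assigned to topic z
    topic_totals[z]:   total tokens assigned to topic z
    All counts are assumed to exclude the token being resampled.
    """
    weights = [
        (n_ctx + alpha) * (n_w + beta) / (n_z + vocab_size * beta)
        for n_ctx, n_w, n_z in zip(context_counts, word_counts, topic_totals)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Toy setting: the current word is equally frequent under all three topics.
word_counts = [5, 5, 5]
topic_totals = [100, 100, 100]
params = dict(alpha=0.1, beta=0.01, vocab_size=1000)

# LDA: a very short document has no other tokens, so its counts are all zero.
p_lda = topic_conditional([0, 0, 0], word_counts, topic_totals, **params)

# PTM: the pseudo-document pools counts from many short documents.
p_ptm = topic_conditional([8, 1, 1], word_counts, topic_totals, **params)
print(p_lda, p_ptm)
```

With zero document-level counts the LDA conditional is uniform over topics in this toy case, while the pooled pseudo-document counts give a clear preference for topic 0.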
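  6. Models that extend PTM are also proposed
    • SPTM: puts a spike-and-slab prior on the topic distributions of the pseudo-documents
    • EPTM: allows a document to belong to multiple pseudo-documents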
  7. Datasets and training parameters
    * Questions: Chinese

    Parameters
    • Number of topics K: 100
    • Number of pseudo-documents P: 1000
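  8. Experiments
    1. Document classification
      • Classify with an SVM based on the assigned topics
    2. UCI topic coherence
      • Computed using Wikipedia data
      • News and DBLP only
    3. Parameter comparison
      • Number of pseudo-documents, amount of training data, model comparison
    4. Examples of assigned topics
      • Omitted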
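
As a rough illustration of the UCI topic coherence measure listed above, the following Python sketch averages the pointwise mutual information (PMI) over pairs of a topic's top words, using an external reference corpus. It is a simplified sketch: the tiny reference corpus stands in for Wikipedia, whole documents are used as co-occurrence windows instead of sliding windows, and the function name and the smoothing constant `eps` are assumptions.

```python
import math
from itertools import combinations


def uci_coherence(top_words, reference_docs, eps=1e-12):
    """Average PMI over pairs of top words, estimated from reference_docs."""
    doc_sets = [set(doc) for doc in reference_docs]
    num_docs = len(doc_sets)

    def prob(*words):
        # Fraction of reference documents containing all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / num_docs

    scores = []
    for w_i, w_j in combinations(top_words, 2):
        p_i, p_j = prob(w_i), prob(w_j)
        if p_i == 0 or p_j == 0:
            continue  # word never seen in the reference corpus; skip the pair
        scores.append(math.log((prob(w_i, w_j) + eps) / (p_i * p_j)))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical reference corpus standing in for Wikipedia.
reference_docs = [
    ["topic", "model", "lda"],
    ["topic", "model"],
    ["sleep", "pillow"],
    ["sleep", "night"],
]

coherent_score = uci_coherence(["topic", "model"], reference_docs)
mixed_score = uci_coherence(["topic", "sleep"], reference_docs)
print(coherent_score, mixed_score)
```

Words that tend to appear together in the reference corpus score higher than words that never co-occur, which is the behavior a coherence measure is meant to capture.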
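  9. Document classification
    • When the number of documents is small (News), SPTM > PTM
    • The F-measure does not drop much even when there is little training data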

  10. UCI topic coherence
    • As with document classification, SPTM > PTM when the number of documents is small

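  11. Parameter comparison (horizontal axis: number of pseudo-documents)
    • The number of pseudo-documents needs to be reasonably large
    • SPTM is better when there are few pseudo-documents
    • Even on News, increasing the number of pseudo-documents leads to SPTM < PTM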
  12. Comparison among the proposed methods
    • EPTM performs worst

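  13. Summary: proposal of topic models that learn well even on short texts
    • Pseudo-documents, obtained by clustering the document collection, are introduced as latent variables
    • Three models are proposed
    • Use SPTM when the amount of data or the number of pseudo-documents is small
    • Otherwise use PTM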