Slide 1

Slide 1 text

Title: Topic Modeling of Short Texts: A Pseudo-Document View
Authors: Yuan Zuo, Junjie Wu, Hui Zhang, Hao Lin, Fei Wang, Ke Xu, Hui Xiong
@ KDD 2016 study group
NOZAWA Kento (University of Tsukuba M1 / AIST RA)

Slide 2

Slide 2 text

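Problem with LDA: topic learning fails when documents are short
• Cause: not enough word co-occurrence information is available
• Solution: devise a way to increase the co-occurrence information
• Train on pseudo-documents obtained by clustering the documents
[Figure: short texts (e.g. "machine learning", "LDA", "NLP", "Bayesian inference", "optimization", "just woke up", "can't sleep", "sleep pillow") are grouped by clustering into pseudo-documents]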

Slide 3

Slide 3 text

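Graphical models
[Figure: plate diagrams of LDA (variables θ, z, w, α; plates D, N, K) and PTM (variables l, θ, z, w, α; plates D, N, K, P)]
• K: number of topics
• D: number of documents
• N: number of words in a document
• P: number of pseudo-documents
Size relationship: K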

Slide 4

Slide 4 text

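Generative process of PTM
[Figure: plate diagram of PTM (variables l, θ, z, w, α; plates D, N, K, P), where l is the pseudo-document ID]
• Pseudo-document: a hard cluster of documents
• One topic distribution θ is defined per pseudo-document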
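The generative story on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the toy sizes, the topic-word distributions `phi` with hyperparameter `beta`, and the Dirichlet-distributed `psi` over pseudo-documents (with hypothetical hyperparameter `lam`) are assumptions that do not appear on the slide.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalizing Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def sample_categorical(rng, probs):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(num_docs, doc_len, K, P, V,
                    alpha=0.5, beta=0.5, lam=1.0, seed=0):
    """Generate (pseudo_document_id, word_ids) pairs under a PTM-like process."""
    rng = random.Random(seed)
    # Assumption: one word distribution per topic, as in LDA.
    phi = [sample_dirichlet(rng, beta, V) for _ in range(K)]
    # One topic distribution theta per pseudo-document (from the slide).
    theta = [sample_dirichlet(rng, alpha, K) for _ in range(P)]
    # Assumption: pseudo-document IDs follow a Dirichlet-distributed psi.
    psi = sample_dirichlet(rng, lam, P)

    docs = []
    for _ in range(num_docs):
        l = sample_categorical(rng, psi)           # hard assignment: one l per short text
        words = []
        for _ in range(doc_len):
            z = sample_categorical(rng, theta[l])  # topic from the pseudo-document's theta
            words.append(sample_categorical(rng, phi[z]))
        docs.append((l, words))
    return docs


if __name__ == "__main__":
    for l, words in generate_corpus(num_docs=5, doc_len=4, K=3, P=2, V=10):
        print(l, words)
```

Because every short text in the same pseudo-document shares one θ, word-topic evidence is pooled across texts rather than estimated from a handful of words per text.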

Slide 5

Slide 5 text

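Inference by collapsed Gibbs sampling
• Sample a pseudo-document $l$ for each document
• Sample a topic $z$ for each word

$$p(z_{s,i} = z \mid \text{rest}) \propto \left(N^{z}_{d_s} + \alpha\right) \frac{N^{w_{d_s,i}}_{z} + \beta}{N_z + V\beta} \quad \text{(LDA)}$$

$$p(z_{s,i} = z \mid \text{rest}) \propto \left(N^{z}_{l_{d_s}} + \alpha\right) \frac{N^{w_{d_s,i}}_{z} + \beta}{N_z + V\beta} \quad \text{(PTM)}$$

• For a short text, $N^{z}_{d_s}$ is almost always 0 or takes only small values
• In contrast, $N^{z}_{l_{d_s}}$ can use the frequencies over the whole corresponding pseudo-document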
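To make the difference between the two updates concrete, the following sketch evaluates the normalized conditional for one word token. It is a minimal illustration under stated assumptions: the function name and argument layout are hypothetical, all counts are assumed to already exclude the token being resampled, and the toy numbers are made up.

```python
def topic_conditional(n_unit_z, n_z_w, n_z, alpha, beta, V):
    """Normalized p(z | rest) for one word token.

    n_unit_z[z]: words assigned to topic z in the conditioning unit
                 (the short text d_s for LDA, its pseudo-document l for PTM)
    n_z_w[z]:    occurrences of the current word type assigned to topic z
    n_z[z]:      total words assigned to topic z
    V:           vocabulary size
    """
    weights = [
        (n_unit_z[z] + alpha) * (n_z_w[z] + beta) / (n_z[z] + V * beta)
        for z in range(len(n_z))
    ]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy counts with K = 2 topics (made-up numbers).
    n_z_w, n_z = [1, 1], [10, 10]
    # LDA on a short text: per-document topic counts are all zero.
    print(topic_conditional([0, 0], n_z_w, n_z, alpha=1.0, beta=1.0, V=10))
    # PTM: the pseudo-document pools counts from many short texts.
    print(topic_conditional([3, 0], n_z_w, n_z, alpha=1.0, beta=1.0, V=10))
```

With these toy counts the short-text term gives no preference between the two topics, while the pooled pseudo-document counts shift the distribution toward topic 0.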

Slide 6

Slide 6 text

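Models that extend PTM are also proposed
• SPTM
  • Puts a spike-and-slab prior on the topic distributions of the pseudo-documents
• EPTM
  • Lets a document belong to multiple pseudo-documents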

Slide 7

Slide 7 text

Datasets and training parameters
* Questions: in Chinese
Parameters
• Number of topics K: 100
• Number of pseudo-documents P: 1000

Slide 8

Slide 8 text

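Experiments
1. Document classification
  • Classify with an SVM based on the assigned topics
2. UCI topic coherence
  • Computed using Wikipedia data
  • News and DBLP only
3. Parameter comparison
  • Number of pseudo-documents, amount of training data, model comparison
4. Examples of assigned topics
  • Omitted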
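For readers unfamiliar with the coherence measure in item 2, here is a rough sketch assuming the usual PMI-based definition of UCI coherence: the average pointwise mutual information over pairs of a topic's top words, estimated from an external reference corpus. Several simplifications are assumptions of this sketch rather than the setup used in the paper: co-occurrence is counted per reference document instead of per sliding window, the smoothing constant `eps` is arbitrary, words missing from the reference corpus are skipped, and the tiny reference corpus is made up.

```python
import math
from itertools import combinations


def uci_coherence(top_words, reference_docs, eps=1e-12):
    """Average PMI over all pairs of top words, using document co-occurrence."""
    doc_sets = [set(doc) for doc in reference_docs]
    n = len(doc_sets)

    def prob(*words):
        # Fraction of reference documents containing all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_i, p_j = prob(wi), prob(wj)
        if p_i == 0 or p_j == 0:
            continue  # sketch-specific choice: skip words unseen in the reference corpus
        scores.append(math.log((prob(wi, wj) + eps) / (p_i * p_j)))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Made-up reference corpus standing in for Wikipedia.
    reference = [["topic", "model"], ["topic", "model"], ["pillow"], ["sleep"]]
    print(uci_coherence(["topic", "model"], reference))   # words that co-occur
    print(uci_coherence(["topic", "pillow"], reference))  # words that never co-occur
```

Top words that tend to appear together in the reference corpus receive a higher score, which is why the measure is used as a proxy for how interpretable a topic is.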

Slide 9

Slide 9 text

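Document classification
• When the number of documents is small (News), SPTM > PTM
• The F-measure does not drop much even when there is little training data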

Slide 10

Slide 10 text

UCI topic coherence
As in document classification, SPTM > PTM when the number of documents is small

Slide 11

Slide 11 text

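Parameter comparison (horizontal axis: number of pseudo-documents)
• The number of pseudo-documents needs to be reasonably large
• SPTM is better when there are few pseudo-documents
• Even on News, increasing the number of pseudo-documents leads to SPTM < PTM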

Slide 12

Slide 12 text

Comparison among the proposed methods
• EPTM performs worst

Slide 13

Slide 13 text

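Summary
Proposes topic models that learn well even from short texts
• Introduces pseudo-documents, obtained by clustering the document collection, as latent variables
• Proposes three models
• When the amount of data or the number of pseudo-documents is small, use SPTM
• Otherwise, use PTM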