Slide 1

Interactive LDA
Quentin Pleple
[email protected]
May 3, 2013

Slide 2

Latent Dirichlet Allocation

LDA discovers topics in a collection of documents.
[Figure: topic k is a distribution ϕk over words, e.g. obama, budweiser, baseball, speech, pizza]

LDA tags each document with topics.
[Figure: document d is tagged with a distribution θd over topics, e.g. politics, sports, news, economy, war]

Slide 3

Model based on a generative process

for topic k = 1, ..., K do
    Draw a word distribution ϕk ∼ Dir(β)
for document d = 1, ..., D do
    Draw a topic distribution θd ∼ Dir(α)
    for position i = 1, ..., N in document d do
        Draw a topic k ∼ Discrete(θd)
        Draw a word wdi ∼ Discrete(ϕk)
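
A minimal NumPy sketch of this generative process; the sizes K, D, N, V and the priors α, β below are illustration values, not taken from the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    K, D, N, V = 5, 10, 20, 50          # topics, documents, words per doc, vocabulary size
    alpha, beta = 0.1, 0.01             # Dirichlet hyperparameters (illustrative)

    # Draw a word distribution phi_k ~ Dir(beta) for each topic
    phi = rng.dirichlet(np.full(V, beta), size=K)      # shape (K, V)

    docs = []
    for d in range(D):
        theta_d = rng.dirichlet(np.full(K, alpha))     # topic distribution theta_d ~ Dir(alpha)
        words = []
        for i in range(N):
            k = rng.choice(K, p=theta_d)               # topic k ~ Discrete(theta_d)
            words.append(rng.choice(V, p=phi[k]))      # word w_di ~ Discrete(phi_k)
        docs.append(words)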

Slide 4

Inference

Recover the model parameters:
- topics ϕk
- tagging θd of documents

But because:
- completely unsupervised
- function optimized ≠ human judgment

we get junk topics.

→ Live supervision

Slide 5

Interactive LDA

Repeat:
- Run LDA until convergence
- Ask the user for feedback
- Move the current state

Repeatedly use live user feedback to kick LDA out of its local optimum.
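
A high-level Python sketch of this loop; the three callables are hypothetical placeholders for the pieces detailed on the later slides, not an implementation from the talk.

    def interactive_lda(corpus, lam, run_lda_until_convergence,
                        ask_user_for_feedback, apply_feedback, max_rounds=10):
        for _ in range(max_rounds):
            lam = run_lda_until_convergence(corpus, lam)   # run LDA until convergence
            feedback = ask_user_for_feedback(lam)          # show topics, collect feedback
            if feedback is None:                           # user is satisfied: stop
                break
            lam = apply_feedback(lam, feedback)            # move the current state
        return lam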

Slide 6

Previous work

[Andrzejewski et al., 2009], [Hu et al., 2013]
- Gibbs sampling
- Encode user feedback (must-link, cannot-link) into a complex prior
- Keep running the Gibbs sampler

Slide 7

Variational EM for LDA

The posterior p(ϕk, θd | corpus) is intractable; approximate it using variational EM.

Instead of having ϕk ∼ Dir(β), break the dependencies:
ϕk ∼ Dir(λk)
λkw ≈ count(word w has been drawn from topic k)

Slide 8

Example

Corpus = { "Pixel Cafe.", "The Pixel Cafe is a weekly seminar." }

          pixel   cafe   weekly   seminar
    λ1      2       2       0        0
    λ2      0       0       1        1

In reality (smoothing 0.01):

          pixel   cafe   weekly   seminar
    λ1    1.81    1.91    0.11     0.21
    λ2    0.21    0.11    0.91     0.81
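
For reference, the smoothed λ from this example as a NumPy array (rows λ1, λ2; columns pixel, cafe, weekly, seminar); normalizing each row gives the topic's expected word distribution. This is just a re-encoding of the numbers above, not code from the talk.

    import numpy as np

    vocab = ["pixel", "cafe", "weekly", "seminar"]
    lam = np.array([[1.81, 1.91, 0.11, 0.21],    # lambda_1
                    [0.21, 0.11, 0.91, 0.81]])   # lambda_2

    # Expected word distribution of each topic under Dir(lambda_k)
    phi_hat = lam / lam.sum(axis=1, keepdims=True)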

Slide 9

Variational EM for LDA

Initialize topics λ randomly.
Repeat until convergence:
- E step: tag documents with topics, keeping topics λ fixed
- M step: update topics λ according to the assignments from the E step

We can modify the topics λ between epochs.
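
A skeleton of this epoch loop in Python; e_step, m_step, and user_edit are hypothetical callables standing in for the variational updates and for the interactive modifications described on the next slides.

    import numpy as np

    def run_em(corpus, K, V, e_step, m_step, user_edit, n_epochs=50, seed=0):
        rng = np.random.default_rng(seed)
        lam = rng.gamma(1.0, 1.0, size=(K, V))    # initialize topics lambda randomly
        for _ in range(n_epochs):
            assignments = e_step(corpus, lam)     # tag documents, lambda kept fixed
            lam = m_step(corpus, assignments)     # update lambda from the assignments
            lam = user_edit(lam)                  # modify topics lambda between epochs
        return lam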

Slide 10

Remove words from topics

Given a word w and a topic k:
[Figure: row k of λ, e.g. (8 3 7 5 5 0), with the column of word w marked]

Procedure: λkw ← 0
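
A one-line sketch of this operation, assuming λ is stored as a K × V NumPy array (rows = topics, columns = words):

    def remove_word(lam, k, w):
        lam = lam.copy()
        lam[k, w] = 0.0        # lambda_kw <- 0
        return lam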

Slide 11

Delete topics

Given a topic k:
[Figure: λ with row k, e.g. (8 3 5 7 5), marked]

Procedure:
- Remove row k from λ
- Number of topics K ← K − 1
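
The same operation sketched on a K × V NumPy array; dropping the row implicitly decrements K:

    import numpy as np

    def delete_topic(lam, k):
        return np.delete(lam, k, axis=0)   # remove row k; K <- K - 1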

Slide 12

Merge topics

Given two topics k1 and k2:
[Figure: λ with rows k1 and k2 marked]

Procedure:
- λk1 ← λk1 + λk2
- Delete topic k2
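
Again as a sketch on a K × V NumPy array:

    import numpy as np

    def merge_topics(lam, k1, k2):
        lam = lam.copy()
        lam[k1] += lam[k2]                  # lambda_k1 <- lambda_k1 + lambda_k2
        return np.delete(lam, k2, axis=0)   # delete topic k2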

Slide 13

Split topics

Given two sets of words W1 and W2 and a topic k1:
[Figure: row k1 of λ, e.g. (1 3 5 7 5), and the new row k2, e.g. (2 1 1 8 4); a fraction ρw of each entry is moved from row k1 to row k2]

Procedure:
- Add a row k2 to λ
- For each word w:
  λk2,w = ρw λk1,w
  λk1,w = (1 − ρw) λk1,w
  with ρw = p(w|W2) / Σi p(w|Wi)
- Number of topics K ← K + 1
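
A sketch of the split on a K × V NumPy array, assuming ρ has already been computed as a length-V vector according to the formula above:

    import numpy as np

    def split_topic(lam, k1, rho):
        lam = lam.copy()
        new_row = rho * lam[k1]              # lambda_k2,w = rho_w * lambda_k1,w
        lam[k1] = (1.0 - rho) * lam[k1]      # lambda_k1,w = (1 - rho_w) * lambda_k1,w
        return np.vstack([lam, new_row])     # add row k2; K <- K + 1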

Slide 14

Can we do this?

Is the algorithm still correct? Yes: no assumption is made about λ between epochs.
Is the model still optimal? Yes: after each user update, LDA converges again.

Variational EM:
- Initialize λ randomly
- Repeat:
  - E step: tag docs
  - M step: update λ

Interactive LDA:
- Repeat:
  - λ = EMepochs(λ)
  - User updates λ

Slide 15

demo

Slide 16

Conclusion

Repeatedly use live user feedback to kick LDA out of its local optimum.

Questions?