Slide 1
Slide 1 text
Ensemble Topic Modelling
Leland McInnes
Slide 2
Slide 2 text
Model a corpus of documents in terms of underlying “topics”
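Before any topics can be modelled, the corpus has to become numbers. A minimal sketch, assuming plain whitespace tokenisation (a simplifying assumption; real pipelines strip punctuation, stop words, and so on): each document becomes a row of word counts over a shared vocabulary. The function name `document_term_matrix` and the toy documents are hypothetical.

```python
from collections import Counter


def document_term_matrix(docs):
    """Bag-of-words counts: one row per document, one column per vocabulary word."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenised for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for toks in tokenised:
        row = [0] * len(vocab)
        for word, count in Counter(toks).items():
            row[index[word]] = count
        rows.append(row)
    return vocab, rows


docs = ["cats chase mice", "dogs chase cats", "stocks and bonds", "bonds and stocks rally"]
vocab, X = document_term_matrix(docs)
print(vocab)
for row in X:
    print(row)
```

A "topic" is then a distribution over the columns of this matrix, and each document is described by how much of each topic it contains.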
Slide 3
Slide 3 text
Topic Modelling as Matrix Factorization
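The matrix-factorization view can be sketched as follows, assuming a small document-term count matrix `X` (documents × words) factored into nonnegative `W` (document-topic weights) and `H` (topic-word weights) so that `X ≈ W @ H`. This sketch uses the Lee and Seung multiplicative updates for nonnegative matrix factorization (NMF) under squared error; the function `nmf` and the toy matrix are hypothetical illustrations, not code from the talk.

```python
import numpy as np


def nmf(X, n_topics, n_iter=1000, seed=0, eps=1e-10):
    """Factor nonnegative X (docs x words) as W @ H with W, H >= 0.

    Multiplicative updates (Lee and Seung) minimising ||X - W H||_F^2.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    W = rng.random((n_docs, n_topics))   # document-topic weights
    H = rng.random((n_topics, n_words))  # topic-word weights
    for _ in range(n_iter):
        # Update topic-word weights, then document-topic weights.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy corpus: 4 documents over a 6-word vocabulary.
# Words 0-2 co-occur in documents 0-1, words 3-5 in documents 2-3.
X = np.array([
    [2, 2, 1, 0, 0, 0],
    [4, 4, 2, 0, 0, 0],
    [0, 0, 0, 1, 2, 3],
    [0, 0, 0, 2, 4, 6],
], dtype=float)

W, H = nmf(X, n_topics=2)
print(np.round(W @ H, 2))  # reconstruction of X
print(H.argmax(axis=0))    # dominant topic for each word
```

Each row of `H` is read as a topic (a weighting over words), and each row of `W` says how strongly a document expresses each topic.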
Slide 4
Slide 4 text
No content
Slide 5
Slide 5 text
No content
Slide 6
Slide 6 text
No content
Slide 7
Slide 7 text
No content
Slide 8
Slide 8 text
LDA and pLSA are probabilistic matrix factorization methods
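In symbols, writing $n(d, w)$ for the count of word $w$ in document $d$ and $K$ for the number of topics, pLSA models each document's word distribution as a mixture of topic-word distributions. Stacking these gives a factorization of the row-normalised count matrix $\tilde{X}$ into two matrices whose rows are probability distributions; pLSA fits them by maximum likelihood, while LDA keeps the same factorized form but places Dirichlet priors on the rows.

```latex
P(w \mid d) = \sum_{z=1}^{K} P(w \mid z)\, P(z \mid d)
\quad\Longleftrightarrow\quad
\tilde{X} \approx \Theta \Phi,
\qquad
\tilde{X}_{dw} = \frac{n(d, w)}{\sum_{w'} n(d, w')},\;
\Theta_{dz} = P(z \mid d),\;
\Phi_{zw} = P(w \mid z)

\text{pLSA:}\quad
\max_{\Theta, \Phi} \; \sum_{d, w} n(d, w) \log \sum_{z=1}^{K} \Theta_{dz} \Phi_{zw},
\qquad
\text{LDA:}\quad
\theta_d \sim \mathrm{Dir}(\alpha),\; \phi_z \sim \mathrm{Dir}(\beta)
```

Here $\theta_d$ and $\phi_z$ denote the $d$-th row of $\Theta$ and the $z$-th row of $\Phi$.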
Slide 9
Slide 9 text
(Ensembles of) pLSA
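A minimal numpy sketch of fitting pLSA with expectation maximisation, assuming a dense document-term count matrix (fine for a toy example; a real implementation would use sparse matrices, since the dense responsibility tensor grows as documents × topics × words). An ensemble here is simply several independent fits with different random seeds; the function name `plsa` and the toy data are illustrative, not taken from enstop.

```python
import numpy as np


def plsa(X, n_topics, n_iter=200, seed=0, eps=1e-12):
    """Fit pLSA to a count matrix X (docs x words) with EM.

    Returns theta (docs x topics, rows are P(z|d)) and
    phi (topics x words, rows are P(w|z)).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    theta = rng.random((n_docs, n_topics))
    theta /= theta.sum(axis=1, keepdims=True)
    phi = rng.random((n_topics, n_words))
    phi /= phi.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z | d, w), shape (docs, topics, words).
        joint = theta[:, :, None] * phi[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # Expected counts n(d, w) * P(z | d, w).
        weighted = X[:, None, :] * resp
        # M-step: renormalise expected counts into distributions.
        phi = weighted.sum(axis=0)
        phi /= phi.sum(axis=1, keepdims=True) + eps
        theta = weighted.sum(axis=2)
        theta /= theta.sum(axis=1, keepdims=True) + eps
    return theta, phi


X = np.array([
    [2, 2, 1, 0, 0, 0],
    [4, 4, 2, 0, 0, 0],
    [0, 0, 0, 1, 2, 3],
    [0, 0, 0, 2, 4, 6],
], dtype=float)

# An ensemble: several independent fits that differ only in their random seed.
ensemble = [plsa(X, n_topics=2, seed=s) for s in range(5)]
for theta, phi in ensemble:
    print(np.round(phi, 2))
```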
Slide 10
Slide 10 text
Performance?
Slide 11
Slide 11 text
No content
Slide 12
Slide 12 text
Quality?
Slide 13
Slide 13 text
No content
Slide 14
Slide 14 text
Instability?
Slide 15
Slide 15 text
These are hard optimization problems
Slide 16
Slide 16 text
Topics vary from one run to another
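One way to make run-to-run variation concrete is to match each topic in one run to its nearest topic in another and look at the distance. A minimal sketch, assuming topics are word distributions compared with the Hellinger distance (0 for identical distributions, 1 for disjoint ones); the topic values below are hypothetical, not results from the talk.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def match_topics(run_a, run_b):
    """For each topic in run_a, return (index, index of closest topic in run_b, distance)."""
    matches = []
    for i, p in enumerate(run_a):
        dists = [hellinger(p, q) for q in run_b]
        j = min(range(len(dists)), key=dists.__getitem__)
        matches.append((i, j, dists[j]))
    return matches


# Hypothetical topic-word distributions from two runs over a 4-word vocabulary.
run_1 = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
run_2 = [[0.0, 0.0, 0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
m = match_topics(run_1, run_2)
for i, j, d in m:
    print(f"run_1 topic {i} -> run_2 topic {j}: distance {d:.3f}")
```

Here run_1's second topic reappears exactly in run_2, while its first topic has no close counterpart: the kind of instability that motivates the ensemble approach.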
Slide 17
Slide 17 text
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Slide 18
Slide 18 text
No content
Slide 19
Slide 19 text
Each cluster represents a stable topic
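The idea of pooling topics from many runs and keeping the clusters can be sketched as follows. This is a deliberately simple stand-in, not necessarily the clustering used in the talk or in enstop: topics closer than a Hellinger threshold are linked (single linkage via union-find), a cluster counts as stable only if its members come from at least `min_runs` different runs, and each stable cluster is averaged into one topic. The parameters `threshold` and `min_runs` and the toy runs are hypothetical.

```python
import math


def hellinger(p, q):
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def stable_topics(runs, threshold=0.2, min_runs=None):
    """Cluster topics pooled from several runs; return one averaged topic per stable cluster.

    runs: list of runs, each a list of topic-word distributions.
    """
    if min_runs is None:
        min_runs = len(runs)  # hypothetical default: the topic must recur in every run
    pooled = [(r, topic) for r, run in enumerate(runs) for topic in run]
    parent = list(range(len(pooled)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of topics closer than the threshold.
    for i in range(len(pooled)):
        for j in range(i + 1, len(pooled)):
            if hellinger(pooled[i][1], pooled[j][1]) < threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(pooled)):
        clusters.setdefault(find(i), []).append(i)

    result = []
    for members in clusters.values():
        if len({pooled[i][0] for i in members}) < min_runs:
            continue  # did not recur across enough runs: treat as unstable
        n_words = len(pooled[members[0]][1])
        result.append([sum(pooled[i][1][w] for i in members) / len(members)
                       for w in range(n_words)])
    return result


runs = [
    [[0.50, 0.50, 0.00, 0.00], [0.00, 0.00, 0.50, 0.50]],
    [[0.00, 0.05, 0.45, 0.50], [0.45, 0.55, 0.00, 0.00]],
    [[0.48, 0.52, 0.00, 0.00], [0.25, 0.25, 0.25, 0.25]],
]
topics = stable_topics(runs, threshold=0.2, min_runs=2)
print(len(topics))  # number of stable topics found
for t in topics:
    print([round(x, 3) for x in t])
```

Note that the number of stable topics falls out of the clustering rather than being fixed in advance; the one-off uniform topic in the third run is discarded.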
Slide 20
Slide 20 text
No content
Slide 21
Slide 21 text
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
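Because each ensemble member is fitted independently, the runs can be farmed out to workers with no coordination beyond collecting results. A minimal sketch with Python's `concurrent.futures`, assuming each member is fitted on a bootstrap resample of the documents (a common way to diversify an ensemble; this is an assumption, not a description of enstop's internals). `toy_fit` is a hypothetical stand-in for a real topic-model fit.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def bootstrap_fit(seed, documents, fit_fn):
    """One ensemble member: resample documents with replacement, then fit.

    Each call uses its own RNG, so members share no state and can run in any order.
    """
    rng = random.Random(seed)
    sample = [rng.choice(documents) for _ in documents]
    return fit_fn(sample, seed)


def run_ensemble(documents, fit_fn, n_runs=8, max_workers=4):
    """Fit n_runs independent members concurrently; results come back in seed order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(bootstrap_fit, seed, documents, fit_fn)
                   for seed in range(n_runs)]
        return [f.result() for f in futures]


def toy_fit(sample, seed):
    """Hypothetical stand-in for a topic-model fit: reports the vocabulary it saw."""
    return seed, sorted({word for doc in sample for word in doc})


docs = [["cat", "dog"], ["dog", "fish"], ["stock", "bond"]]
results = run_ensemble(docs, toy_fit, n_runs=4)
for seed, vocab in results:
    print(seed, vocab)
```

A `ThreadPoolExecutor` keeps the sketch simple; for CPU-bound pure-Python fits, `ProcessPoolExecutor` offers the same interface and avoids contention on the GIL.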
Slide 22
Slide 22 text
Implementation
Slide 23
Slide 23 text
sklearn API
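As a rough illustration of what following the scikit-learn estimator conventions means, here is a hypothetical toy topic model (an NMF underneath, not enstop's actual classes): hyperparameters are stored unchanged in `__init__`, `fit` learns from data and returns `self`, learned attributes carry a trailing underscore such as `components_`, and `transform` maps documents to topic weights.

```python
import numpy as np


class ToyNMFTopics:
    """Hypothetical topic model following scikit-learn estimator conventions."""

    def __init__(self, n_components=2, n_iter=500, random_state=0):
        # Only store hyperparameters here; no fitting or validation.
        self.n_components = n_components
        self.n_iter = n_iter
        self.random_state = random_state

    @staticmethod
    def _update_w(X, W, H):
        # Multiplicative update for document-topic weights with H held fixed.
        return W * (X @ H.T) / (W @ H @ H.T + 1e-10)

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        W = rng.random((X.shape[0], self.n_components))
        H = rng.random((self.n_components, X.shape[1]))
        for _ in range(self.n_iter):
            H *= (W.T @ X) / (W.T @ W @ H + 1e-10)
            W = self._update_w(X, W, H)
        self.components_ = H  # topic-word matrix, shape (n_components, n_words)
        return self

    def transform(self, X):
        """Document-topic weights for (possibly new) documents, components_ fixed."""
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        W = rng.random((X.shape[0], self.n_components))
        for _ in range(self.n_iter):
            W = self._update_w(X, W, self.components_)
        return W

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)


X = [[2, 2, 1, 0, 0, 0],
     [4, 4, 2, 0, 0, 0],
     [0, 0, 0, 1, 2, 3],
     [0, 0, 0, 2, 4, 6]]
model = ToyNMFTopics(n_components=2).fit(X)
print(model.components_.shape)
print(np.round(model.transform(X), 2))
```

A fully scikit-learn-compatible estimator would normally also inherit from `sklearn.base.BaseEstimator`, which supplies `get_params` and `set_params` from the `__init__` signature.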
Slide 24
Slide 24 text
No content
Slide 25
Slide 25 text
https://github.com/lmcinnes/enstop
Slide 26
Slide 26 text
pip install enstop