Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
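As a rough illustration of the factorization view (the symbols D, V, K, X, W and H are notation assumed here, not taken from the slides): a document-by-word count matrix is approximated by the product of a document-by-topic matrix and a topic-by-word matrix.

```latex
X \approx W H, \qquad
X \in \mathbb{R}_{\ge 0}^{D \times V}, \quad
W \in \mathbb{R}_{\ge 0}^{D \times K}, \quad
H \in \mathbb{R}_{\ge 0}^{K \times V}, \quad
K \ll \min(D, V)
```

Here D is the number of documents, V the vocabulary size and K the number of topics. Each row of H describes one topic as weights over words, and each row of W describes one document as weights over topics.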
LDA and pLSA are probabilistic matrix factorization methods
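In pLSA the factorization is probabilistic: P(word | doc) = sum over topics of P(topic | doc) * P(word | topic). A minimal sketch of fitting this by EM with NumPy follows. It is an illustration only, not the enstop implementation; the function name `plsa`, its parameters and the toy count matrix are all assumptions made for this example, and the dense E-step is only practical for very small data.

```python
import numpy as np

def plsa(X, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a dense document-term count matrix X (docs x words)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    # P(topic | doc): one distribution over topics per document
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    # P(word | topic): one distribution over words per topic
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(topic | doc, word), shape (docs, topics, words)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: weight responsibilities by the observed counts
        weighted = X[:, None, :] * resp
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Toy corpus (made up for illustration): 4 documents, 4 words, two word blocks
counts = np.array([
    [5, 3, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 6, 2],
    [0, 0, 3, 5],
], dtype=float)

doc_topic, topic_word = plsa(counts, n_topics=2)
# The product is the model's estimate of P(word | doc) for every document
reconstruction = doc_topic @ topic_word
```

Both factors are row-stochastic, so every row of `reconstruction` is itself a probability distribution over words.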
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
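The idea of pooling topics from many runs and keeping only the clusters can be sketched as follows. This is a simplified stand-in, not enstop's actual pipeline: the clustering here is a plain distance-threshold linkage on Hellinger distance, and the names `hellinger`, `stable_topics`, `threshold` and `min_size` are hypothetical. The "runs" are synthetic topics built for the example rather than output from real model fits.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def stable_topics(runs, threshold=0.3, min_size=3):
    """Pool topics from all runs, link topics closer than `threshold`,
    and return the mean of every linked group with >= `min_size` members."""
    pooled = np.vstack(runs)
    n = len(pooled)
    labels = -np.ones(n, dtype=int)
    n_clusters = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Flood-fill the connected component that contains `start`
        labels[start] = n_clusters
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and hellinger(pooled[i], pooled[j]) < threshold:
                    labels[j] = n_clusters
                    stack.append(j)
        n_clusters += 1
    centers = [pooled[labels == c].mean(axis=0)
               for c in range(n_clusters)
               if np.sum(labels == c) >= min_size]
    return np.array(centers)

# Synthetic stand-in for 5 model runs over a 9-word vocabulary.
# Each run recovers two recurring topics (slightly perturbed)
# plus one topic that appears in that run only.
rng = np.random.default_rng(0)
base = np.zeros((2, 9))
base[0, :2] = 0.5    # recurring topic on words 0-1
base[1, 2:4] = 0.5   # recurring topic on words 2-3
runs = []
for r in range(5):
    noisy = base + rng.uniform(0, 0.005, base.shape)
    noisy /= noisy.sum(axis=1, keepdims=True)
    one_off = np.zeros((1, 9))
    one_off[0, 4 + r] = 1.0  # a different unstable topic in every run
    runs.append(np.vstack([noisy, one_off]))

stable = stable_topics(runs)
print(stable.shape[0], "stable topics from", sum(len(t) for t in runs), "pooled topics")
```

Each run asks for three topics, but only the two that recur across runs survive, so the number of stable topics falls out of the clustering rather than being fixed in advance.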
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
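Because each ensemble member is fitted independently, the runs can be dispatched to separate workers with no communication between them. A minimal sketch, assuming a hypothetical `fit_one` helper that uses KL-divergence NMF multiplicative updates (a model closely related to pLSA) as a compact stand-in for one fit; a thread pool is used here only for simplicity, and a process pool or joblib would be the more usual choice for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

import numpy as np

def fit_one(seed, X, n_topics, n_iter=100):
    """One ensemble member: KL-divergence NMF via multiplicative updates,
    used as a stand-in for a single pLSA fit with its own random start."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_topics)) + 0.1
    H = rng.random((n_topics, X.shape[1])) + 0.1
    for _ in range(n_iter):
        W *= (X / (W @ H + 1e-12)) @ H.T / (H.sum(axis=1) + 1e-12)
        H *= W.T @ (X / (W @ H + 1e-12)) / (W.sum(axis=0)[:, None] + 1e-12)
    # Normalise rows so each topic is a distribution over words
    return H / H.sum(axis=1, keepdims=True)

# Toy document-term counts, made up for illustration
X = np.array([
    [5, 3, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 6, 2],
    [0, 0, 3, 5],
], dtype=float)

seeds = [0, 1, 2, 3]
# Every seed is an independent job; no state is shared between runs
with ThreadPoolExecutor(max_workers=4) as pool:
    topics_per_run = list(pool.map(partial(fit_one, X=X, n_topics=2), seeds))
```

The resulting list of per-run topic matrices is what a later pooling and clustering step would consume.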
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop