Ensemble Topic Modelling
Leland McInnes
July 12, 2019
Research
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
More Decks by Leland McInnes
PyNNDescent: Fast Approximate Nearest Neighbors with Numba
Word and Document Embeddings
Topological Data Analysis
Learning Topology: topological methods for unsupervised learning
A Guide to Dimension Reduction
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
Other Decks in Research
CVPR2024 Attendance Report (kwchrk)
Development of a General-Purpose Linear Integer Programming Solver Based on Metaheuristics (snowberryfield)
Weekly AI Agents News! October Issue: Paper Archive (masatoto)
Segment Any Change (satai)
ECCV2024 Reading Group: Minimalist Vision with Freeform Pixels (hsmtta)
Toward Smoother Collaboration Between Clinicians and Informatics in Developing Medical Support AI (moda0)
Research Trends in Data-Centric AI: Insights from Top AI Conferences (tsurubee)
An Evaluation Metric for Recommendation Diversification That Does Not Ignore Accuracy (kuri8ive)
KDD Paper Reading Group 2024: False Positive in A/B Tests (ryotoitoi)
Observations on the Ideal Infrastructure in PetiteSRE_GenAIEra (ichichi)
Compliance in the LLM Era: Safety & Security (huffon)
[NLP Colloquium] Stepwise Alignment for Constrained Language Model Policy Optimization (NeurIPS 2024) (akifumi_wachi)
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
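Concretely, pLSA models each document's word distribution as a mixture of K topic distributions. Written as a matrix product, that is a non-negative factorization of the row-normalized document-word count matrix. The symbols below (n(d,w) for counts, X, Θ, Φ, K) are our own notation, not the talk's.

```latex
P(w \mid d) = \sum_{z=1}^{K} P(w \mid z)\, P(z \mid d)

X \approx \Theta \, \Phi, \qquad
X_{dw} = \frac{n(d,w)}{\sum_{w'} n(d,w')}, \quad
\Theta_{dz} = P(z \mid d), \quad
\Phi_{zw} = P(w \mid z)
```

Θ is (documents × K) and Φ is (K × vocabulary); all entries are non-negative and every row of Θ and of Φ sums to 1. LDA keeps the same factorized form but places Dirichlet priors on those rows.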
LDA and pLSA are probabilistic matrix factorization methods
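pLSA fits the factorization P(w | d) = Σ_z P(w | z) P(z | d) by expectation-maximization (EM). The following is a minimal pure-Python sketch, assuming a small dense document-word count matrix given as a list of lists. The names `plsa` and `normalize` and their parameters are hypothetical illustrations, not the enstop API; a practical implementation would work on sparse matrices with vectorized or compiled code.

```python
import random


def normalize(row):
    """Scale a non-negative list so it sums to 1 (uniform if it is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a dense document-word count matrix.

    Returns (doc_topic, topic_word) with doc_topic[d][z] = P(z | d)
    and topic_word[z][w] = P(w | z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random initialisation: different seeds can converge to different optima.
    doc_topic = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    topic_word = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        # Expected counts accumulated in the E-step, normalised in the M-step.
        dt_acc = [[0.0] * n_topics for _ in range(n_docs)]
        tw_acc = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = [doc_topic[d][z] * topic_word[z][w] for z in range(n_topics)]
                total = sum(post)
                if total == 0:
                    continue
                for z in range(n_topics):
                    share = n_dw * post[z] / total
                    dt_acc[d][z] += share
                    tw_acc[z][w] += share
        # M-step: renormalise the expected counts into probabilities.
        doc_topic = [normalize(row) for row in dt_acc]
        topic_word = [normalize(row) for row in tw_acc]
    return doc_topic, topic_word


# Toy corpus: 4 documents over a 4-word vocabulary with two obvious themes.
counts = [
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 6, 4],
    [1, 0, 4, 5],
]
doc_topic, topic_word = plsa(counts, n_topics=2)
```

The rows of `doc_topic` and `topic_word` are exactly the Θ and Φ factors: each document gets a topic mixture, each topic a word distribution.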
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
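Because EM only reaches a local optimum from a random start, two runs can return their topics in a different order and with somewhat different content. One way to quantify this, assumed here rather than taken from the talk, is to measure the Hellinger distance between topic-word distributions and find the best one-to-one pairing of topics across two runs. The names `hellinger` and `match_topics` and the two example runs are hypothetical.

```python
import math
from itertools import permutations


def hellinger(p, q):
    """Hellinger distance between two discrete distributions: 0 if identical, 1 if disjoint."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def match_topics(run_a, run_b):
    """Pair each topic in run_a with a distinct topic in run_b.

    Brute-forces all permutations, so it is only practical for a handful of
    topics. Returns (perm, mean_distance), where topic i of run_a is matched
    to topic perm[i] of run_b.
    """
    k = len(run_a)

    def cost(perm):
        return sum(hellinger(run_a[i], run_b[perm[i]]) for i in range(k))

    best = min(permutations(range(k)), key=cost)
    return best, cost(best) / k


# Two hypothetical runs that found roughly the same themes in a different order.
run_1 = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
run_2 = [[0.0, 0.1, 0.45, 0.45], [0.5, 0.5, 0.0, 0.0]]
perm, mean_dist = match_topics(run_1, run_2)
```

Here `perm` is `(1, 0)`: the topic order is swapped between runs, and even matched topics are not identical (mean distance about 0.11), which is the run-to-run variation the slide describes.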
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
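The idea can be sketched as: pool the topics from many independent runs, cluster them, and treat clusters that contain a topic from (nearly) every run as stable topics; the number of such clusters then gives the number of topics. This is a minimal sketch under stated assumptions: Hellinger distance as the topic similarity and simple single-linkage threshold clustering as a stand-in for a proper clustering algorithm. The names `stable_topics`, `threshold` and `min_runs` are hypothetical, not enstop's.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions: 0 if identical, 1 if disjoint."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def stable_topics(runs, threshold=0.3, min_runs=None):
    """Pool the topics from every run, cluster them, and keep the stable clusters.

    runs: list of runs, each a list of topic-word distributions.
    Single-linkage clustering (link any two topics closer than `threshold`)
    stands in for a proper clustering algorithm. A cluster is stable if it
    holds topics from at least `min_runs` different runs (default: all runs);
    each stable cluster is summarised by the mean of its member topics.
    """
    if min_runs is None:
        min_runs = len(runs)
    pooled = [(r, topic) for r, run in enumerate(runs) for topic in run]
    parent = list(range(len(pooled)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pooled)):
        for j in range(i + 1, len(pooled)):
            if hellinger(pooled[i][1], pooled[j][1]) < threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(pooled)):
        clusters.setdefault(find(i), []).append(i)

    stable = []
    for members in clusters.values():
        if len({pooled[i][0] for i in members}) >= min_runs:
            n_words = len(pooled[members[0]][1])
            stable.append([sum(pooled[i][1][w] for i in members) / len(members)
                           for w in range(n_words)])
    return stable


# Three hypothetical runs of 3 topics each: two themes recur in every run,
# the third topic of each run is run-specific noise.
runs = [
    [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5], [0.25, 0.25, 0.25, 0.25]],
    [[0.0, 0.0, 0.45, 0.55], [0.55, 0.45, 0.0, 0.0], [0.7, 0.0, 0.0, 0.3]],
    [[0.5, 0.4, 0.1, 0.0], [0.0, 0.1, 0.5, 0.4], [0.0, 0.6, 0.4, 0.0]],
]
topics = stable_topics(runs)
```

Each run was asked for 3 topics, yet only 2 stable topics come out: the count is decided by the clustering rather than fixed in advance. With real data, `threshold` and `min_runs` would need tuning.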
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
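The ensemble members are independent fits that differ only in their random seed, so they can be run concurrently with no coordination. A minimal sketch, assuming a compact pure-Python pLSA as the per-run model; `plsa_topics` and `fit_ensemble` are hypothetical names. `ThreadPoolExecutor` keeps the sketch portable, but because pure-Python EM is CPU-bound and holds the GIL, real speed-ups would need `ProcessPoolExecutor` (same `map` interface, with a top-level function) or compiled code.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def _normalize(row):
    total = sum(row)
    return [x / total for x in row]


def plsa_topics(counts, n_topics, seed, n_iter=50):
    """One ensemble member: pLSA fitted by EM from a seeded random start.

    Returns the topic-word distributions P(w | z) as a list of rows.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    doc_topic = [_normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    topic_word = [_normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(n_iter):
        # Expected counts; a tiny pseudo-count keeps every row strictly positive.
        dt = [[1e-12] * n_topics for _ in range(n_docs)]
        tw = [[1e-12] * n_words for _ in range(n_topics)]
        for d, row in enumerate(counts):
            for w, n_dw in enumerate(row):
                if n_dw == 0:
                    continue
                post = [doc_topic[d][z] * topic_word[z][w] for z in range(n_topics)]
                total = sum(post)
                for z in range(n_topics):
                    dt[d][z] += n_dw * post[z] / total
                    tw[z][w] += n_dw * post[z] / total
        doc_topic = [_normalize(r) for r in dt]
        topic_word = [_normalize(r) for r in tw]
    return topic_word


def fit_ensemble(counts, n_topics, seeds, max_workers=4):
    """Fit one independent pLSA model per seed; results come back in seed order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: plsa_topics(counts, n_topics, s), seeds))


counts = [[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 6, 4], [1, 0, 4, 5]]
runs = fit_ensemble(counts, n_topics=2, seeds=range(8))
```

Since no run reads another's state, the parallel result is identical to fitting the seeds one after another; the pooled `runs` are what a stable-topic clustering step would consume.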
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop
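To illustrate what an "sklearn API" means for a topic model, here is a hypothetical estimator, `ToyPLSA`, written to scikit-learn's conventions: hyperparameters set in `__init__`, `fit` returning `self`, learned attributes ending in an underscore, and `fit_transform` returning the document-topic matrix while `components_` holds the topic-word matrix (as scikit-learn's `NMF` and `LatentDirichletAllocation` do). It is a sketch under those assumptions, not enstop's actual classes or code; see the repository above for the real interface.

```python
import random


class ToyPLSA:
    """Hypothetical pLSA estimator following scikit-learn's estimator conventions."""

    def __init__(self, n_components=10, n_iter=100, random_state=0):
        self.n_components = n_components
        self.n_iter = n_iter
        self.random_state = random_state

    @staticmethod
    def _normalize(row):
        total = sum(row)
        return [x / total for x in row]

    def fit_transform(self, X, y=None):
        """X: dense document-word counts as a list of lists. y is ignored."""
        rng = random.Random(self.random_state)
        k, n_docs, n_words = self.n_components, len(X), len(X[0])
        doc_topic = [self._normalize([rng.random() for _ in range(k)]) for _ in range(n_docs)]
        topic_word = [self._normalize([rng.random() for _ in range(n_words)]) for _ in range(k)]
        for _ in range(self.n_iter):
            # EM step; a tiny pseudo-count keeps every row strictly positive.
            dt = [[1e-12] * k for _ in range(n_docs)]
            tw = [[1e-12] * n_words for _ in range(k)]
            for d, row in enumerate(X):
                for w, n_dw in enumerate(row):
                    if n_dw == 0:
                        continue
                    post = [doc_topic[d][z] * topic_word[z][w] for z in range(k)]
                    total = sum(post)
                    for z in range(k):
                        dt[d][z] += n_dw * post[z] / total
                        tw[z][w] += n_dw * post[z] / total
            doc_topic = [self._normalize(r) for r in dt]
            topic_word = [self._normalize(r) for r in tw]
        self.components_ = topic_word  # P(w | z), one row per topic
        return doc_topic  # P(z | d), one row per document

    def fit(self, X, y=None):
        self.fit_transform(X)
        return self


X = [[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 6, 4], [1, 0, 4, 5]]
model = ToyPLSA(n_components=2).fit(X)
doc_topic = ToyPLSA(n_components=2).fit_transform(X)
```

In a scikit-learn pipeline the counts would usually come from `CountVectorizer().fit_transform(texts)`, which returns a sparse matrix; this toy class would need `.toarray().tolist()` first, whereas a real implementation would accept the sparse matrix directly.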