Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
LDA and pLSA are probabilistic matrix factorization methods
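Written out, the factorization view looks roughly like this. The notation is ours rather than the slides': X is the row-normalized document-term matrix, W holds P(z | d), H holds P(w | z), and k is the number of topics.

```latex
P(w \mid d) \;=\; \sum_{z=1}^{k} P(w \mid z)\, P(z \mid d)
\qquad\Longleftrightarrow\qquad
X \approx W H,
\quad W \in \mathbb{R}_{\ge 0}^{n \times k},\;
H \in \mathbb{R}_{\ge 0}^{k \times m}
```

Each row of W and each row of H sums to 1. pLSA fits W and H by maximum likelihood; LDA additionally places Dirichlet priors on those rows.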
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
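A minimal sketch of the ensemble idea, assuming a small dense count matrix: fit pLSA several times from different random starts, pool every topic from every run, and keep the clusters that recur across runs. The function names (`plsa`, `hellinger`, `stable_topics`), the greedy clustering rule, and the `threshold` and `min_size` settings are all illustrative assumptions, not the enstop implementation.

```python
import numpy as np

def plsa(X, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a dense count matrix X (docs x words, no empty docs).

    Returns (doc_topic, topic_word); every row is a probability vector.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    doc_topic = rng.dirichlet(np.ones(n_topics), size=n_docs)    # P(z | d)
    topic_word = rng.dirichlet(np.ones(n_words), size=n_topics)  # P(w | z)
    for _ in range(n_iter):
        # Ratio of observed counts to the current reconstruction P(w | d).
        ratio = X / (doc_topic @ topic_word + 1e-12)
        # Combined E/M updates, computed from the old parameters.
        new_topic_word = topic_word * (doc_topic.T @ ratio)
        new_doc_topic = doc_topic * (ratio @ topic_word.T)
        topic_word = new_topic_word / new_topic_word.sum(axis=1, keepdims=True)
        doc_topic = new_doc_topic / new_doc_topic.sum(axis=1, keepdims=True)
    return doc_topic, topic_word

def hellinger(p, q):
    """Hellinger distance between two probability vectors (range 0 to 1)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def stable_topics(X, n_topics, n_runs=8, threshold=0.3, min_size=None):
    """Pool topics from several runs and keep clusters that recur often."""
    pooled = np.vstack([plsa(X, n_topics, seed=s)[1] for s in range(n_runs)])
    clusters = []  # each cluster is a list of topic vectors
    for topic in pooled:
        # Greedy single pass: join the first cluster whose centroid is close.
        for members in clusters:
            if hellinger(topic, np.mean(members, axis=0)) < threshold:
                members.append(topic)
                break
        else:
            clusters.append([topic])
    if min_size is None:
        min_size = n_runs // 2  # hypothetical rule: seen in about half the runs
    centroids = [np.mean(m, axis=0) for m in clusters if len(m) >= min_size]
    return np.array(centroids).reshape(-1, X.shape[1])

# Toy corpus: 20 documents drawn from each of two disjoint word distributions.
rng = np.random.default_rng(42)
topic_a = np.r_[np.full(5, 0.2), np.zeros(5)]
topic_b = np.r_[np.zeros(5), np.full(5, 0.2)]
X = np.vstack([
    rng.multinomial(50, topic_a, size=20),
    rng.multinomial(50, topic_b, size=20),
]).astype(float)

stable = stable_topics(X, n_topics=2)
print(stable.shape[0], "stable topics found")
```

The number of topics returned is whatever number of clusters survives the size filter, rather than a value fixed in advance. Each run is independent of the others, so the list comprehension over seeds could be handed to a process pool.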
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
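For readers unfamiliar with the convention, here is a sketch of what "sklearn API" usually means for a topic model. The slide text does not name any enstop classes, so this example uses scikit-learn's own `LatentDirichletAllocation` as a stand-in; the toy documents are made up, and an enstop estimator would presumably be used in the same fit/transform pattern.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, for illustration only.
docs = [
    "cats and dogs are pets",
    "dogs chase cats",
    "stocks and bonds are investments",
    "investors buy stocks",
]

# Bag-of-words counts: one row per document, one column per vocabulary word.
counts = CountVectorizer().fit_transform(docs)

# Stand-in estimator; construct, fit, and read attributes the sklearn way.
model = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = model.fit_transform(counts)  # shape (n_docs, n_topics)
topic_words = model.components_           # shape (n_topics, n_words)

print(doc_topics.shape, topic_words.shape)
```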
https://github.com/lmcinnes/enstop
pip install enstop