Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Ensemble Topic Modelling
Search
Leland McInnes
July 12, 2019
Research
1
480
Ensemble Topic Modelling
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Leland McInnes
July 12, 2019
Tweet
Share
More Decks by Leland McInnes
See All by Leland McInnes
PyNNDescent: Fast Approximate Nearest Neighbors with Numba
lmcinnes
0
1k
Word and Document Embeddings
lmcinnes
0
150
Topological Data Analysis
lmcinnes
1
340
Learning Topology: topological methods for unsupervised learning
lmcinnes
2
3.6k
A Guide to Dimension Reduction
lmcinnes
3
1.4k
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
lmcinnes
2
2.7k
Other Decks in Research
See All in Research
20251023_くまもと21の会例会_「車1割削減、渋滞半減、公共交通2倍」をめざして.pdf
trafficbrain
0
180
Pythonでジオを使い倒そう! 〜それとFOSS4G Hiroshima 2026のご紹介を少し〜
wata909
0
1.3k
Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification
satai
3
580
Time to Cash: The Full Stack Breakdown of Modern ATM Attacks
ratatata
0
190
2026.01ウェビナー資料
elith
0
190
ACL読み会2025: Can Language Models Reason about Individualistic Human Values and Preferences?
yukizenimoto
0
120
ペットのかわいい瞬間を撮影する オートシャッターAIアプリへの スマートラベリングの適用
mssmkmr
0
230
ローテーション別のサイドアウト戦略 ~なぜあのローテは回らないのか?~
vball_panda
0
280
データサイエンティストをめぐる環境の違い2025年版〈一般ビジネスパーソン調査の国際比較〉
datascientistsociety
PRO
0
660
教師あり学習と強化学習で作る 最強の数学特化LLM
analokmaus
2
880
2025-11-21-DA-10th-satellite
yegusa
0
110
"主観で終わらせない"定性データ活用 ― プロダクトディスカバリーを加速させるインサイトマネジメント / Utilizing qualitative data that "doesn't end with subjectivity" - Insight management that accelerates product discovery
kaminashi
15
20k
Featured
See All Featured
How to Get Subject Matter Experts Bought In and Actively Contributing to SEO & PR Initiatives.
livdayseo
0
63
技術選定の審美眼(2025年版) / Understanding the Spiral of Technologies 2025 edition
twada
PRO
117
110k
How to Think Like a Performance Engineer
csswizardry
28
2.4k
Navigating the Design Leadership Dip - Product Design Week Design Leaders+ Conference 2024
apolaine
0
170
svc-hook: hooking system calls on ARM64 by binary rewriting
retrage
1
97
Design in an AI World
tapps
0
140
Leadership Guide Workshop - DevTernity 2021
reverentgeek
1
200
A designer walks into a library…
pauljervisheath
210
24k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.3k
AI: The stuff that nobody shows you
jnunemaker
PRO
2
240
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.6k
DevOps and Value Stream Thinking: Enabling flow, efficiency and business value
helenjbeal
1
89
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
None
None
None
None
LDA and pLSA are probabilistic matrix factorization methods
(Ensembles of) pLSA
Performance?
None
Quality?
None
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
None
Each cluster represents a stable topic
None
• Greater stability • Determines number of topics automatically •
Embarrassingly parallel computation
Implementation
sklearn API
None
https://github.com/lmcinnes/enstop
pip install enstop