Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
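As a hedged illustration of this slide title: a topic model can be read as approximating the document-by-word matrix by a product of two much smaller matrices. The symbols below (X, W, H, D, V, K, z) are notation chosen for this note and do not appear in the slides.

```latex
X \approx W H, \qquad
X \in \mathbb{R}_{\ge 0}^{D \times V},\quad
W \in \mathbb{R}_{\ge 0}^{D \times K},\quad
H \in \mathbb{R}_{\ge 0}^{K \times V},\quad
K \ll \min(D, V)

% Probabilistic reading (pLSA): each entry is a mixture over topics z
P(w \mid d) = \sum_{z=1}^{K} P(z \mid d)\, P(w \mid z)
```

Here D is the number of documents, V the vocabulary size and K the number of topics. Row d of W holds the topic weights of document d, and row k of H holds the word weights of topic k.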
LDA and pLSA are probabilistic matrix factorization methods
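To make the pLSA side of this concrete, here is a minimal sketch of fitting pLSA with the standard EM updates in pure Python. It is not the enstop implementation; the function name `plsa`, its parameters and the toy count matrix are all assumptions made for illustration.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a dense document-by-word count matrix.

    Hypothetical helper for illustration only, not the enstop code.
    Returns (P(z|d) as docs x topics, P(w|z) as topics x words).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total == 0:  # empty row: fall back to a uniform distribution
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random strictly positive starting points; every row sums to 1.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(n_topics):
                    resp = counts[d][w] * joint[z] / total
                    # Accumulate expected counts for the M-step.
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalize the expected counts into distributions.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


# Toy corpus (made up): two documents use words 0-1, two use words 2-3.
toy_counts = [
    [4, 2, 0, 0],
    [3, 3, 0, 0],
    [0, 0, 5, 1],
    [0, 0, 2, 4],
]
doc_topic, topic_word = plsa(toy_counts, n_topics=2)
```

Because the starting point is random, a different `seed` can produce different topics on real data, which is the run-to-run variation the ensemble approach targets.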
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
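The idea of pooling topics from many runs and keeping the clusters can be sketched as follows. The slides do not spell out the clustering step, so this sketch substitutes a simple greedy grouping by Hellinger distance. The names `hellinger` and `stable_topics`, the `threshold` and `min_support` parameters, and the toy runs are all hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


def stable_topics(runs, threshold=0.3, min_support=0.5):
    """Group topics pooled from several runs and keep well-supported groups.

    runs: list of runs, each a list of topic-word distributions.
    A topic joins the first cluster whose first member lies within
    `threshold`; clusters seen in fewer than `min_support` of the runs
    are discarded. This greedy rule is an assumption, not enstop's method.
    """
    clusters = []  # each cluster: {"members": [...], "runs": {run indices}}
    for run_idx, topics in enumerate(runs):
        for topic in topics:
            for cluster in clusters:
                if hellinger(topic, cluster["members"][0]) <= threshold:
                    cluster["members"].append(topic)
                    cluster["runs"].add(run_idx)
                    break
            else:
                clusters.append({"members": [topic], "runs": {run_idx}})

    stable = []
    for cluster in clusters:
        if len(cluster["runs"]) >= min_support * len(runs):
            members = cluster["members"]
            # Average the members to get one representative topic.
            stable.append([sum(col) / len(members) for col in zip(*members)])
    return stable


# Three made-up runs over a four-word vocabulary. Two topics recur
# (in different order); the uniform topic shows up only once.
toy_runs = [
    [[0.7, 0.3, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
    [[0.0, 0.0, 0.6, 0.4], [0.6, 0.4, 0.0, 0.0]],
    [[0.65, 0.35, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]],
]
stable = stable_topics(toy_runs)
```

In this sketch the number of stable topics falls out of the grouping rather than being fixed in advance, and each run can be fitted independently of the others.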
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop