Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
LDA and pLSA are probabilistic matrix factorization methods
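As a rough illustration of the matrix factorization view, the sketch below fits pLSA with plain EM in NumPy. It approximates the row-normalized document-term matrix as the product of a document-topic matrix P(z|d) and a topic-word matrix P(w|z). The function name `plsa`, its parameters, and the toy count matrix are assumptions made for this example; it is not the enstop implementation, and it uses dense arrays, so it is only suitable for tiny inputs.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a dense (n_docs, n_words) count matrix.

    Returns (doc_topic, topic_word), where each row of doc_topic is P(z|d)
    and each row of topic_word is P(w|z).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random row-stochastic starting points for both factors.
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        joint = doc_topic[:, :, None] * topic_word[None, :, :]  # (d, z, w)
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12

        # M-step: re-estimate both factors from count-weighted responsibilities.
        weighted = counts[:, None, :] * joint  # (d, z, w)
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + 1e-12
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + 1e-12

    return doc_topic, topic_word


# Toy corpus (hypothetical): 4 documents over a 4-word vocabulary.
toy_counts = np.array(
    [[4, 3, 0, 0], [5, 2, 0, 1], [0, 0, 3, 4], [0, 1, 2, 5]], dtype=float
)
doc_topic, topic_word = plsa(toy_counts, n_topics=2)
# The product of the two factors approximates P(w|d) for each document.
reconstruction = doc_topic @ topic_word
```

Changing `seed` changes the random starting point, which is one way to observe that different runs can converge to different topics.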
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
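One way to picture the idea: pool the topic vectors from many independent runs, cluster them, and keep only the clusters that recur across runs. The slides do not say which clustering method is used, so the sketch below substitutes a deliberately simple stand-in: topics closer than a Hellinger-distance threshold are linked, and connected groups with enough members are averaged into one stable topic. The names `hellinger` and `stable_topics`, the `threshold` and `min_size` parameters, and their default values are all assumptions for illustration.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def stable_topics(topics, threshold=0.3, min_size=3):
    """Group pooled topic vectors and return the mean of each large group.

    topics: (n_total_topics, n_words) array; each row is a word distribution
    taken from one run of a topic model. Groups smaller than min_size are
    treated as unstable and dropped.
    """
    n = len(topics)
    labels = -np.ones(n, dtype=int)
    n_groups = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Flood fill: link every topic within `threshold` of a group member.
        labels[start] = n_groups
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and hellinger(topics[i], topics[j]) < threshold:
                    labels[j] = n_groups
                    stack.append(j)
        n_groups += 1

    kept = []
    for group in range(n_groups):
        members = topics[labels == group]
        if len(members) >= min_size:
            kept.append(members.mean(axis=0))
    return np.array(kept)
```

A density-based clusterer would avoid the chaining behaviour of this threshold rule; the connected-components version is used here only because it fits in a few lines. Note that the number of stable topics falls out of the clustering rather than being fixed in advance.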
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop
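A minimal usage sketch, assuming the package follows the scikit-learn estimator conventions mentioned above. The class name `EnsembleTopics`, the `n_components` parameter, and the `components_` and `embedding_` attributes are given as best recalled from the project README and should be checked against the repository. The wrapper function `fit_ensemble_topics` is a hypothetical helper introduced for this example.

```python
def fit_ensemble_topics(documents, n_components=20):
    """Vectorize raw text and fit an ensemble topic model with enstop.

    Imports are local so the function can be defined without the packages
    installed. Class and attribute names are assumptions based on the
    enstop README; verify them against the repository before relying on them.
    """
    from sklearn.feature_extraction.text import CountVectorizer
    from enstop import EnsembleTopics

    # Bag-of-words counts: rows are documents, columns are vocabulary terms.
    counts = CountVectorizer().fit_transform(documents)

    model = EnsembleTopics(n_components=n_components).fit(counts)
    topics = model.components_      # assumed: topic-word distributions
    doc_vectors = model.embedding_  # assumed: document-topic representation
    return topics, doc_vectors
```

Calling `fit_ensemble_topics(list_of_strings)` on a reasonably large corpus would return the topic-word and document-topic matrices, provided the assumed names match the installed version.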