Ensemble Topic Modelling
Leland McInnes
July 12, 2019
Research
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
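In the matrix view, the corpus becomes a document-term count matrix with one row per document and one column per vocabulary word. A topic model approximates that matrix as a (documents × topics) matrix times a (topics × words) matrix. The sketch below builds the count matrix for a toy corpus. Lowercasing plus whitespace tokenization and the helper name `document_term_matrix` are assumptions for illustration, not part of enstop.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (rows, vocab): per-document word counts over a sorted vocabulary.

    Hypothetical helper; lowercase + whitespace split stands in for a real tokenizer.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for missing words, so every row has len(vocab) entries
        rows.append([counts[word] for word in vocab])
    return rows, vocab


docs = ["cats chase mice", "mice eat cheese", "cats eat"]
rows, vocab = document_term_matrix(docs)
print(vocab)  # ['cats', 'chase', 'cheese', 'eat', 'mice']
print(rows)   # [[1, 1, 0, 0, 1], [0, 0, 1, 1, 1], [1, 0, 0, 1, 0]]
```

For real corpora, a library vectorizer such as scikit-learn's `CountVectorizer` produces a sparse version of this matrix.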
LDA and pLSA are probabilistic matrix factorization methods
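For pLSA the factorization reads P(w|d) = Σ_z P(z|d) P(w|z), that is, the document-topic matrix P(z|d) times the topic-word matrix P(w|z). Below is a minimal NumPy sketch that fits it with expectation-maximization (EM) on a small dense count matrix. The function name `plsa`, the random initialization, and the fixed iteration count are assumptions, and enstop's own implementation may differ. The dense (documents × topics × words) array used here only suits toy data.

```python
import numpy as np


def plsa(X, n_topics, n_iter=100, seed=0, eps=1e-12):
    """Fit pLSA by EM on a dense document-term count matrix X (docs x words).

    Hypothetical sketch. Returns (doc_topic, topic_word, log_likelihoods) where
    doc_topic[d, z] = P(z|d) and topic_word[z, w] = P(w|z).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    # Random row-stochastic initialisation (an assumption; other inits are common)
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)
    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z); shape (docs, topics, words)
        joint = doc_topic[:, :, None] * topic_word[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)
        weighted = X[:, None, :] * resp  # expected counts n(d,w) * P(z|d,w)
        # M-step: renormalise the expected counts into the two factor matrices
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + eps
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + eps
        # log-likelihood: sum over (d, w) of n(d,w) * log sum_z P(z|d) P(w|z)
        log_likelihoods.append(float((X * np.log(doc_topic @ topic_word + eps)).sum()))
    return doc_topic, topic_word, log_likelihoods


counts = np.array([[5, 3, 0, 0], [4, 4, 1, 0], [0, 0, 6, 2], [0, 1, 3, 5]])
doc_topic, topic_word, ll = plsa(counts, n_topics=2, n_iter=50)
print(np.round(topic_word, 2))
```

EM never decreases the log-likelihood, so `ll` should be non-decreasing up to floating-point noise. It can still stop at a local optimum that depends on `seed`.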
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
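One way to make this instability measurable is to compare the topic-word matrices from two runs: for each topic in one run, find the distance to its closest topic in the other run. The sketch uses Hellinger distance between word distributions as an assumed choice of metric (scaled to lie in [0, 1]). A value near 0 means the topic reappeared, and a value near 1 means nothing similar was found. The helper names are hypothetical.

```python
import numpy as np


def hellinger(P, Q):
    """Pairwise Hellinger distances between rows of P (m x V) and Q (n x V)."""
    diff = np.sqrt(P)[:, None, :] - np.sqrt(Q)[None, :, :]
    return np.sqrt(0.5 * (diff ** 2).sum(axis=-1))


def nearest_topic_distances(run_a, run_b):
    """Hypothetical diagnostic: for each topic of run_a, distance to its closest topic in run_b."""
    return hellinger(np.asarray(run_a, float), np.asarray(run_b, float)).min(axis=1)


run_a = np.array([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])
run_b = run_a[::-1]            # same topics, found in a different order
run_c = np.full((2, 4), 0.25)  # a run that only found a blurred topic
print(nearest_topic_distances(run_a, run_b))  # [0. 0.]
print(nearest_topic_distances(run_a, run_c))  # both about 0.54
```

Matching by nearest neighbour rather than by index matters because topic order is arbitrary from run to run.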
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
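The ensemble idea (pool the topics from many runs, cluster them, treat each cluster as a stable topic) can be sketched as follows. This is an assumed simplification, not enstop's algorithm. Topics closer than a Hellinger `threshold` are linked with single linkage via union-find. A cluster counts as stable when it contains topics from at least `min_runs` different runs, and its averaged distribution becomes the stable topic. The clustering method, the threshold, and the majority rule are all illustrative choices.

```python
import numpy as np


def hellinger(P, Q):
    """Pairwise Hellinger distances between rows of P and rows of Q."""
    diff = np.sqrt(P)[:, None, :] - np.sqrt(Q)[None, :, :]
    return np.sqrt(0.5 * (diff ** 2).sum(axis=-1))


def stable_topics(topic_matrices, threshold=0.2, min_runs=None):
    """Hypothetical sketch: pool topics from several runs, link near-duplicates, keep recurring groups."""
    pooled = np.vstack([np.asarray(m, dtype=float) for m in topic_matrices])
    run_ids = np.concatenate([np.full(len(m), i) for i, m in enumerate(topic_matrices)])
    if min_runs is None:
        min_runs = len(topic_matrices) // 2 + 1  # assumed rule: a majority of runs
    parent = list(range(len(pooled)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    dist = hellinger(pooled, pooled)
    for i in range(len(pooled)):
        for j in range(i + 1, len(pooled)):
            if dist[i, j] < threshold:  # single linkage: any close pair merges clusters
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(pooled)):
        clusters.setdefault(find(i), []).append(i)

    stable = []
    for members in clusters.values():
        if len(set(run_ids[members].tolist())) >= min_runs:
            topic = pooled[members].mean(axis=0)
            stable.append(topic / topic.sum())
    return np.array(stable).reshape(len(stable), pooled.shape[1])


run1 = [[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
run2 = [[0, 0, 0.5, 0.5], [0.5, 0.5, 0, 0], [0.7, 0.1, 0.1, 0.1]]
run3 = [[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5], [0.1, 0.1, 0.1, 0.7]]
print(stable_topics([run1, run2, run3]))  # two stable topics; each run's odd third topic is dropped
```

The number of surviving clusters is decided by the data rather than fixed in advance, which is one way an ensemble can pick the number of topics itself.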
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
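Each run in the ensemble is independent, so the fits can be dispatched concurrently with no coordination until the pooling step. The sketch below uses `concurrent.futures` and includes a compact restatement of pLSA EM so that the block stands alone. `plsa_topics` and `run_ensemble` are hypothetical helpers, not enstop functions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

import numpy as np


def plsa_topics(X, n_topics, seed, n_iter=50, eps=1e-12):
    """Hypothetical: one pLSA fit by EM; returns the topic-word matrix P(w|z)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)  # runs differ only by their seed
    doc_topic = rng.random((X.shape[0], n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, X.shape[1]))
    topic_word /= topic_word.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step folded into expected counts n(d,w) * P(z|d,w)
        joint = doc_topic[:, :, None] * topic_word[None, :, :]
        weighted = X[:, None, :] * joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + eps
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + eps
    return topic_word


def run_ensemble(X, n_topics, seeds, max_workers=4):
    """Hypothetical: fit one independent pLSA per seed concurrently; results keep seed order."""
    fit = partial(plsa_topics, X, n_topics)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fit, seeds))


counts = np.array([[5, 3, 0, 0], [4, 4, 1, 0], [0, 0, 6, 2], [0, 1, 3, 5]])
runs = run_ensemble(counts, n_topics=2, seeds=range(8))
print(len(runs), runs[0].shape)  # 8 (2, 4)
```

Threads keep the sketch simple. Because `partial` over a module-level function is picklable, swapping in `ProcessPoolExecutor` (under the usual `if __name__ == "__main__":` guard) is a small change when the work is CPU-bound.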
Implementation
sklearn API
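Following the scikit-learn API means that hyperparameters go to the constructor, `fit` returns the estimator itself, learned parameters end in an underscore (such as `components_`), and `transform` maps new documents into topic space. The hypothetical `TinyPLSA` class below illustrates that convention with pLSA. It is not enstop's class, and its `doc_topic_` attribute and fold-in `transform` are assumptions.

```python
import numpy as np


class TinyPLSA:
    """Hypothetical pLSA estimator following scikit-learn conventions (not enstop's API)."""

    def __init__(self, n_components=10, n_iter=100, random_state=0):
        # Convention: __init__ only stores hyperparameters, no computation
        self.n_components = n_components
        self.n_iter = n_iter
        self.random_state = random_state

    def _random_rows(self, rng, n_rows, n_cols):
        rows = rng.random((n_rows, n_cols))
        return rows / rows.sum(axis=1, keepdims=True)

    def _em(self, X, doc_topic, topic_word, update_topics, eps=1e-12):
        for _ in range(self.n_iter):
            joint = doc_topic[:, :, None] * topic_word[None, :, :]
            weighted = X[:, None, :] * joint / (joint.sum(axis=1, keepdims=True) + eps)
            if update_topics:  # fit updates P(w|z); transform keeps it fixed
                topic_word = weighted.sum(axis=0)
                topic_word /= topic_word.sum(axis=1, keepdims=True) + eps
            doc_topic = weighted.sum(axis=2)
            doc_topic /= doc_topic.sum(axis=1, keepdims=True) + eps
        return doc_topic, topic_word

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        doc_topic = self._random_rows(rng, X.shape[0], self.n_components)
        topic_word = self._random_rows(rng, self.n_components, X.shape[1])
        # Learned attributes carry a trailing underscore
        self.doc_topic_, self.components_ = self._em(X, doc_topic, topic_word, True)
        return self

    def transform(self, X):
        # Fold-in: estimate P(z|d) for new documents with components_ held fixed
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        doc_topic = self._random_rows(rng, X.shape[0], self.n_components)
        doc_topic, _ = self._em(X, doc_topic, self.components_, False)
        return doc_topic

    def fit_transform(self, X, y=None):
        return self.fit(X).doc_topic_


counts = np.array([[5, 3, 0, 0], [4, 4, 1, 0], [0, 0, 6, 2], [0, 1, 3, 5]])
model = TinyPLSA(n_components=2, n_iter=50).fit(counts)
print(model.components_.shape)  # (2, 4)
new_docs = np.array([[3, 2, 0, 1]])
print(model.transform(new_docs).round(2))
```

In production code, subclassing `sklearn.base.BaseEstimator` and `TransformerMixin` would supply `get_params`, `set_params`, and `fit_transform`, which lets such an estimator drop into pipelines and grid search.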
https://github.com/lmcinnes/enstop
pip install enstop