Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
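As a rough illustration of the slide title above, the factorization view can be written as follows. The symbols are assumptions introduced here for illustration and do not come from the slides: $X$ is the document-by-word count matrix for $n$ documents and $m$ vocabulary words, and $k$ is the number of topics.

```latex
X \approx W H, \qquad
X \in \mathbb{R}_{\ge 0}^{n \times m}, \quad
W \in \mathbb{R}_{\ge 0}^{n \times k}, \quad
H \in \mathbb{R}_{\ge 0}^{k \times m}
```

Under this reading, row $d$ of $W$ describes document $d$ as a mixture of topics, and row $z$ of $H$ describes topic $z$ as a weighting over words.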
LDA and pLSA are probabilistic matrix factorization methods
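To make the "probabilistic matrix factorization" reading concrete, here is a minimal sketch of pLSA fitted with EM in plain Python. It models P(w | d) as the sum over topics z of P(w | z) P(z | d), so the two returned tables play the role of the two factors. The function names, the toy count matrix, and the defaults are all hypothetical; this is not the enstop implementation, which the slides do not show.

```python
import random


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they sum to 0)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def fit_plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM. counts[d][w] is how often word w occurs in document d.

    Returns (doc_topic, topic_word), where doc_topic[d][z] = P(z | d)
    and topic_word[z][w] = P(w | z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation; every row is a probability distribution.
    doc_topic = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
                 for _ in range(n_docs)]
    topic_word = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
                  for _ in range(n_topics)]
    for _ in range(n_iter):
        new_doc_topic = [[0.0] * n_topics for _ in range(n_docs)]
        new_topic_word = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [doc_topic[d][z] * topic_word[z][w]
                         for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                # M-step accumulation: expected counts assigned to each topic.
                for z in range(n_topics):
                    expected = n_dw * joint[z] / total
                    new_doc_topic[d][z] += expected
                    new_topic_word[z][w] += expected
        doc_topic = [normalize(row) for row in new_doc_topic]
        topic_word = [normalize(row) for row in new_topic_word]
    return doc_topic, topic_word


# Toy corpus (made up): documents 0-1 use words 0-2, documents 2-3 use words 3-5.
counts = [
    [4, 3, 2, 0, 0, 0],
    [3, 4, 1, 0, 0, 0],
    [0, 0, 0, 2, 4, 3],
    [0, 0, 0, 3, 2, 4],
]
doc_topic, topic_word = fit_plsa(counts, n_topics=2)
```

Changing `seed` changes the random starting point, which is one simple way to see how the fitted topics can differ between runs.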
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
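The idea of clustering topics from many runs can be sketched as below. This is only an illustration under stated assumptions: the slides do not say which distance or clustering algorithm is used, so the sketch swaps in Hellinger distance with a simple threshold-based single-linkage grouping, and the names `hellinger`, `stable_topics`, `max_distance`, and `min_runs` are hypothetical. A cluster counts as a stable topic here if it contains topics from at least `min_runs` different runs.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))


def stable_topics(runs, max_distance=0.3, min_runs=2):
    """Group similar topics across runs; return the mean topic of each stable group.

    runs[r] is the list of topics (word distributions) found in run r.
    """
    items = [(r, topic) for r, run in enumerate(runs) for topic in run]
    parent = list(range(len(items)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of topics that lie within max_distance of each other.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if hellinger(items[i][1], items[j][1]) <= max_distance:
                parent[find(i)] = find(j)

    clusters = {}
    for i, item in enumerate(items):
        clusters.setdefault(find(i), []).append(item)

    result = []
    for members in clusters.values():
        # Keep only clusters that are supported by enough distinct runs.
        if len({r for r, _ in members}) >= min_runs:
            n_words = len(members[0][1])
            mean = [sum(topic[w] for _, topic in members) / len(members)
                    for w in range(n_words)]
            result.append(mean)
    return result


# Made-up topics from three runs over a four-word vocabulary.
runs = [
    [[0.70, 0.30, 0.00, 0.00], [0.00, 0.00, 0.50, 0.50]],
    [[0.00, 0.00, 0.60, 0.40], [0.60, 0.40, 0.00, 0.00]],
    [[0.65, 0.35, 0.00, 0.00], [0.25, 0.25, 0.25, 0.25]],
]
stable = stable_topics(runs)
```

In this toy input the uniform topic from the third run has no close match in the other runs, so it is dropped, and the number of stable topics falls out of the clustering rather than being fixed in advance.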
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop