Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Ensemble Topic Modelling
Search
Leland McInnes
July 12, 2019
Research
1
440
Ensemble Topic Modelling
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Leland McInnes
July 12, 2019
Tweet
Share
More Decks by Leland McInnes
See All by Leland McInnes
PyNNDescent: Fast Approximate Nearest Neighbors with Numba
lmcinnes
0
980
Word and Document Embeddings
lmcinnes
0
130
Topological Data Analysis
lmcinnes
1
300
Learning Topology: topological methods for unsupervised learning
lmcinnes
2
3.5k
A Guide to Dimension Reduction
lmcinnes
3
1.3k
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
lmcinnes
2
2.5k
Other Decks in Research
See All in Research
EarthSynth: Generating Informative Earth Observation with Diffusion Models
satai
3
170
投資戦略202508
pw
0
170
数理最適化に基づく制御
mickey_kubo
6
710
2025年度人工知能学会全国大会チュートリアル講演「深層基盤モデルの数理」
taiji_suzuki
24
18k
近似動的計画入門
mickey_kubo
4
1k
Trust No Bot? Forging Confidence in AI for Software Engineering
tomzimmermann
1
260
SSII2025 [TS1] 光学・物理原理に基づく深層画像生成
ssii
PRO
4
4.1k
When Submarine Cables Go Dark: Examining the Web Services Resilience Amid Global Internet Disruptions
irvin
0
280
Towards a More Efficient Reasoning LLM: AIMO2 Solution Summary and Introduction to Fast-Math Models
analokmaus
2
730
SSII2025 [SS2] 横浜DeNAベイスターズの躍進を支えたAIプロダクト
ssii
PRO
7
3.9k
電通総研の生成AI・エージェントの取り組みエンジニアリング業務向けAI活用事例紹介
isidaitc
1
850
Computational OT #1 - Monge and Kantorovitch
gpeyre
0
230
Featured
See All Featured
Statistics for Hackers
jakevdp
799
220k
Learning to Love Humans: Emotional Interface Design
aarron
273
40k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
460
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
60k
Making the Leap to Tech Lead
cromwellryan
134
9.5k
jQuery: Nuts, Bolts and Bling
dougneiner
64
7.8k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
33
2.4k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3k
Intergalactic Javascript Robots from Outer Space
tanoku
272
27k
A designer walks into a library…
pauljervisheath
207
24k
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
None
None
None
None
LDA and pLSA are probabilistic matrix factorization methods
(Ensembles of) pLSA
Performance?
None
Quality?
None
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
None
Each cluster represents a stable topic
None
• Greater stability • Determines number of topics automatically •
Embarrassingly parallel computation
Implementation
sklearn API
None
https://github.com/lmcinnes/enstop
pip install enstop