Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
LDA and pLSA are probabilistic matrix factorization methods
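For reference, the factorization view can be written out as below. This is the standard textbook formulation of pLSA, not something recovered from the slide figures; the symbols X, W, H, D, V, K, d, w and z are notation chosen here for illustration.

```latex
% X: D x V document-word matrix, each row normalized to sum to 1
% W: D x K document-topic matrix, W_{dz} = P(z | d)
% H: K x V topic-word matrix,     H_{zw} = P(w | z)
X \approx W H,
\qquad
X \in \mathbb{R}_{\ge 0}^{D \times V},\quad
W \in \mathbb{R}_{\ge 0}^{D \times K},\quad
H \in \mathbb{R}_{\ge 0}^{K \times V}

% Entry-wise, pLSA models each document as a mixture of K topics:
P(w \mid d) = \sum_{z=1}^{K} P(z \mid d)\, P(w \mid z)
```

Every row of W and every row of H is a probability distribution, which is what makes the factorization probabilistic. LDA uses the same structure and additionally places Dirichlet priors on those rows.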
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
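The idea of running pLSA several times and clustering the resulting topics can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the enstop implementation: the function names `plsa` and `stable_topics`, the parameters `threshold` and `min_support`, the toy count matrix, and the greedy Hellinger-distance clustering rule are all hypothetical choices made here, and the slides do not say which clustering algorithm is actually used.

```python
import numpy as np

EPS = 1e-12  # guards against 0/0 when a probability underflows


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA with plain EM (dense arrays, toy scale only).

    Returns P(z|d) with shape (docs, topics) and P(w|z) with shape
    (topics, words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        resp = joint / (joint.sum(axis=1, keepdims=True) + EPS)
        # M-step: re-estimate both factors from expected counts
        weighted = counts[:, None, :] * resp
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + EPS
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + EPS
    return p_z_d, p_w_z


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def stable_topics(counts, n_topics, n_runs=5, threshold=0.3, min_support=3):
    """Pool topics from several random restarts and keep recurring ones.

    Assumption: a simple greedy rule stands in for a real clustering
    algorithm. A topic joins the first cluster whose centroid is within
    `threshold`; clusters seen fewer than `min_support` times are dropped.
    """
    clusters = []  # each cluster is a list of topic vectors
    for seed in range(n_runs):  # runs are independent of one another
        _, topics = plsa(counts, n_topics, seed=seed)
        for topic in topics:
            for members in clusters:
                if hellinger(topic, np.mean(members, axis=0)) < threshold:
                    members.append(topic)
                    break
            else:
                clusters.append([topic])
    kept = [np.mean(m, axis=0) for m in clusters if len(m) >= min_support]
    return np.array(kept)


# Toy corpus (made up for this sketch): two groups of documents that use
# disjoint vocabularies.
counts = np.array([
    [5, 3, 4, 0, 0, 0],
    [4, 6, 2, 0, 0, 0],
    [3, 4, 5, 0, 0, 0],
    [0, 0, 0, 4, 5, 3],
    [0, 0, 0, 6, 2, 4],
    [0, 0, 0, 3, 4, 5],
], dtype=float)

stable = stable_topics(counts, n_topics=2)
print(len(stable), "stable topics")
print(np.round(stable, 3))
```

The number of stable topics falls out of the clustering step instead of being fixed in advance, and because each restart depends only on its seed, the loop over seeds could be distributed across processes or machines.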
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop