Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
LDA and pLSA are probabilistic matrix factorization methods
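As a rough illustration of the factorization view, pLSA approximates the row-normalized document-by-word count matrix by a product of two row-stochastic matrices: P(topic | document) and P(word | topic). The sketch below is a minimal dense EM implementation in NumPy, written for this transcript rather than taken from the talk; the function name `plsa`, the toy count matrix, and the iteration count are all illustrative assumptions.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a dense document-by-word count matrix.

    Returns (doc_topic, topic_word), where doc_topic[d, z] = P(z | d)
    and topic_word[z, w] = P(w | z). Assumes every document is non-empty.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random row-stochastic initialisation of both factors.
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # ratio[d, w] = counts[d, w] / P(w | d) under the current factors.
        ratio = counts / (doc_topic @ topic_word + 1e-12)
        # E and M steps folded together: both updates use the old factors.
        new_doc_topic = doc_topic * (ratio @ topic_word.T)
        new_topic_word = topic_word * (doc_topic.T @ ratio)
        doc_topic = new_doc_topic / new_doc_topic.sum(axis=1, keepdims=True)
        topic_word = new_topic_word / new_topic_word.sum(axis=1, keepdims=True)
    return doc_topic, topic_word

# Toy corpus (made up): two documents use words 0-1, two use words 2-3.
counts = np.array([[4, 3, 0, 0],
                   [5, 2, 0, 0],
                   [0, 0, 3, 4],
                   [0, 0, 2, 6]], dtype=float)
doc_topic, topic_word = plsa(counts, n_topics=2)
print(np.round(doc_topic @ topic_word, 2))  # approximates row-normalized counts
```

Changing `seed` changes the random initialisation, which is one simple way to see run-to-run variation on larger corpora.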
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
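The idea of pooling topics from many runs and clustering them can be sketched as follows. This is a toy under stated assumptions, not the method used in the talk or in enstop: `simulated_run` is a hypothetical stand-in for one randomly initialised fit (it perturbs three made-up "true" topics and adds one junk topic), and the greedy threshold clustering on Hellinger distance is a simple substitute chosen for brevity.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def simulated_run(true_topics, rng, noise=0.02):
    """Hypothetical stand-in for one topic-model fit: noisy copies of the
    true topics plus one random junk topic, returned in shuffled order."""
    topics = true_topics + noise * rng.random(true_topics.shape)
    junk = rng.dirichlet(np.ones(true_topics.shape[1]))[None, :]
    topics = np.vstack([topics, junk])
    topics /= topics.sum(axis=1, keepdims=True)
    return topics[rng.permutation(len(topics))]

def stable_topics(runs, threshold=0.15, min_fraction=0.8):
    """Greedily cluster topics from all runs; keep clusters that recur
    in at least min_fraction of the runs and return their centroids."""
    clusters = []  # each cluster is a list of topic vectors
    for topics in runs:
        for topic in topics:
            for members in clusters:
                if hellinger(topic, np.mean(members, axis=0)) < threshold:
                    members.append(topic)
                    break
            else:
                clusters.append([topic])  # no close cluster: start a new one
    min_size = min_fraction * len(runs)
    return np.array([np.mean(m, axis=0) for m in clusters if len(m) >= min_size])

rng = np.random.default_rng(0)
true_topics = np.array([[0.45, 0.45, 0.025, 0.025, 0.025, 0.025],
                        [0.025, 0.025, 0.45, 0.45, 0.025, 0.025],
                        [0.025, 0.025, 0.025, 0.025, 0.45, 0.45]])
runs = [simulated_run(true_topics, rng) for _ in range(10)]
stable = stable_topics(runs)
print(stable.shape)  # one row per recurring cluster
```

Note that the number of stable topics is not passed in: it falls out of how many clusters recur across runs, while one-off junk topics are discarded.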
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
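The parallelism comes from the fact that each run in the ensemble is independent of the others, so runs can simply be mapped over a pool of workers. A minimal sketch using the standard library is below; `fit_once` is a hypothetical placeholder that returns random topics instead of fitting a model, and threads are used only to keep the example portable (a process pool or a library such as joblib would be the usual choice for real CPU-bound fits).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def fit_once(seed, n_topics=3, n_words=6):
    """Hypothetical stand-in for one independent topic-model fit."""
    rng = np.random.default_rng(seed)
    # Each row is a random distribution over words, playing the role of a topic.
    return rng.dirichlet(np.ones(n_words), size=n_topics)

seeds = range(8)
with ThreadPoolExecutor(max_workers=4) as pool:
    # No run depends on any other, so a plain map is all the coordination needed.
    runs = list(pool.map(fit_once, seeds))

all_topics = np.vstack(runs)  # pooled topics, ready for clustering
print(all_topics.shape)
```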
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop
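A usage sketch in the sklearn style might look like the following. The class name `EnsembleTopics`, the `n_components` parameter, and the `components_` and `embedding_` attributes are written from memory of the enstop README and should be treated as assumptions to check against the repository above; the helper `fit_enstop_topics` is hypothetical. Imports are deferred into the function so that defining it does not require either package to be installed.

```python
def fit_enstop_topics(documents, n_topics=20):
    """Fit an ensemble topic model on a list of raw text documents.

    Assumes `pip install enstop scikit-learn` has been run. The enstop
    names used here are assumptions; verify them against the README.
    """
    from sklearn.feature_extraction.text import CountVectorizer
    from enstop import EnsembleTopics  # assumed class name

    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(documents)  # sparse document-by-word counts
    model = EnsembleTopics(n_components=n_topics).fit(counts)
    # Assumed attributes: topic-by-word matrix and document-by-topic matrix.
    return vectorizer, model.components_, model.embedding_
```

Called as `vectorizer, topics, doc_vectors = fit_enstop_topics(list_of_strings)`, this would mirror how LDA or NMF are typically used in scikit-learn.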