Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
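The factorization view can be written out as follows. This is a sketch in standard pLSA notation; the symbols $d$ (document), $w$ (word), $z$ (topic), and the matrix names $X$, $W$, $H$ are assumptions chosen for illustration and do not appear on the slides.

```latex
P(w \mid d) = \sum_{z} P(z \mid d)\, P(w \mid z)
\quad\Longleftrightarrow\quad
X \approx W H,
\qquad
X_{dw} = P(w \mid d),\;
W_{dz} = P(z \mid d),\;
H_{zw} = P(w \mid z)
```

Each row of $W$ is a document's distribution over topics, and each row of $H$ is a topic's distribution over words, so every row of all three matrices sums to 1.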
LDA and pLSA are probabilistic matrix factorization methods
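As a rough illustration of pLSA as a probabilistic factorization, here is a minimal NumPy sketch of the textbook EM updates on a tiny dense count matrix. It is not the enstop implementation; the function name `plsa`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a dense (n_docs, n_words) count matrix (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # doc_topic[d, z] = P(z | d); topic_word[z, w] = P(w | z)
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z)
        joint = doc_topic[:, :, None] * topic_word[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: weight responsibilities by the observed counts, then renormalize
        weighted = counts[:, None, :] * resp
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + 1e-12
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + 1e-12
    return doc_topic, topic_word

# Toy corpus (made up): two documents use words 0-1, two use words 2-3.
counts = np.array([
    [5, 4, 0, 0],
    [4, 6, 0, 0],
    [0, 0, 5, 3],
    [0, 0, 4, 6],
], dtype=float)
doc_topic, topic_word = plsa(counts, n_topics=2)
```

The product `doc_topic @ topic_word` approximates the row-normalized count matrix. The dense three-way array in the E-step only suits toy inputs; a practical implementation would work on sparse data.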
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
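One way to picture this step: pool the topic vectors from many runs, group the ones that are close together, and keep only groups that recur. The transcript does not say which clustering method is used, so the sketch below substitutes a simple greedy threshold grouping under Hellinger distance as a stand-in. The names `hellinger` and `stable_topics`, the `threshold` and `min_size` parameters, and the simulated topics are all hypothetical.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def stable_topics(topics, threshold=0.2, min_size=2):
    """Greedily group topic vectors; return the mean of each group with >= min_size members."""
    clusters = []  # each cluster is a list of topic vectors
    for topic in topics:
        for members in clusters:
            if hellinger(topic, np.mean(members, axis=0)) < threshold:
                members.append(topic)
                break
        else:
            # No existing cluster is close enough, so start a new one
            clusters.append([topic])
    return np.array([np.mean(m, axis=0) for m in clusters if len(m) >= min_size])

# Simulate five runs that each recover two underlying topics with small noise.
rng = np.random.default_rng(0)
true_topics = np.array([[0.70, 0.20, 0.05, 0.05],
                        [0.05, 0.05, 0.20, 0.70]])
runs = []
for _ in range(5):
    noisy = np.clip(true_topics + rng.normal(0, 0.01, true_topics.shape), 1e-6, None)
    runs.append(noisy / noisy.sum(axis=1, keepdims=True))
# Add one spurious topic that shows up in a single run only.
all_topics = np.vstack(runs + [np.full((1, 4), 0.25)])

stable = stable_topics(all_topics)
```

Here the number of stable topics is not passed in; it falls out of how many groups recur across runs, and the one-off spurious topic is discarded. Because each run is independent, the runs themselves can be computed in parallel before pooling.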
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop