Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
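Treating topic modelling as matrix factorization starts from a document-term matrix X, with one row per document and one column per vocabulary word. A topic model approximates X by the product of a document-topic matrix and a topic-word matrix. The following is a minimal NumPy sketch of building X. The toy corpus and plain whitespace tokenization are illustrative assumptions; in practice a tool such as scikit-learn's CountVectorizer would do this step.

```python
import numpy as np

# Toy corpus (hypothetical example data, not from the talk).
docs = [
    "cats purr and cats sleep",
    "dogs bark and dogs run",
    "cats and dogs sleep",
]

# Vocabulary: every distinct whitespace-separated token, sorted alphabetically.
vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: j for j, word in enumerate(vocab)}

# X[i, j] = number of times vocab[j] occurs in docs[i].
X = np.zeros((len(docs), len(vocab)), dtype=np.int64)
for i, doc in enumerate(docs):
    for word in doc.split():
        X[i, index[word]] += 1

print(vocab)  # → ['and', 'bark', 'cats', 'dogs', 'purr', 'run', 'sleep']
print(X)
```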
LDA and pLSA are probabilistic matrix factorization methods
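pLSA models each document's word distribution as a mixture of topics, P(w | d) = Σ_z P(w | z) P(z | d). This is a factorization of the row-normalized document-term matrix into a document-topic matrix W = P(z | d) and a topic-word matrix H = P(w | z), with every row of each factor a probability distribution. The following is a minimal NumPy sketch of fitting it with the standard EM updates. The function name `plsa`, the iteration count and the toy matrix are illustrative assumptions. This is not enstop's implementation.

```python
import numpy as np

def plsa(X, n_topics, n_iter=200, seed=0, eps=1e-12):
    """Fit pLSA by EM. Returns (W, H): W = P(z | d), H = P(w | z), rows summing to 1."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_docs, n_words = X.shape
    # Random positive initialisation, normalised into probability distributions.
    W = rng.random((n_docs, n_topics))
    W /= W.sum(axis=1, keepdims=True)
    H = rng.random((n_topics, n_words))
    H /= H.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step folded into a ratio: R[d, w] = n(d, w) / P(w | d).
        R = X / (W @ H + eps)
        # M-step: expected counts, both computed from the *current* W and H.
        H_new = H * (W.T @ R)   # sum_d n(d, w) P(z | d, w)
        W_new = W * (R @ H.T)   # sum_w n(d, w) P(z | d, w)
        H = H_new / (H_new.sum(axis=1, keepdims=True) + eps)
        W = W_new / (W_new.sum(axis=1, keepdims=True) + eps)
    return W, H

def log_likelihood(X, W, H, eps=1e-12):
    """sum_{d,w} n(d, w) log P(w | d), the quantity EM increases."""
    return float(np.sum(np.asarray(X, dtype=float) * np.log(W @ H + eps)))

# Usage on a small hypothetical count matrix (4 documents, 4 words).
X = np.array([[4, 3, 0, 0],
              [5, 2, 0, 1],
              [0, 0, 6, 4],
              [1, 0, 3, 5]])
W, H = plsa(X, n_topics=2)
print(np.round(H, 2))
```

Because EM never decreases the likelihood, more iterations from the same seed should give a log-likelihood at least as high as fewer iterations.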
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems (EM only converges to a local optimum)
So the topics found vary from one run to another
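To make run-to-run variation concrete, you can compare the topic-word matrices from two runs with a distance between probability distributions. This sketch uses the Hellinger distance, which is an assumption: the slides do not name a distance. The two small matrices are hypothetical stand-ins for the topics of two pLSA runs with different random seeds. In the example, two topics reappear and one does not.

```python
import numpy as np

def hellinger(P, Q):
    """Pairwise Hellinger distances between rows of P and rows of Q (rows are distributions)."""
    # For normalised p, q: H(p, q)^2 = 1 - sum(sqrt(p * q)) (Bhattacharyya coefficient).
    bc = np.sqrt(P) @ np.sqrt(Q).T
    return np.sqrt(np.clip(1.0 - bc, 0.0, None))

def best_match(P, Q):
    """For each topic in run P, the distance to its closest topic in run Q."""
    return hellinger(P, Q).min(axis=1)

# Hypothetical topics (3 topics over a 4-word vocabulary) from two runs.
run_a = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 0.5, 0.5],
                  [0.25, 0.25, 0.25, 0.25]])
run_b = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5, 0.0]])

d = best_match(run_a, run_b)
print(np.round(d, 3))  # first two distances are ~0; the third is ~0.54 (no counterpart)
```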
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
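This sketch shows how topics from many runs can be pooled and clustered so that each surviving cluster becomes one stable topic. The clustering here is a deliberately simple stand-in. It links any two topics within a Hellinger distance threshold (single linkage via union-find) and keeps clusters that recur in a majority of runs. It is not the clustering method enstop uses. The `threshold` and `min_runs` values and the three hypothetical runs are assumptions. The number of stable topics falls out of the clustering rather than being fixed in advance.

```python
import numpy as np

def hellinger_matrix(T):
    """Pairwise Hellinger distances between the rows of T."""
    s = np.sqrt(T)
    return np.sqrt(np.clip(1.0 - s @ s.T, 0.0, None))

def stable_topics(runs, threshold=0.2, min_runs=None):
    """runs: list of (k_r, n_words) topic-word matrices, one per independent run."""
    topics = np.vstack(runs)
    run_id = np.concatenate([np.full(len(r), i) for i, r in enumerate(runs)])
    if min_runs is None:
        min_runs = len(runs) // 2 + 1  # hypothetical default: a majority of runs
    parent = list(range(len(topics)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    D = hellinger_matrix(topics)
    for i in range(len(topics)):
        for j in range(i + 1, len(topics)):
            if D[i, j] < threshold:  # link topics that are close enough
                parent[find(i)] = find(j)
    roots = np.array([find(i) for i in range(len(topics))])
    result = []
    for r in np.unique(roots):
        members = roots == r
        # Keep only clusters whose topics come from enough distinct runs.
        if len(np.unique(run_id[members])) >= min_runs:
            centre = topics[members].mean(axis=0)
            result.append(centre / centre.sum())
    return np.array(result)

# Three hypothetical runs: two recurring topics plus one-off "noise" topics.
run1 = np.array([[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5], [0.7, 0, 0, 0.3]])
run2 = np.array([[0.45, 0.55, 0, 0], [0, 0, 0.55, 0.45]])
run3 = np.array([[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5], [0, 0.6, 0.4, 0]])
topics = stable_topics([run1, run2, run3])
print(topics.shape)  # two stable topics survive; the one-off topics are dropped
```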
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
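The ensemble is embarrassingly parallel because each pLSA run depends only on the data and its own random seed, so runs never need to communicate. This sketch fans runs out over a thread pool from Python's standard library. Threads are chosen here for simplicity, since NumPy releases the GIL in many array operations; a process pool or joblib would be a common alternative for CPU-bound work. How enstop itself parallelizes is not shown on the slides. The compact `fit_plsa` helper and the toy matrix are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def fit_plsa(X, n_topics, seed, n_iter=100, eps=1e-12):
    """One independent pLSA run (EM); returns the topic-word matrix P(w | z)."""
    rng = np.random.default_rng(seed)  # each run owns its random state
    W = rng.random((X.shape[0], n_topics))
    W /= W.sum(axis=1, keepdims=True)
    H = rng.random((n_topics, X.shape[1]))
    H /= H.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        R = X / (W @ H + eps)
        # Right-hand side is evaluated with the old W and H before reassignment.
        W, H = W * (R @ H.T), H * (W.T @ R)
        W /= W.sum(axis=1, keepdims=True) + eps
        H /= H.sum(axis=1, keepdims=True) + eps
    return H

def ensemble_runs(X, n_topics, n_runs, max_workers=4):
    """Run n_runs independent fits concurrently, one seed per run."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: fit_plsa(X, n_topics, s), range(n_runs)))

X = np.array([[4, 3, 0, 0],
              [5, 2, 0, 1],
              [0, 0, 6, 4],
              [1, 0, 3, 5]], dtype=float)
runs = ensemble_runs(X, n_topics=2, n_runs=8)
```

Because each run is seeded independently, the parallel results match a plain serial loop over the same seeds. The pooled `runs` are then what a clustering step would consume.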
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop
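An "sklearn API" means the model follows scikit-learn's estimator conventions. Hyperparameters are set in `__init__`, `fit(X)` learns state and returns `self`, learned attributes end in an underscore, and `transform` maps new data. The following is a hypothetical minimal pLSA estimator illustrating those conventions without depending on scikit-learn. The class `TinyPLSA` and its attribute `doc_topic_` are invented for illustration and are not enstop's classes or attribute names. `components_` follows the name scikit-learn's own decomposition models use for topic-word weights.

```python
import numpy as np

class TinyPLSA:
    """Hypothetical pLSA estimator following scikit-learn conventions (not enstop's code)."""

    def __init__(self, n_components=10, n_iter=100, random_state=0):
        # Hyperparameters only; nothing is learned in __init__.
        self.n_components = n_components
        self.n_iter = n_iter
        self.random_state = random_state

    def _em(self, X, W, H, update_H):
        eps = 1e-12
        for _ in range(self.n_iter):
            R = X / (W @ H + eps)
            W_new = W * (R @ H.T)
            if update_H:  # during fit; transform keeps topics fixed
                H = H * (W.T @ R)
                H /= H.sum(axis=1, keepdims=True) + eps
            W = W_new / (W_new.sum(axis=1, keepdims=True) + eps)
        return W, H

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        W = rng.random((X.shape[0], self.n_components))
        H = rng.random((self.n_components, X.shape[1]))
        W /= W.sum(axis=1, keepdims=True)
        H /= H.sum(axis=1, keepdims=True)
        # Learned state uses trailing underscores, per scikit-learn convention.
        self.doc_topic_, self.components_ = self._em(X, W, H, update_H=True)
        return self  # allows chaining: TinyPLSA().fit(X).components_

    def transform(self, X):
        # Fold in new documents: keep learned topics fixed, fit only P(z | d).
        X = np.asarray(X, dtype=float)
        W = np.full((X.shape[0], self.n_components), 1.0 / self.n_components)
        W, _ = self._em(X, W, self.components_, update_H=False)
        return W

    def fit_transform(self, X, y=None):
        return self.fit(X).doc_topic_

# Usage on a small hypothetical count matrix.
X = np.array([[4, 3, 0, 0], [5, 2, 0, 1], [0, 0, 6, 4], [1, 0, 3, 5]])
model = TinyPLSA(n_components=2).fit(X)
doc_topics = model.transform([[2, 2, 0, 0]])
```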