Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Ensemble Topic Modelling
Search
Leland McInnes
July 12, 2019
Research
1
450
Ensemble Topic Modelling
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Leland McInnes
July 12, 2019
Tweet
Share
More Decks by Leland McInnes
See All by Leland McInnes
PyNNDescent: Fast Approximate Nearest Neighbors with Numba
lmcinnes
0
1k
Word and Document Embeddings
lmcinnes
0
140
Topological Data Analysis
lmcinnes
1
320
Learning Topology: topological methods for unsupervised learning
lmcinnes
2
3.5k
A Guide to Dimension Reduction
lmcinnes
3
1.4k
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
lmcinnes
2
2.5k
Other Decks in Research
See All in Research
ドメイン知識がない領域での自然言語処理の始め方
hargon24
1
160
When Learned Data Structures Meet Computer Vision
matsui_528
1
670
Combining Deep Learning and Street View Imagery to Map Smallholder Crop Types
satai
3
200
情報技術の社会実装に向けた応用と課題:ニュースメディアの事例から / appmech-jsce 2025
upura
0
260
言語モデルの地図:確率分布と情報幾何による類似性の可視化
shimosan
8
2.1k
投資戦略202508
pw
0
570
Time to Cash: The Full Stack Breakdown of Modern ATM Attacks
ratatata
0
170
CVPR2025論文紹介:Unboxed
murakawatakuya
0
210
日本語新聞記事を用いた大規模言語モデルの暗記定量化 / LLMC2025
upura
0
340
大規模言語モデルにおけるData-Centric AIと合成データの活用 / Data-Centric AI and Synthetic Data in Large Language Models
tsurubee
1
310
Stealing LUKS Keys via TPM and UUID Spoofing in 10 Minutes - BSides 2025
anykeyshik
0
160
AIグラフィックデザインの進化:断片から統合(One Piece)へ / From Fragment to One Piece: A Survey on AI-Driven Graphic Design
shunk031
0
560
Featured
See All Featured
Navigating Team Friction
lara
190
16k
Visualization
eitanlees
150
16k
Writing Fast Ruby
sferik
630
62k
How to train your dragon (web standard)
notwaldorf
97
6.4k
Bash Introduction
62gerente
615
210k
Building Flexible Design Systems
yeseniaperezcruz
329
39k
Build your cross-platform service in a week with App Engine
jlugia
234
18k
Designing Experiences People Love
moore
142
24k
Build The Right Thing And Hit Your Dates
maggiecrowley
38
2.9k
Building a Modern Day E-commerce SEO Strategy
aleyda
45
8.1k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
3.8k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.2k
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
None
None
None
None
LDA and pLSA are probabilistic matrix factorization methods
(Ensembles of) pLSA
Performance?
None
Quality?
None
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
None
Each cluster represents a stable topic
None
• Greater stability • Determines number of topics automatically •
Embarrassingly parallel computation
Implementation
sklearn API
None
https://github.com/lmcinnes/enstop
pip install enstop