Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Ensemble Topic Modelling
Search
Leland McInnes
July 12, 2019
Research
1
450
Ensemble Topic Modelling
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Leland McInnes
July 12, 2019
Tweet
Share
More Decks by Leland McInnes
See All by Leland McInnes
PyNNDescent: Fast Approximate Nearest Neighbors with Numba
lmcinnes
0
1k
Word and Document Embeddings
lmcinnes
0
150
Topological Data Analysis
lmcinnes
1
330
Learning Topology: topological methods for unsupervised learning
lmcinnes
2
3.6k
A Guide to Dimension Reduction
lmcinnes
3
1.4k
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
lmcinnes
2
2.6k
Other Decks in Research
See All in Research
AI in Enterprises - Java and Open Source to the Rescue
ivargrimstad
0
1k
Remote sensing × Multi-modal meta survey
satai
4
630
一人称視点映像解析の最先端(MIRU2025 チュートリアル)
takumayagi
6
4.4k
GPUを利用したStein Particle Filterによる点群6自由度モンテカルロSLAM
takuminakao
0
620
AIスパコン「さくらONE」の オブザーバビリティ / Observability for AI Supercomputer SAKURAONE
yuukit
2
1k
思いつきが武器になる:研究というゲームを始めよう / Ideas Are Your Equipments : Let the Game of Research Begin!
ks91
PRO
0
100
LLM-Assisted Semantic Guidance for Sparsely Annotated Remote Sensing Object Detection
satai
3
140
世界の人気アプリ100個を分析して見えたペイウォール設計の心得
akihiro_kokubo
PRO
63
35k
MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation
satai
4
490
"主観で終わらせない"定性データ活用 ― プロダクトディスカバリーを加速させるインサイトマネジメント / Utilizing qualitative data that "doesn't end with subjectivity" - Insight management that accelerates product discovery
kaminashi
15
15k
大規模言語モデルにおけるData-Centric AIと合成データの活用 / Data-Centric AI and Synthetic Data in Large Language Models
tsurubee
1
440
[論文紹介] Intuitive Fine-Tuning
ryou0634
0
150
Featured
See All Featured
Site-Speed That Sticks
csswizardry
13
1k
Bash Introduction
62gerente
615
210k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
The Invisible Side of Design
smashingmag
302
51k
The Art of Programming - Codeland 2020
erikaheidi
56
14k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
It's Worth the Effort
3n
187
29k
Raft: Consensus for Rubyists
vanstee
141
7.2k
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
The Cult of Friendly URLs
andyhume
79
6.7k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.6k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
10
730
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
None
None
None
None
LDA and pLSA are probabilistic matrix factorization methods
(Ensembles of) pLSA
Performance?
None
Quality?
None
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
None
Each cluster represents a stable topic
None
• Greater stability • Determines number of topics automatically •
Embarrassingly parallel computation
Implementation
sklearn API
None
https://github.com/lmcinnes/enstop
pip install enstop