Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Ensemble Topic Modelling
Search
Leland McInnes
July 12, 2019
Research
1
430
Ensemble Topic Modelling
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Leland McInnes
July 12, 2019
Tweet
Share
More Decks by Leland McInnes
See All by Leland McInnes
PyNNDescent: Fast Approximate Nearest Neighbors with Numba
lmcinnes
0
960
Word and Document Embeddings
lmcinnes
0
120
Topological Data Analysis
lmcinnes
1
290
Learning Topology: topological methods for unsupervised learning
lmcinnes
2
3.5k
A Guide to Dimension Reduction
lmcinnes
3
1.3k
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
lmcinnes
2
2.4k
Other Decks in Research
See All in Research
Vision And Languageモデルにおける異なるドメインでの継続事前学習が性能に与える影響の検証 / YANS2024
sansan_randd
1
100
生成的推薦の人気バイアスの分析:暗記の観点から / JSAI2025
upura
0
150
LLM-as-a-Judge: 文章をLLMで評価する@教育機関DXシンポ
k141303
3
800
大規模日本語VLM Asagi-VLMにおける合成データセットの構築とモデル実装
kuehara
5
2.3k
SSII2025 [TS2] リモートセンシング画像処理の最前線
ssii
PRO
7
2.7k
言語モデルによるAI創薬の進展 / Advancements in AI-Driven Drug Discovery Using Language Models
tsurubee
2
360
20250605_新交通システム推進議連_熊本都市圏「車1割削減、渋滞半減、公共交通2倍」から考える地方都市交通政策
trafficbrain
0
240
RapidPen: AIエージェントによるペネトレーションテスト 初期侵入全自動化の研究
laysakura
0
1.4k
定性データ、どう活かす? 〜定性データのための分析基盤、はじめました〜 / How to utilize qualitative data? ~We have launched an analysis platform for qualitative data~
kaminashi
6
1k
(NULLCON Goa 2025)Windows Keylogger Detection: Targeting Past and Present Keylogging Techniques
asuna_jp
1
510
業界横断 副業・兼業者の実態調査
fkske
0
120
ノンパラメトリック分布表現を用いた位置尤度場周辺化によるRTK-GNSSの整数アンビギュイティ推定
aoki_nosse
0
310
Featured
See All Featured
GraphQLの誤解/rethinking-graphql
sonatard
71
11k
Raft: Consensus for Rubyists
vanstee
139
7k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
29
9.5k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
45
7.3k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
281
13k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
331
22k
VelocityConf: Rendering Performance Case Studies
addyosmani
329
24k
Art, The Web, and Tiny UX
lynnandtonic
299
21k
StorybookのUI Testing Handbookを読んだ
zakiyama
30
5.8k
Stop Working from a Prison Cell
hatefulcrawdad
269
20k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
None
None
None
None
LDA and pLSA are probabilistic matrix factorization methods
(Ensembles of) pLSA
Performance?
None
Quality?
None
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
None
Each cluster represents a stable topic
None
• Greater stability • Determines number of topics automatically •
Embarrassingly parallel computation
Implementation
sklearn API
None
https://github.com/lmcinnes/enstop
pip install enstop