Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
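As a rough sketch of what this slide title means (the symbols below are illustrative assumptions, not taken from the slides): a document-by-word count matrix is approximated by the product of a document-by-topic matrix and a topic-by-word matrix, with far fewer topics than documents or words.

$$
\underbrace{X}_{n_{\text{docs}} \times n_{\text{words}}}
\;\approx\;
\underbrace{W}_{n_{\text{docs}} \times k}
\;
\underbrace{H}_{k \times n_{\text{words}}},
\qquad k \ll \min(n_{\text{docs}},\, n_{\text{words}})
$$

Each row of $W$ describes one document as a mixture of $k$ topics, and each row of $H$ describes one topic as a weighting over words.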
LDA and pLSA are probabilistic matrix factorization methods
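To make the "probabilistic matrix factorization" reading concrete, here is a minimal pure-Python sketch of pLSA fitted by EM, under the standard formulation P(w|d) = Σ_z P(w|z) P(z|d). It is not the enstop implementation; the function names (`plsa`, `normalize`, `reconstruct`) and the toy count matrix are assumptions made for illustration.

```python
import random


def normalize(row):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a dense document-by-word count matrix (list of lists).

    Returns (doc_topic, topic_word): row d of doc_topic is P(z|d),
    row z of topic_word is P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    doc_topic = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
                 for _ in range(n_docs)]
    topic_word = [normalize([rng.random() + 0.1 for _ in range(n_words)])
                  for _ in range(n_topics)]

    for _ in range(n_iter):
        # Accumulators start at a tiny positive value to avoid division by zero.
        new_dt = [[1e-12] * n_topics for _ in range(n_docs)]
        new_tw = [[1e-12] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalize([doc_topic[d][z] * topic_word[z][w]
                                  for z in range(n_topics)])
                # M-step: accumulate expected counts for each topic.
                for z in range(n_topics):
                    weight = counts[d][w] * post[z]
                    new_dt[d][z] += weight
                    new_tw[z][w] += weight
        doc_topic = [normalize(r) for r in new_dt]
        topic_word = [normalize(r) for r in new_tw]
    return doc_topic, topic_word


def reconstruct(doc_topic, topic_word):
    """Matrix product doc_topic @ topic_word, i.e. the modelled P(w|d)."""
    n_topics, n_words = len(topic_word), len(topic_word[0])
    return [[sum(row[z] * topic_word[z][w] for z in range(n_topics))
             for w in range(n_words)] for row in doc_topic]


# Toy corpus (hypothetical): 4 documents over a 4-word vocabulary.
counts = [
    [4, 3, 0, 0],
    [3, 5, 0, 0],
    [0, 0, 4, 2],
    [0, 0, 3, 6],
]
doc_topic, topic_word = plsa(counts, n_topics=2)
approx = reconstruct(doc_topic, topic_word)
```

The two returned matrices are the factors: every row is a probability distribution, and their product gives the modelled word distribution for each document. Changing `seed` changes the starting point, so different runs can settle on different topics.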
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
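The idea of pooling topics from many runs and clustering them can be sketched as follows. The slide text does not name the clustering method, so this example substitutes a simple assumption: topics closer than a Hellinger-distance threshold are linked, connected groups form clusters, and clusters with too few members are discarded. The function names, the threshold, and the pooled topic vectors are all hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))


def cluster_topics(topics, threshold=0.2):
    """Group topics into connected components of the 'distance < threshold' graph."""
    n = len(topics)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if hellinger(topics[i], topics[j]) < threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def stable_topics(topics, min_size=2, threshold=0.2):
    """Average each cluster with at least min_size members into one stable topic."""
    n_words = len(topics[0])
    result = []
    for members in cluster_topics(topics, threshold):
        if len(members) < min_size:
            continue  # seen too rarely across runs to count as stable
        mean = [sum(topics[i][w] for i in members) / len(members)
                for w in range(n_words)]
        result.append(mean)
    return result


# Hypothetical topic-word distributions pooled from three separate runs.
pooled = [
    [0.50, 0.40, 0.05, 0.05],  # run 1
    [0.05, 0.05, 0.50, 0.40],  # run 1
    [0.45, 0.45, 0.05, 0.05],  # run 2
    [0.05, 0.05, 0.40, 0.50],  # run 2
    [0.48, 0.42, 0.06, 0.04],  # run 3
    [0.25, 0.25, 0.25, 0.25],  # run 3: resembles no topic from the other runs
]
stable = stable_topics(pooled)
```

In this sketch the number of stable topics is simply the number of surviving clusters, so it comes out of the data instead of being fixed in advance. Each run is independent of the others, which is why the ensemble can be computed in parallel.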
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop