Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
LDA and pLSA are probabilistic matrix factorization methods
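For reference, the standard pLSA formulation makes the matrix-factorization view explicit. The slides do not spell this out, and the symbols below (X, W, H, k) are notation chosen for this note rather than the speaker's: each document's word distribution is modelled as a mixture over k topics, which is the same as approximating the row-normalized document-word matrix X by a product of two row-stochastic matrices.

```latex
P(w \mid d) = \sum_{z=1}^{k} P(w \mid z)\, P(z \mid d)
\qquad\Longleftrightarrow\qquad
X \approx W H,\quad W_{dz} = P(z \mid d),\quad H_{zw} = P(w \mid z)
```

Here W has one row per document and H has one row per topic; every row of W and of H sums to 1.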
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
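The idea of running pLSA several times and clustering the resulting topics can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the enstop implementation: the function names `plsa` and `stable_topics`, the dense NumPy EM loop, the cosine-similarity threshold, and the greedy clustering are all stand-ins chosen for brevity.

```python
import numpy as np

def plsa(X, k, n_iter=50, seed=0):
    """Fit pLSA by EM on a dense count matrix X (documents x words).

    Returns (doc_topic, topic_word); each row of both sums to 1.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    doc_topic = rng.random((n_docs, k))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((k, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Ratio of observed counts to the current model reconstruction;
        # this carries the E-step responsibilities implicitly.
        ratio = X / (doc_topic @ topic_word + 1e-12)
        # M-step: both updates use the parameters from the previous iteration.
        new_topic_word = topic_word * (doc_topic.T @ ratio)
        new_doc_topic = doc_topic * (ratio @ topic_word.T)
        topic_word = new_topic_word / (new_topic_word.sum(axis=1, keepdims=True) + 1e-12)
        doc_topic = new_doc_topic / (new_doc_topic.sum(axis=1, keepdims=True) + 1e-12)
    return doc_topic, topic_word

def stable_topics(X, k, n_runs=8, threshold=0.9, min_size=None):
    """Pool topics from several random restarts and keep the clusters that recur.

    The greedy cosine-similarity clustering is a simple stand-in (an assumption),
    not the clustering method used by any particular library.
    """
    if min_size is None:
        min_size = max(1, n_runs // 2)
    # Each run is independent, so this loop could be executed in parallel.
    pooled = np.vstack([plsa(X, k, seed=s)[1] for s in range(n_runs)])
    norms = np.maximum(np.linalg.norm(pooled, axis=1, keepdims=True), 1e-12)
    unit = pooled / norms
    sim = unit @ unit.T
    unassigned = np.ones(len(pooled), dtype=bool)
    stable = []
    for i in range(len(pooled)):
        if not unassigned[i]:
            continue
        # Group every still-unassigned topic that is close to topic i.
        members = unassigned & (sim[i] >= threshold)
        unassigned &= ~members
        if members.sum() >= min_size:
            centre = pooled[members].mean(axis=0)
            stable.append(centre / centre.sum())
    if not stable:
        return np.empty((0, X.shape[1]))
    return np.vstack(stable)
```

Calling `stable_topics(X, k=10)` on a document-word count matrix returns one row per recurring cluster, so the number of topics reported depends on how many clusters survive the `min_size` filter rather than on `k` alone.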
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop