Ensemble Topic Modelling
Leland McInnes
July 12, 2019
Research
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling
Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
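In symbols (the notation here is ours, not from the slides): for n documents over a vocabulary of m words and K topics, the document-term count matrix X is approximated by a product of two non-negative matrices. Row d of W says how much of each topic document d contains; row z of H is topic z's weighting over words. Probabilistic methods such as pLSA additionally require the rows of W and H to sum to 1 (with X first divided by each document's length), which lets them be read as conditional probabilities.

```latex
X \approx W H, \qquad
X \in \mathbb{R}_{\ge 0}^{n \times m},\;
W \in \mathbb{R}_{\ge 0}^{n \times K},\;
H \in \mathbb{R}_{\ge 0}^{K \times m}

% probabilistic reading (e.g. pLSA): rows of W and H sum to 1
W_{dz} = P(z \mid d), \qquad H_{zw} = P(w \mid z), \qquad
P(w \mid d) = \sum_{z=1}^{K} P(z \mid d)\, P(w \mid z)
```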
LDA and pLSA are probabilistic matrix factorization methods
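To make "probabilistic matrix factorization" concrete, here is a minimal NumPy sketch of pLSA fitted by expectation-maximization. The function name `plsa` and its parameters are illustrative only, not enstop's API. It assumes a fixed number of iterations with no convergence check, and it builds a dense documents × topics × words array, so it is only suitable for small toy matrices.

```python
import numpy as np


def plsa(X, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a document-term count matrix X (n_docs x n_words).

    Returns (doc_topic, topic_word): row-stochastic estimates of P(z|d) and
    P(w|z), so that P(w|d) is approximated by doc_topic @ topic_word.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    # Random initialisation, each row normalised to a probability distribution
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z);
        # shape (n_docs, n_topics, n_words), normalised over the topic axis
        joint = doc_topic[:, :, None] * topic_word[None, :, :]
        posterior = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
        # Weight each posterior by the observed count n(d, w)
        weighted = X[:, None, :] * posterior
        # M-step: P(w|z) from counts summed over documents,
        # P(z|d) from counts summed over words
        topic_word = weighted.sum(axis=0)
        topic_word /= np.maximum(topic_word.sum(axis=1, keepdims=True), 1e-12)
        doc_topic = weighted.sum(axis=2)
        doc_topic /= np.maximum(doc_topic.sum(axis=1, keepdims=True), 1e-12)

    return doc_topic, topic_word
```

Multiplying `doc_topic @ topic_word` by each document's length gives a low-rank approximation of X, which is the factorization view of the model.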
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
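One way to see this run-to-run variation is to compare the topic-word matrices from two runs. Topic order is arbitrary between runs, so each topic should be compared with its closest counterpart rather than the topic at the same index. The sketch below assumes Hellinger distance as the comparison and uses a simple nearest match rather than an optimal one-to-one assignment; the function names are hypothetical.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def topic_drift(topics_a, topics_b):
    """For each topic of run A, the Hellinger distance to its nearest topic in run B.

    topics_a, topics_b: (n_topics, n_words) row-stochastic topic-word matrices.
    Values near 0 mean the topic reappeared; large values mean it did not.
    """
    dists = np.array([[hellinger(a, b) for b in topics_b] for a in topics_a])
    return dists.min(axis=1)
```

A permuted but otherwise identical set of topics gives zero drift; a topic that changed between runs shows up as a large distance.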
What are the stable topics?
(Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282)
Each cluster represents a stable topic
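The slides do not say how the pooled topics are clustered, so this sketch swaps in the simplest option: link any two topics closer than a fixed Hellinger distance and take connected components (single linkage). A density-based clusterer such as HDBSCAN would likely be more robust. The `threshold` value and the `min_runs` rule (a group must recur in at least half the runs) are assumptions. Note that the number of stable topics is an output of the clustering, not an input.

```python
import numpy as np


def stable_topics(topic_runs, threshold=0.3, min_runs=None):
    """Pool topics from several runs, group near-duplicates, keep recurring groups.

    topic_runs: list of (n_topics_i, n_words) row-stochastic topic-word matrices,
        one per independent run.
    threshold: Hellinger distance below which two topics are linked (assumed value).
    min_runs: keep a group only if it holds topics from at least this many
        distinct runs (default: half the runs, rounded up).
    Returns an array with one averaged topic per retained group.
    """
    if min_runs is None:
        min_runs = (len(topic_runs) + 1) // 2
    topics = np.vstack(topic_runs)
    run_ids = np.concatenate([np.full(len(t), i) for i, t in enumerate(topic_runs)])

    # Pairwise Hellinger distances between all pooled topics
    roots = np.sqrt(topics)
    diff = roots[:, None, :] - roots[None, :, :]
    dist = np.sqrt(0.5 * np.sum(diff ** 2, axis=2))
    linked = dist < threshold

    # Connected components of the "closer than threshold" graph
    labels = np.full(len(topics), -1)
    n_groups = 0
    for start in range(len(topics)):
        if labels[start] >= 0:
            continue
        labels[start] = n_groups
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(linked[i] & (labels < 0)):
                labels[j] = n_groups
                stack.append(j)
        n_groups += 1

    # A group is a stable topic if it recurs across enough runs
    stable = []
    for g in range(n_groups):
        members = labels == g
        if len(np.unique(run_ids[members])) >= min_runs:
            centre = topics[members].mean(axis=0)
            stable.append(centre / centre.sum())
    return np.array(stable)
```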
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
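Each run in the ensemble depends only on its own random seed (and, in this sketch, its own resample of the documents), so the runs can be mapped over a worker pool with no coordination. Resampling documents per run is an assumption here, not something the slides state, and `fit_ensemble` and `fit_fn` are hypothetical names: `fit_fn` stands for any function that fits a topic model and returns a topic-word matrix.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def fit_ensemble(X, fit_fn, n_runs=8, max_workers=4, seed=0):
    """Run n_runs independent topic-model fits in parallel.

    X: document-term matrix (documents are rows).
    fit_fn(X_sample, run_seed) -> (n_topics, n_words) topic-word matrix.
    Returns the list of per-run results, in run order.
    """
    rng = np.random.default_rng(seed)
    n_docs = X.shape[0]
    # Draw each run's bootstrap rows and seed up front, so the results do not
    # depend on how the pool schedules the work
    jobs = [
        (rng.integers(0, n_docs, size=n_docs), int(rng.integers(2**31)))
        for _ in range(n_runs)
    ]

    def run(job):
        rows, run_seed = job
        return fit_fn(X[rows], run_seed)

    # No run reads another run's output, so a plain parallel map suffices
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, jobs))
```

Threads keep the sketch simple and NumPy releases the GIL in many array operations; for fitting code that spends most of its time in pure Python, a process pool would usually scale better.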
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop