Ensemble Topic Modelling
Leland McInnes
July 12, 2019
Research
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
LDA and pLSA are probabilistic matrix factorization methods
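To make the factorization view concrete, here is a minimal sketch of pLSA fitted by EM on a toy count matrix. It is not the enstop implementation: the function name `plsa`, the toy counts, and the iteration count are assumptions made for illustration. The model writes P(w|d) as the sum over topics z of P(z|d) P(w|z), so the row-normalized document-word matrix is approximated by a document-topic matrix times a topic-word matrix.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Toy dense pLSA via EM (illustrative only, not the enstop code)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random row-stochastic starting points for P(z|d) and P(w|z).
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z);
        # array shape is (docs, topics, words).
        joint = doc_topic[:, :, None] * topic_word[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)

        # M-step: re-estimate both factors from count-weighted responsibilities.
        weighted = counts[:, None, :] * resp
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + 1e-12
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + 1e-12

    return doc_topic, topic_word


# Hypothetical 4-document, 4-word corpus with two obvious word groups.
counts = np.array([
    [4, 3, 0, 0],
    [5, 2, 0, 1],
    [0, 0, 6, 3],
    [0, 1, 4, 5],
], dtype=float)

doc_topic, topic_word = plsa(counts, n_topics=2)

# The product of the two factors approximates P(w|d) for every document.
reconstruction = doc_topic @ topic_word
```

The dense three-dimensional responsibility array keeps the sketch short but only suits toy data; a practical implementation would iterate over the nonzero entries of a sparse matrix instead.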
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
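The idea of pooling topics from many runs and clustering them can be sketched as follows. This is an illustration under stated assumptions, not the enstop algorithm: scikit-learn's `NMF` with Kullback-Leibler loss stands in for pLSA, `DBSCAN` with cosine distance stands in for whatever clustering the package uses, and the synthetic corpus, `eps`, and `min_samples` values are made up for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Synthetic corpus (assumption): 3 true topics, each on its own block of 10 words,
# and every document drawn from exactly one topic.
n_topics, n_words, n_docs = 3, 30, 90
true_topics = np.zeros((n_topics, n_words))
for k in range(n_topics):
    true_topics[k, 10 * k:10 * (k + 1)] = 0.1
counts = np.vstack([
    rng.multinomial(100, true_topics[d % n_topics]) for d in range(n_docs)
]).astype(float)


def run_topics(seed):
    """Fit one topic model and return its topics as word distributions."""
    model = NMF(n_components=n_topics, init="random", solver="mu",
                beta_loss="kullback-leibler", max_iter=500, random_state=seed)
    model.fit(counts)
    topics = model.components_
    return topics / topics.sum(axis=1, keepdims=True)


# Pool the topics from several differently seeded runs.
pooled = np.vstack([run_topics(seed) for seed in range(10)])

# Topics that recur across runs land close together; DBSCAN marks one-off
# topics as noise (label -1).
labels = DBSCAN(eps=0.3, min_samples=5, metric="cosine").fit_predict(pooled)

# Summarize each cluster by the mean of its member topics.
stable_topics = np.array([
    pooled[labels == c].mean(axis=0) for c in sorted(set(labels)) if c != -1
])
print(stable_topics.shape)
```

In this sketch the number of stable topics is the number of clusters found rather than a value fixed in advance, although it still depends on the clustering parameters chosen.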
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
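The parallelism comes from the fact that each run in the ensemble is independent of the others. A minimal sketch of that pattern, assuming joblib for dispatch and scikit-learn's `NMF` with Kullback-Leibler loss as a stand-in for pLSA (the helper `fit_one_run` and the random counts are hypothetical, and this is not how enstop itself schedules work):

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.decomposition import NMF


def fit_one_run(counts, n_topics, seed):
    """One independent ensemble member: fit a model, return normalized topics."""
    model = NMF(n_components=n_topics, init="random", solver="mu",
                beta_loss="kullback-leibler", max_iter=300, random_state=seed)
    model.fit(counts)
    topics = model.components_
    return topics / topics.sum(axis=1, keepdims=True)


# Random toy counts (assumption): 50 documents over a 20-word vocabulary.
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(50, 20)).astype(float)

# No run depends on any other, so they can be dispatched to workers freely.
runs = Parallel(n_jobs=2, prefer="threads")(
    delayed(fit_one_run)(counts, 3, seed) for seed in range(4)
)

# Stack the per-run topics into one pool for later clustering.
pooled = np.vstack(runs)
```

Threads are used here only to keep the example lightweight; process-based workers would be the usual choice for CPU-bound fitting.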
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop