Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
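In practice the corpus is usually summarized as a document-term count matrix, with one row per document and one column per vocabulary word. The minimal sketch below assumes naive lowercase whitespace tokenization and a made-up four-document corpus. The helper `doc_term_matrix` is hypothetical and not part of enstop.

```python
import numpy as np

def doc_term_matrix(docs):
    """Build a (documents x vocabulary) matrix of raw word counts."""
    # Sorted vocabulary so that the column order is deterministic
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: j for j, word in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)), dtype=np.int64)
    for i, doc in enumerate(docs):
        for word in doc.lower().split():
            counts[i, index[word]] += 1
    return counts, vocab

# Hypothetical toy corpus: two "animal" documents, two "finance" documents
docs = [
    "cats chase mice",
    "dogs chase cats",
    "stocks rise as markets rally",
    "markets fall as stocks drop",
]
X, vocab = doc_term_matrix(docs)
print(X.shape)  # → (4, 11)
```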
Topic Modelling as Matrix Factorization
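In the factorization view, the count matrix X (documents × words) is approximated by a product W H of a nonnegative documents × topics matrix and a nonnegative topics × words matrix, where each row of H acts as a topic. As an illustration only, the sketch below uses nonnegative matrix factorization with multiplicative updates for the generalized KL divergence (the Lee and Seung update rules). This is a standard stand-in that is closely related to pLSA, not the method from the talk, and `nmf_kl` and `kl_divergence` are hypothetical helpers.

```python
import numpy as np

def nmf_kl(X, n_topics, n_iter=200, seed=0, eps=1e-10):
    """Factor X (docs x words) as W (docs x topics) @ H (topics x words)
    using multiplicative updates for the generalized KL divergence."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_words)) + eps
    for _ in range(n_iter):
        # Update the topic-word factor H with W held fixed
        ratio = X / (W @ H + eps)
        H *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)
        # Update the document-topic factor W using the new H
        ratio = X / (W @ H + eps)
        W *= (ratio @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def kl_divergence(X, Y, eps=1e-10):
    """Generalized KL divergence D(X || Y), treating 0 * log 0 as 0."""
    mask = X > 0
    return float(np.sum(X[mask] * np.log(X[mask] / (Y[mask] + eps))) - X.sum() + Y.sum())

# Toy counts with two obvious blocks of co-occurring words
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 3, 1],
              [0, 0, 1, 2]], dtype=float)
W, H = nmf_kl(X, n_topics=2)
print(np.round(W @ H, 2))
```

The multiplicative form keeps both factors nonnegative automatically, since every update multiplies by a nonnegative ratio.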
LDA (latent Dirichlet allocation) and pLSA (probabilistic latent semantic analysis) are probabilistic matrix factorization methods
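pLSA models each document's word distribution as a mixture, P(w|d) = sum over z of P(z|d) P(w|z). This amounts to factoring the row-normalized count matrix into two row-stochastic matrices. Below is a minimal NumPy sketch of fitting it with expectation-maximization. It assumes a small dense count matrix and folds the E-step posteriors P(z|d,w) into the M-step algebraically rather than storing a docs × words × topics array. It is not enstop's implementation, and `plsa` and `log_likelihood` are hypothetical names.

```python
import numpy as np

def plsa(X, n_topics, n_iter=100, seed=0, eps=1e-12):
    """Fit pLSA by EM on a (docs x words) count matrix X.

    Returns theta (docs x topics, rows are P(z|d)) and
    phi (topics x words, rows are P(w|z)).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    theta = rng.random((n_docs, n_topics))
    theta /= theta.sum(axis=1, keepdims=True)
    phi = rng.random((n_topics, n_words))
    phi /= phi.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step, folded in: R[d, w] = n(d, w) / P(w | d)
        R = X / (theta @ phi + eps)
        # M-step: expected (topic, word) and (doc, topic) counts,
        # both computed from the same E-step posteriors
        new_phi = phi * (theta.T @ R)
        new_theta = theta * (R @ phi.T)
        phi = new_phi / (new_phi.sum(axis=1, keepdims=True) + eps)
        theta = new_theta / (new_theta.sum(axis=1, keepdims=True) + eps)
    return theta, phi

def log_likelihood(X, theta, phi, eps=1e-12):
    """Sum over (d, w) of n(d, w) * log P(w | d)."""
    return float(np.sum(X * np.log(theta @ phi + eps)))

# Toy random counts: 30 documents over a 20-word vocabulary
X = np.random.default_rng(0).poisson(1.0, size=(30, 20)).astype(float)
theta, phi = plsa(X, n_topics=3)
print(log_likelihood(X, theta, phi))
```

EM never decreases the likelihood, but it only converges to a local optimum that depends on the random initialization.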
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard (non-convex) optimization problems
Topics vary from one run to another
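One way to see the run-to-run variation is to fit the same data twice with different random seeds. Then, for each topic of the first run, find its most similar topic in the second run. Values well below 1 indicate topics that did not reappear. The sketch below assumes a toy Poisson-random count matrix and uses cosine similarity as the comparison metric. Both choices are illustrative, and `plsa` and `best_match_similarity` are hypothetical helpers.

```python
import numpy as np

def plsa(X, n_topics, n_iter=100, seed=0, eps=1e-12):
    """Compact pLSA fitted by EM; returns row-stochastic (theta, phi)."""
    rng = np.random.default_rng(seed)
    theta = rng.random((X.shape[0], n_topics))
    theta /= theta.sum(axis=1, keepdims=True)
    phi = rng.random((n_topics, X.shape[1]))
    phi /= phi.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        R = X / (theta @ phi + eps)
        new_phi, new_theta = phi * (theta.T @ R), theta * (R @ phi.T)
        phi = new_phi / (new_phi.sum(axis=1, keepdims=True) + eps)
        theta = new_theta / (new_theta.sum(axis=1, keepdims=True) + eps)
    return theta, phi

def best_match_similarity(phi_a, phi_b):
    """For each topic in run A, cosine similarity to its closest topic in run B."""
    a = phi_a / np.linalg.norm(phi_a, axis=1, keepdims=True)
    b = phi_b / np.linalg.norm(phi_b, axis=1, keepdims=True)
    return (a @ b.T).max(axis=1)

X = np.random.default_rng(42).poisson(1.0, size=(50, 40)).astype(float)
_, phi_1 = plsa(X, n_topics=5, seed=1)
_, phi_2 = plsa(X, n_topics=5, seed=2)
print(np.round(best_match_similarity(phi_1, phi_2), 3))
```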
What are the stable topics?
Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
Each cluster represents a stable topic
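The ensemble idea is to pool the topics from many runs, cluster them, and keep each cluster as one stable topic. The sketch below is a simplified stand-in, not how enstop clusters. It groups topics whose Hellinger distance is below a fixed threshold (single linkage via union-find), keeps groups with at least a majority-of-runs number of members, and averages each group. The threshold, the size rule, the function `stable_topics`, and the synthetic "runs" are all hypothetical. The number of stable topics falls out of the clustering rather than being fixed in advance.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def stable_topics(runs, threshold=0.3, min_size=None):
    """Pool topic-word rows from several runs, link near-duplicates,
    and return one averaged topic per sufficiently large group."""
    topics = np.vstack(runs)
    n = len(topics)
    if min_size is None:
        # Hypothetical rule: a topic must recur in a majority of runs
        min_size = len(runs) // 2 + 1
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Single linkage: merge any pair of topics closer than the threshold
    for i in range(n):
        for j in range(i + 1, n):
            if hellinger(topics[i], topics[j]) < threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    kept = [topics[idx].mean(axis=0) for idx in groups.values() if len(idx) >= min_size]
    return np.array(kept)

# Synthetic runs: two topics recur (with small noise), one topic per run does not
rng = np.random.default_rng(0)
n_words = 12
base_a = np.zeros(n_words); base_a[0:3] = 1 / 3
base_b = np.zeros(n_words); base_b[3:6] = 1 / 3
runs = []
for r in range(5):
    one_off = np.zeros(n_words); one_off[6 + r] = 1.0
    run = np.vstack([base_a, base_b, one_off]) + 0.005 * rng.random((3, n_words))
    runs.append(run / run.sum(axis=1, keepdims=True))
topics = stable_topics(runs)
print(topics.shape)  # → (2, 12)
```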
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
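Each run in the ensemble is an independent fit with its own random seed. Because the runs share no state, they can be distributed freely. The minimal sketch below uses Python's `concurrent.futures.ThreadPoolExecutor`. Threads are chosen only for simplicity, relying on NumPy releasing the GIL in much of its heavy array work; `ProcessPoolExecutor` or separate machines would follow the same pattern. `plsa_topics` and `ensemble_runs` are hypothetical helpers.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def plsa_topics(X, n_topics, seed, n_iter=100, eps=1e-12):
    """One independent pLSA run (EM); returns the topic-word matrix phi."""
    rng = np.random.default_rng(seed)
    theta = rng.random((X.shape[0], n_topics))
    theta /= theta.sum(axis=1, keepdims=True)
    phi = rng.random((n_topics, X.shape[1]))
    phi /= phi.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        R = X / (theta @ phi + eps)
        new_phi, new_theta = phi * (theta.T @ R), theta * (R @ phi.T)
        phi = new_phi / (new_phi.sum(axis=1, keepdims=True) + eps)
        theta = new_theta / (new_theta.sum(axis=1, keepdims=True) + eps)
    return phi

def ensemble_runs(X, n_topics, seeds, max_workers=4):
    """Fit one pLSA model per seed concurrently; the order of results follows seeds."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: plsa_topics(X, n_topics, s), seeds))

X = np.random.default_rng(0).poisson(1.0, size=(30, 20)).astype(float)
runs = ensemble_runs(X, n_topics=4, seeds=range(8))
pooled = np.vstack(runs)  # all topics from all runs, ready for clustering
print(pooled.shape)  # → (32, 20)
```

Because each result depends only on its seed, the parallel output matches what a plain serial loop over the same seeds would produce.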
Implementation
sklearn API
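The scikit-learn convention is that an estimator takes its hyperparameters in `__init__` and learns from data in `fit`, which returns `self`. Learned state is exposed in attributes with a trailing underscore, such as `components_`, and data is mapped with `transform` / `fit_transform`. The class below is a hypothetical pLSA estimator written to that convention, only to illustrate the shape of the interface. It is not enstop's class. Its `transform` re-runs EM on P(z|d) with the topics held fixed ("folding in" new documents), which is one common choice among several.

```python
import numpy as np

class PLSATopics:
    """Hypothetical minimal pLSA estimator following scikit-learn conventions."""

    def __init__(self, n_components=10, n_iter=100, random_state=0):
        self.n_components = n_components
        self.n_iter = n_iter
        self.random_state = random_state

    def _em(self, X, theta, phi, update_phi, eps=1e-12):
        for _ in range(self.n_iter):
            R = X / (theta @ phi + eps)
            new_theta = theta * (R @ phi.T)
            if update_phi:
                # Uses the old theta, so both updates share one E-step
                new_phi = phi * (theta.T @ R)
                phi = new_phi / (new_phi.sum(axis=1, keepdims=True) + eps)
            theta = new_theta / (new_theta.sum(axis=1, keepdims=True) + eps)
        return theta, phi

    def _random_rows(self, rng, n_rows, n_cols):
        rows = rng.random((n_rows, n_cols))
        return rows / rows.sum(axis=1, keepdims=True)

    def fit_transform(self, X, y=None):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        theta = self._random_rows(rng, X.shape[0], self.n_components)
        phi = self._random_rows(rng, self.n_components, X.shape[1])
        theta, self.components_ = self._em(X, theta, phi, update_phi=True)
        return theta

    def fit(self, X, y=None):
        self.fit_transform(X)
        return self

    def transform(self, X):
        """Fold in documents: keep components_ fixed and fit only P(z|d)."""
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        theta = self._random_rows(rng, X.shape[0], self.n_components)
        theta, _ = self._em(X, theta, self.components_, update_phi=False)
        return theta

X = np.random.default_rng(0).poisson(1.0, size=(40, 25))
model = PLSATopics(n_components=3, random_state=0).fit(X)
doc_topics = model.transform(X[:5])
print(model.components_.shape, doc_topics.shape)  # → (3, 25) (5, 3)
```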
https://github.com/lmcinnes/enstop
pip install enstop