Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Ensemble Topic Modelling
Search
Leland McInnes
July 12, 2019
Research
490
1
Share
Ensemble Topic Modelling
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Leland McInnes
July 12, 2019
More Decks by Leland McInnes
See All by Leland McInnes
PyNNDescent: Fast Approximate Nearest Neighbors with Numba
lmcinnes
0
1k
Word and Document Embeddings
lmcinnes
0
170
Topological Data Analysis
lmcinnes
1
360
Learning Topology: topological methods for unsupervised learning
lmcinnes
2
3.6k
A Guide to Dimension Reduction
lmcinnes
3
1.4k
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
lmcinnes
2
2.7k
Other Decks in Research
See All in Research
Dual Quadric表現を用いた動的物体追跡とRGB-D・IMU制約の密結合によるオドメトリ推定
nanoshimarobot
0
350
Unified Audio Source Separation (Defense Slides)
kohei_1979
1
590
討議:RACDA設立30周年記念都市交通フォーラム2026
trafficbrain
0
840
LLM の Attention 機構まとめ — 数式・計算量・メモリ
puwaer
7
1.7k
それ、チームの改善になってますか?ー「チームとは?」から始めた組織の実験ー
hirakawa51
0
1.1k
LLM Compute Infrastructure Overview
karakurist
2
1.3k
Model Discovery and Graph Simulation: A Lightweight Gateway to Chaos Engineering
anatolykr
0
150
明日から使える!研究効率化ツール入門
matsui_528
12
6.8k
業界横断 副業コンプライアンス調査 三者(副業者・本業先・発注者)におけるトラブル認知ギャップの構造分析
fkske
0
1.3k
SoftMatcha 2: 1兆語規模コーパスの超高速かつ柔らかい検索
e869120_sub
6
3.2k
論文紹介 "ReSim: Reliable World Simulation for Autonomous Driving"
kogo
0
500
多様なデータを許容し学習し続ける模倣学習 / Advanced Imitation Learning for VLA
prinlab
0
180
Featured
See All Featured
Sam Torres - BigQuery for SEOs
techseoconnect
PRO
0
250
Abbi's Birthday
coloredviolet
2
7.4k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.9k
Paper Plane
katiecoart
PRO
1
49k
AI Search: Where Are We & What Can We Do About It?
aleyda
0
7.4k
Building a A Zero-Code AI SEO Workflow
portentint
PRO
0
490
How To Speak Unicorn (iThemes Webinar)
marktimemedia
1
450
Understanding Cognitive Biases in Performance Measurement
bluesmoon
32
2.9k
The State of eCommerce SEO: How to Win in Today's Products SERPs - #SEOweek
aleyda
2
10k
Breaking role norms: Why Content Design is so much more than writing copy - Taylor Woolridge
uxyall
0
270
Docker and Python
trallard
47
3.8k
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
140
Transcript
Ensemble Topic Modelling Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
None
None
None
None
LDA and pLSA are probabilistic matrix factorization methods
(Ensembles of) pLSA
Performance?
None
Quality?
None
Instability?
These are hard optimization problems
Topics vary from one run to another
What are the stable topics? Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282
None
Each cluster represents a stable topic
None
• Greater stability • Determines number of topics automatically •
Embarrassingly parallel computation
Implementation
sklearn API
None
https://github.com/lmcinnes/enstop
pip install enstop