Ensemble Topic Modelling
Leland McInnes
July 12, 2019
A short lightning talk on ensemble topic modelling with pLSA using the enstop package.
Transcript
Ensemble Topic Modelling
Leland McInnes
Model a corpus of documents in terms of underlying “topics”
Topic Modelling as Matrix Factorization
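To make the factorization view concrete, here is one standard way to write it. The symbols are chosen for illustration and are not taken from the slides: $X$ is the $n \times m$ document-term count matrix ($n$ documents, $m$ vocabulary words) and $k$ is the number of topics, in the non-negative setting that pLSA and LDA share.

```latex
X \approx W H, \qquad
W \in \mathbb{R}_{\ge 0}^{\,n \times k}, \qquad
H \in \mathbb{R}_{\ge 0}^{\,k \times m}
```

Row $i$ of $W$ gives document $i$'s weight on each of the $k$ topics, and row $j$ of $H$ describes topic $j$ as a weighting over the vocabulary.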
LDA and pLSA are probabilistic matrix factorization methods
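pLSA models each document's word distribution as a mixture of topics, $P(w \mid d) = \sum_z P(z \mid d)\,P(w \mid z)$, which is a non-negative factorization of the row-normalised document-term matrix into a doc-topic matrix and a topic-word matrix. Below is a minimal NumPy sketch of fitting it with the classic EM algorithm on a small dense matrix. The function name `plsa` and its parameters are illustrative only and are not the enstop API.

```python
import numpy as np

def plsa(X, n_topics, n_iter=100, seed=0, eps=1e-12):
    """Fit pLSA by EM on a dense document-term count matrix X (n_docs x n_words).

    Returns (doc_topic, topic_word): rows of doc_topic are P(z|d),
    rows of topic_word are P(w|z).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    # Random initialisation, with each row normalised to a probability distribution
    doc_topic = rng.random((n_docs, n_topics))
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))
    topic_word /= topic_word.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Ratio of observed counts to the model's current (unnormalised) P(w|d)
        ratio = X / (doc_topic @ topic_word + eps)
        # E and M steps combined: the posterior P(z|d,w) is never stored explicitly
        new_topic_word = topic_word * (doc_topic.T @ ratio)
        new_doc_topic = doc_topic * (ratio @ topic_word.T)
        # Renormalise so rows are again P(w|z) and P(z|d)
        topic_word = new_topic_word / (new_topic_word.sum(axis=1, keepdims=True) + eps)
        doc_topic = new_doc_topic / (new_doc_topic.sum(axis=1, keepdims=True) + eps)
    return doc_topic, topic_word

# Usage on a small synthetic count matrix
X = np.random.default_rng(1).poisson(2.0, size=(20, 30)).astype(float)
doc_topic, topic_word = plsa(X, n_topics=3, n_iter=50)
```

Dense arrays keep the sketch short; a practical implementation would typically work on a sparse matrix and only touch its non-zero entries.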
(Ensembles of) pLSA
Performance?
Quality?
Instability?
These are hard, non-convex optimization problems with many local optima
Topics vary from one run to another
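One way to quantify this run-to-run variation is to compare the topic-word matrices from two runs: for each topic in one run, find the closest topic in the other. The sketch below uses Hellinger distance between word distributions, which is our choice for illustration; the function names are hypothetical. A distance near 0 means a topic was reproduced, while a distance near 1 means it has no counterpart in the other run.

```python
import numpy as np

def hellinger_matrix(P, Q):
    """Pairwise Hellinger distances between rows of P (a x m) and rows of Q (b x m).

    Each row is assumed to be a probability distribution over the same vocabulary.
    """
    # For distributions, H(p, q)^2 = 1 - sum(sqrt(p * q))
    affinity = np.sqrt(P) @ np.sqrt(Q).T
    return np.sqrt(np.clip(1.0 - affinity, 0.0, None))

def best_match_distances(topics_a, topics_b):
    """For each topic in run A, the Hellinger distance to its nearest topic in run B."""
    return hellinger_matrix(topics_a, topics_b).min(axis=1)

# Two toy "runs": the second topic of run1 reappears in run2, the first only roughly
run1 = np.array([[0.7, 0.3, 0.0], [0.0, 0.2, 0.8]])
run2 = np.array([[0.0, 0.2, 0.8], [0.5, 0.5, 0.0]])
distances = best_match_distances(run1, run2)
```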
What are the stable topics? (Inspired by https://github.com/RaRe-Technologies/gensim/pull/2282)
Each cluster represents a stable topic
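The idea of pooling topics from many runs and clustering them can be sketched as follows. This is a stand-in, not enstop's actual procedure: it links any two topics whose Hellinger distance is below a fixed `threshold` (single-linkage clustering), keeps only clusters that contain topics from at least `min_runs` different runs, and averages each surviving cluster into one stable topic. The function name, the threshold of 0.3 and the majority rule for `min_runs` are all assumptions chosen for illustration.

```python
import numpy as np

def stable_topics(runs, threshold=0.3, min_runs=None):
    """Cluster topics pooled from several runs; return one averaged topic per stable cluster.

    runs: list of (n_topics_r x n_words) arrays whose rows are P(w|z).
    threshold: hypothetical Hellinger-distance cutoff for linking two topics.
    min_runs: keep a cluster only if it holds topics from at least this many runs
              (defaults to a strict majority of runs).
    """
    if min_runs is None:
        min_runs = len(runs) // 2 + 1
    topics = np.vstack(runs)
    run_ids = np.concatenate([np.full(len(r), i) for i, r in enumerate(runs)])
    # Pairwise Hellinger distances between all pooled topics
    sq = np.sqrt(topics)
    dist = np.sqrt(np.clip(1.0 - sq @ sq.T, 0.0, None))
    # Single linkage: connected components of the graph "distance < threshold"
    labels = -np.ones(len(topics), dtype=int)
    n_clusters = 0
    for start in range(len(topics)):
        if labels[start] >= 0:
            continue
        labels[start] = n_clusters
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((dist[i] < threshold) & (labels < 0)):
                labels[j] = n_clusters
                stack.append(j)
        n_clusters += 1
    result = []
    for c in range(n_clusters):
        members = labels == c
        # Only topics that recur across enough independent runs count as stable
        if len(np.unique(run_ids[members])) >= min_runs:
            centroid = topics[members].mean(axis=0)
            result.append(centroid / centroid.sum())
    return np.array(result)
```

Because the output size is the number of clusters that survive the `min_runs` filter, the number of topics falls out of the procedure rather than being fixed in advance.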
• Greater stability
• Determines number of topics automatically
• Embarrassingly parallel computation
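The runs in an ensemble share no state, so they can be launched concurrently and only combined at the end. A minimal sketch with the standard library's `concurrent.futures` is below. `fit_one` is a hypothetical user-supplied function that performs one fit (for example one pLSA run) for a given seed; `toy_fit` is a placeholder that just draws random topics. Threads are used for simplicity; for fits dominated by pure-Python code, a `ProcessPoolExecutor` with a top-level (picklable) `fit_one` would usually parallelise better.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def run_ensemble(fit_one, X, n_runs=8, max_workers=4, base_seed=0):
    """Run `n_runs` independent fits of fit_one(X, seed) and return their results.

    fit_one is hypothetical: it should return an (n_topics x n_words) topic-word matrix.
    """
    seeds = [base_seed + i for i in range(n_runs)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so result r comes from seeds[r]
        return list(pool.map(lambda seed: fit_one(X, seed), seeds))

def toy_fit(X, seed):
    # Placeholder "fit": three random topics over X's vocabulary, standing in for pLSA
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.ones(X.shape[1]), size=3)

topic_runs = run_ensemble(toy_fit, np.ones((5, 7)), n_runs=4)
```

Seeding each run explicitly keeps the ensemble reproducible regardless of the order in which workers finish.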
Implementation
sklearn API
https://github.com/lmcinnes/enstop
pip install enstop