Fast Computation of Sparse Matrices and Jaccard Similarity
na-o-ys
March 29, 2017
Programming
Transcript
Sparse Matrices, or: How to Compute Jaccard Similarity Fast @na_o_ys
Agenda 1. Data structures for sparse matrices 2. Python and numerical computation 3. Python and sparse matrices 4. Jaccard similarity
1. Data structures for sparse matrices
A sparse matrix is a matrix in which most of the elements are 0
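1. Data structures for sparse matrices (1) • Ordinary matrices: Array • Handling a sparse matrix as an Array wastes both memory and computation • For example, operations between zero vectors are obviously wasted work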
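To put a number on the waste, here is a back-of-the-envelope sketch. The matrix size and nonzero count are hypothetical, and it assumes 8-byte float64 entries; `dense_footprint` is an illustrative helper, not part of any library.

```python
def dense_footprint(n_rows, n_cols, nnz, itemsize=8):
    """Return (total bytes, bytes spent on zeros) for a dense array.

    Assumes every entry takes `itemsize` bytes (8 for float64).
    """
    cells = n_rows * n_cols
    total = cells * itemsize
    wasted = (cells - nnz) * itemsize
    return total, wasted


# Hypothetical user x item matrix: 100,000 users, 50,000 items, 1,000,000 nonzeros.
total, wasted = dense_footprint(100_000, 50_000, 1_000_000)
print(f"dense size: {total / 1e9:.1f} GB, spent on zeros: {wasted / total:.2%}")
# dense size: 40.0 GB, spent on zeros: 99.98%
```

The same holds for computation: a dense dot product of two rows performs 50,000 multiplications even though, in this hypothetical matrix, a row holds about 10 nonzeros on average.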
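1. Data structures for sparse matrices (2) • Compressed Sparse Row (CSR) • Addition and matrix products between CSR matrices are fast • Extracting row vectors is fast • Extracting column vectors is slow • (Wikipedia)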
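As a rough illustration of why rows are cheap and columns are not, here is a pure-Python sketch of the CSR layout. It stores three arrays: `data` (the nonzero values in row-major order), `indices` (the column of each value) and `indptr` (where each row starts in `data`). The array names follow SciPy's attribute names; the helper functions `to_csr`, `get_row` and `get_col` are hypothetical and only for illustration.

```python
def to_csr(dense):
    """Convert a list-of-lists matrix to CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                data.append(v)
                indices.append(j)
        indptr.append(len(data))  # row i occupies data[indptr[i]:indptr[i + 1]]
    return data, indices, indptr


def get_row(csr, i, n_cols):
    """Row extraction: touches only the contiguous slice that belongs to row i."""
    data, indices, indptr = csr
    row = [0] * n_cols
    for k in range(indptr[i], indptr[i + 1]):
        row[indices[k]] = data[k]
    return row


def get_col(csr, j, n_rows):
    """Column extraction: has to visit the slice of every row to find column j."""
    data, indices, indptr = csr
    col = [0] * n_rows
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == j:
                col[i] = data[k]
    return col


A = [[1, 0, 0, 2],
     [0, 0, 3, 0],
     [0, 0, 0, 0],
     [4, 5, 0, 6]]
csr = to_csr(A)
print(csr)                  # ([1, 2, 3, 4, 5, 6], [0, 3, 2, 0, 1, 3], [0, 2, 3, 3, 6])
print(get_row(csr, 3, 4))   # [4, 5, 0, 6]
print(get_col(csr, 0, 4))   # [1, 0, 0, 4]
```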
2. Python and numerical computation
2. Python and numerical computation • Mature numerical libraries • NumPy, SciPy • Scikit-learn
By the way, Python is slow (DEMO)
Python is slow • 5000× difference in execution time • [Chart: execution time (ms) for pure Python, partly NumPy, and fully NumPy]
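The demo code is not in the transcript, so the following is only a stand-in to reproduce the effect: a dot product written as a pure-Python loop versus one `np.dot` call. The vector length and random data are arbitrary assumptions, and the measured speed-up depends on the workload and the machine; it is not expected to match the 5000× on the slide.

```python
import time

import numpy as np

# Hypothetical workload: dot product of two random vectors of length n.
n = 200_000
rng = np.random.default_rng(0)
a = rng.random(n)
b = rng.random(n)


def dot_python(x, y):
    """Pure-Python loop: every iteration goes through the interpreter."""
    total = 0.0
    for xi, yi in zip(x, y):
        total += xi * yi
    return total


xs, ys = a.tolist(), b.tolist()  # plain Python floats, so the loop is pure Python

t0 = time.perf_counter()
r_python = dot_python(xs, ys)
t1 = time.perf_counter()
r_numpy = float(np.dot(a, b))    # a single call; the loop runs in compiled code
t2 = time.perf_counter()

print(f"pure Python: {(t1 - t0) * 1e3:.2f} ms")
print(f"NumPy:       {(t2 - t1) * 1e3:.3f} ms")
print(f"speed-up:    {(t1 - t0) / max(t2 - t1, 1e-9):.0f}x")
```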
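Python-loop is Evil • Never, ever loop over a matrix element by element • A job that should finish in 1 second takes 2 hours • Row loops and column loops are also better avoided • A job that should finish in 1 second takes 1 minute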
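The three levels of looping on this slide can be compared on one computation. This sketch, with a hypothetical random 0/1 matrix, computes the matrix of dot products between all pairs of rows, G[i, j] = a_i · a_j (the a·b term of the Jaccard formula in section 4), first with an element-by-element loop, then with one loop per row, then with no Python loop at all. All three give the same result; only the amount of work done by the interpreter differs.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 0/1 user x item matrix: 100 users, 30 items, roughly 10% ones.
A = (rng.random((100, 30)) < 0.1).astype(np.float64)


def gram_element_loop(A):
    """Triple Python loop over individual elements: the pattern to avoid."""
    n, m = A.shape
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(m):
                s += A[i, k] * A[j, k]
            G[i, j] = s
    return G


def gram_row_loop(A):
    """One Python iteration per row; each iteration is a vectorized product."""
    n = A.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        G[i] = A @ A[i]  # dot products of row i with every row
    return G


def gram_vectorized(A):
    """No Python-level loop at all."""
    return A @ A.T


G1 = gram_element_loop(A)
G2 = gram_row_loop(A)
G3 = gram_vectorized(A)
```

Timing each function (for example with `time.perf_counter`) on larger inputs shows the gap widening as the element loop's interpreter overhead grows with rows × rows × columns.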
3. Python and sparse matrices
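3. Python and sparse matrices • scipy.sparse.csr_matrix • Efficient matrix operations • Extracting row vectors • Arithmetic such as addition, matrix products • Keeps its internal representation as numpy.ndarray • So you can pull those arrays out and operate on them directly (in the NumPy world)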
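A small sketch of what "the internal representation is numpy.ndarray" means in practice: `csr_matrix` exposes its CSR arrays as the attributes `data`, `indices` and `indptr`, and these can be read or modified with ordinary NumPy operations. The example matrix is arbitrary; binarizing the values and counting nonzeros per row are illustrative choices.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 0, 2],
                  [0, 0, 3, 0],
                  [0, 0, 0, 0],
                  [4, 5, 0, 6]], dtype=np.float64)
A = csr_matrix(dense)

# The internal CSR arrays are ordinary numpy.ndarray objects.
print(type(A.data).__name__)  # ndarray
print(A.data)     # [1. 2. 3. 4. 5. 6.]
print(A.indices)  # [0 3 2 0 1 3]
print(A.indptr)   # [0 2 3 3 6]

# Row extraction is cheap in CSR: it only reads A.indptr[3]:A.indptr[4].
print(A[3].toarray())  # [[4. 5. 0. 6.]]

# Working "in the NumPy world": binarize by overwriting the stored values,
# then count nonzeros per row from indptr, without building a dense matrix.
B = A.copy()
B.data[:] = 1.0
nnz_per_row = np.diff(B.indptr)
print(nnz_per_row)  # [2 1 0 3]
```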
4. Jaccard similarity
4. Jaccard similarity • A similarity measure between vectors • Used in things like collaborative filtering • How similar are user A and user B? Jaccard(a, b) = a·b / (a·a + b·b - a·b)
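For 0/1 vectors, a·a = |A|, b·b = |B| and a·b = |A ∩ B|, so the formula reduces to |A ∩ B| / (|A| + |B| - |A ∩ B|) = |A ∩ B| / |A ∪ B|, the familiar set Jaccard index (for general real-valued vectors the same expression is known as the Tanimoto coefficient). This sketch checks the two forms on two hypothetical users; returning 0 when both vectors are all zeros is an assumed convention.

```python
def jaccard_dot(a, b):
    """Jaccard(a, b) = a.b / (a.a + b.b - a.b), written with dot products."""
    ab = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    denom = aa + bb - ab
    return ab / denom if denom else 0.0  # assumed convention for two zero vectors


# Hypothetical users A and B over 6 items (1 = interacted with the item).
user_a = [1, 1, 0, 1, 0, 0]
user_b = [1, 0, 0, 1, 1, 0]
print(jaccard_dot(user_a, user_b))  # 0.5

# Same value from the set definition |A ∩ B| / |A ∪ B|.
set_a = {i for i, v in enumerate(user_a) if v}
set_b = {i for i, v in enumerate(user_b) if v}
set_jaccard = len(set_a & set_b) / len(set_a | set_b)
print(set_jaccard)  # 0.5
```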
What I needed at work • Compute the Jaccard similarity between the row vectors of a sparse matrix
DEMO
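The demo code itself is not in the transcript, so the following is only a minimal sketch of one vectorized way to do this, not the author's implementation. It assumes binary (0/1) rows, as in the user/item example, and applies the slide's formula to all row pairs at once: a·b for every pair comes from one sparse product `X @ X.T`, and a·a is the per-row nonzero count read from `indptr`. The function name `pairwise_jaccard` is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix


def pairwise_jaccard(X):
    """Jaccard similarity between every pair of rows of a sparse matrix.

    Assumes binary data: any stored nonzero is treated as 1.
    Returns a sparse matrix; pairs with no common nonzero column are
    simply not stored (their similarity is 0).
    """
    X = csr_matrix(X, dtype=np.float64, copy=True)
    X.eliminate_zeros()          # drop explicitly stored zeros before binarizing
    X.data[:] = 1.0              # binarize in place on the ndarray
    sizes = np.diff(X.indptr)    # a_i . a_i = number of ones in row i
    inter = (X @ X.T).tocoo()    # a_i . a_j, stored only where rows overlap
    i, j, ab = inter.row, inter.col, inter.data
    sim = ab / (sizes[i] + sizes[j] - ab)  # Jaccard = ab / (aa + bb - ab)
    return csr_matrix((sim, (i, j)), shape=inter.shape)


# Hypothetical users (rows) x items (columns).
X = csr_matrix(np.array([
    [1, 1, 0, 1, 0, 0],   # user A
    [1, 0, 0, 1, 1, 0],   # user B
    [0, 0, 1, 0, 0, 1],   # user C
]))
S = pairwise_jaccard(X)
print(S.toarray())
```

Users A and B score 0.5, and user C shares no item with either, so those pairs are absent from `S`. Keeping `X @ X.T` sparse is the point of the design: with sparse user data most pairs share nothing, so the result stays much smaller than a dense n × n matrix. If most rows overlap, the product itself becomes large, and this sketch does not address that case.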
Summary
Summary • Python is slow • You need to use the libraries well • I wrote a blog post: http://na-o-ys.github.io/others/2015-11-07-sparse-vector-similarities.html