Sparse Matrices and Fast Computation of Jaccard Similarity
na-o-ys
March 29, 2017
Transcript
Sparse matrices, or how to compute Jaccard similarity fast @na_o_ys
Agenda
1. Sparse matrix data structures
2. Python and computation
3. Python and sparse matrices
4. Jaccard similarity
1. Sparse matrix data structures
A sparse matrix is a matrix in which most of the elements are 0
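1. Sparse matrix data structures (1)
• An ordinary matrix is stored as an Array
• Handling a sparse matrix as an Array wastes both memory and computation
• Arithmetic between zero vectors, for instance, is plainly wasted work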
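To make the waste concrete, here is a minimal sketch comparing the storage of a dense array with that of the same matrix in SciPy's CSR format. The matrix size and the diagonal fill pattern are hypothetical choices for illustration, not figures from the talk.

```python
import numpy as np
from scipy import sparse

# Hypothetical example: a 1000 x 1000 matrix with only 1000 nonzero entries.
n = 1000
dense = np.zeros((n, n), dtype=np.float64)
dense[np.arange(n), np.arange(n)] = 1.0  # ones on the diagonal, zeros elsewhere

csr = sparse.csr_matrix(dense)

# The dense array stores every element, including all the zeros.
dense_bytes = dense.nbytes

# The CSR matrix stores three arrays: values, column indices, row pointers.
csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

print(f"dense: {dense_bytes} bytes, CSR: {csr_bytes} bytes")
```

The exact CSR byte count depends on the index dtype SciPy picks, but it grows with the number of nonzero entries rather than with the full matrix size.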
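1. Sparse matrix data structures (2)
• Compressed Sparse Row (CSR)
• Addition and matrix products between CSR matrices are fast
• Extracting a row vector is fast
• Extracting a column vector is slow
• (wikipedia)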
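The properties listed above follow from how CSR lays out its data. The sketch below builds the three CSR arrays by hand in plain Python; the helper names `to_csr`, `get_row`, and `get_col` are hypothetical and exist only for this illustration. It shows why a row is a contiguous slice while a column requires scanning every row.

```python
def to_csr(dense):
    """Convert a list-of-lists matrix into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # nonzero values, row by row
                indices.append(col)   # column index of each stored value
        indptr.append(len(data))      # where the next row starts in data
    return data, indices, indptr


def get_row(csr, i):
    """Row extraction is one contiguous slice: cost depends only on that row."""
    data, indices, indptr = csr
    start, end = indptr[i], indptr[i + 1]
    return list(zip(indices[start:end], data[start:end]))


def get_col(csr, j):
    """Column extraction has to scan the stored entries of every row."""
    data, indices, indptr = csr
    result = []
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == j:
                result.append((i, data[k]))
    return result


matrix = [
    [1, 0, 0, 2],
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [4, 0, 5, 0],
]
csr = to_csr(matrix)
print(csr)              # ([1, 2, 3, 4, 5], [0, 3, 2, 0, 2], [0, 2, 3, 3, 5])
print(get_row(csr, 3))  # [(0, 4), (2, 5)]
print(get_col(csr, 2))  # [(1, 3), (3, 5)]
```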
2. Python and computation
• Mature computation libraries
• NumPy, SciPy
• Scikit-learn
By the way, Python is slow (DEMO)
Python is slow
• 5000x
• Execution time in ms (the measured values are missing from this transcript): Python, partly NumPy, all NumPy
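Python-loop is Evil
• Never write a loop over matrix elements: a job that should finish in 1 second takes 2 hours
• Better not to write row loops or column loops either: a job that should finish in 1 second takes 1 (time unit missing from this transcript)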
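The three styles named above can be compared with a small benchmark. This is a sketch under assumptions: the workload (sum of squares of a 300 x 300 random matrix) and the function names are hypothetical, and the original demo's task is not known from the transcript. Timings vary by machine, so none are quoted here.

```python
import time

import numpy as np


def sum_of_squares_loop(matrix):
    """Element-by-element Python loop: the style the slide warns against."""
    total = 0.0
    rows, cols = matrix.shape
    for i in range(rows):
        for j in range(cols):
            total += matrix[i, j] * matrix[i, j]
    return total


def sum_of_squares_rows(matrix):
    """Python loop over rows, with NumPy doing the work inside each row."""
    total = 0.0
    for row in matrix:
        total += float(np.dot(row, row))
    return total


def sum_of_squares_numpy(matrix):
    """No Python-level loop at all: the whole computation runs inside NumPy."""
    return float(np.sum(matrix * matrix))


rng = np.random.default_rng(0)
matrix = rng.random((300, 300))  # hypothetical workload size

for fn in (sum_of_squares_loop, sum_of_squares_rows, sum_of_squares_numpy):
    start = time.perf_counter()
    result = fn(matrix)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{fn.__name__}: {elapsed_ms:.2f} ms (result {result:.3f})")
```

All three functions return the same value up to floating-point rounding; only the amount of work done by the Python interpreter differs.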
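3. Python and sparse matrices
• scipy.sparse.csr_matrix
• Efficient matrix operations
• Row-vector extraction
• Arithmetic such as addition, and matrix products
• Keeps its internal representation as numpy.ndarray
• These arrays can be pulled out and manipulated (in the NumPy world)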
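A short sketch of what these points look like in practice with `scipy.sparse.csr_matrix`. The example matrix is hypothetical; the attributes `data`, `indices`, and `indptr` are the NumPy arrays that hold the CSR representation.

```python
import numpy as np
from scipy import sparse

# Hypothetical 4 x 4 example matrix.
dense = np.array(
    [
        [1, 0, 0, 2],
        [0, 0, 3, 0],
        [0, 0, 0, 0],
        [4, 0, 5, 0],
    ],
    dtype=np.float64,
)
A = sparse.csr_matrix(dense)

# The internal representation is three plain NumPy arrays.
print(type(A.data).__name__, A.data)  # stored nonzero values
print(A.indices)                      # column index of each stored value
print(A.indptr)                       # start offset of each row in data/indices

# Row extraction through the library API ...
row = A.getrow(3)

# ... or directly on the internal arrays, staying in the NumPy world.
start, end = A.indptr[3], A.indptr[4]
row_cols = A.indices[start:end]
row_vals = A.data[start:end]

# Addition and matrix product between CSR matrices stay sparse.
total = A + A
product = A @ A.T
```

Working on `A.data` and `A.indices` directly is what lets vectorized NumPy operations replace Python-level loops over rows.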
4. Jaccard similarity
• A similarity measure between vectors
• Used in things like collaborative filtering
• How similar are user A and user B?
Jaccard(a, b) = a·b / (a·a + b·b - a·b)
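The formula above can be checked on a small case. In this sketch the two user vectors are hypothetical, and returning 0.0 when both vectors are all zeros is an assumed convention that the slide does not specify. For 0/1 vectors the formula agrees with the set definition |A ∩ B| / |A ∪ B|.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def jaccard(a, b):
    """Jaccard(a, b) = a.b / (a.a + b.b - a.b), as written on the slide."""
    ab = dot(a, b)
    denominator = dot(a, a) + dot(b, b) - ab
    # Assumption: two all-zero vectors are given similarity 0.0.
    return ab / denominator if denominator else 0.0


# Hypothetical users: 1 means the user interacted with that item.
user_a = [1, 1, 0, 1, 0]
user_b = [1, 0, 0, 1, 1]

similarity = jaccard(user_a, user_b)

# Same computation with sets of item indices, for comparison.
items_a = {i for i, v in enumerate(user_a) if v}
items_b = {i for i, v in enumerate(user_b) if v}
set_similarity = len(items_a & items_b) / len(items_a | items_b)

print(similarity, set_similarity)  # 0.5 0.5
```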
What I needed at work
• Compute the Jaccard similarity between row vectors of a sparse matrix
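One way to do this without a Python loop over row pairs is to get all dot products from a single sparse matrix product. The sketch below is an illustration under assumptions, not necessarily the method shown in the demo; the function name `pairwise_jaccard` and the example data are hypothetical.

```python
import numpy as np
from scipy import sparse


def pairwise_jaccard(matrix):
    """Jaccard similarity between all pairs of rows, returned as a CSR matrix.

    Only pairs with a nonzero dot product are stored; every other pair has
    similarity 0 and stays implicit.
    """
    A = sparse.csr_matrix(matrix, dtype=np.float64)
    dots = (A @ A.T).tocoo()   # dots[i, j] = a_i . a_j, kept sparse
    norms = dots.diagonal()    # norms[i] = a_i . a_i
    # a_i.a_i + a_j.a_j - a_i.a_j, evaluated only for the stored pairs
    denominators = norms[dots.row] + norms[dots.col] - dots.data
    similarities = dots.data / denominators
    return sparse.csr_matrix(
        (similarities, (dots.row, dots.col)), shape=dots.shape
    )


# Hypothetical user-item matrix: one row per user.
users = np.array(
    [
        [1, 1, 0, 1, 0],
        [1, 0, 0, 1, 1],
        [0, 0, 1, 0, 0],
    ]
)
S = pairwise_jaccard(users).toarray()
print(S)
```

The denominator is computed only for stored entries, so an all-zero row never causes a division by zero: it produces no entries in the product at all.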
DEMO
Summary
• Python is slow
• You need to use libraries well
• I wrote a blog post
• http://na-o-ys.github.io/others/2015-11-07-sparse-vector-similarities.html