Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
疎行列と Jaccard 類似度の高速計算
Search
na-o-ys
March 29, 2017
Programming
1
560
疎行列と Jaccard 類似度の高速計算
na-o-ys
March 29, 2017
Tweet
Share
More Decks by na-o-ys
See All by na-o-ys
IoTと監視
naoys
1
710
RubyとJIT
naoys
0
140
将棋盤を画像認識したかった
naoys
0
1.5k
Rust で乗り換え案内
naoys
0
610
有理数集合の濃度
naoys
2
110
YARVの最適化について調べた
naoys
0
110
転職会議サービスのAWS移行記録
naoys
0
51
Anonymous Recursion in C++
naoys
0
400
入門AlphaGo
naoys
5
3.7k
Other Decks in Programming
See All in Programming
useSyncExternalStoreを使いまくる
ssssota
6
1k
SymfonyCon Vienna 2025: Twig, still relevant in 2025?
fabpot
3
1.2k
StarlingMonkeyを触ってみた話 - 2024冬
syumai
3
270
Semantic Kernelのネイティブプラグインで知識拡張をしてみる
tomokusaba
0
180
CQRS+ES の力を使って効果を感じる / Feel the effects of using the power of CQRS+ES
seike460
PRO
0
120
快速入門可觀測性
blueswen
0
350
MCP with Cloudflare Workers
yusukebe
2
220
なまけものオバケたち -PHP 8.4 に入った新機能の紹介-
tanakahisateru
1
120
LLM Supervised Fine-tuningの理論と実践
datanalyticslabo
4
1.1k
Zoneless Testing
rainerhahnekamp
0
120
複雑な仕様に立ち向かうアーキテクチャ
myohei
0
170
DevFest Tokyo 2025 - Flutter のアプリアーキテクチャ現在地点
wasabeef
5
900
Featured
See All Featured
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
10
810
It's Worth the Effort
3n
183
28k
GraphQLとの向き合い方2022年版
quramy
44
13k
Faster Mobile Websites
deanohume
305
30k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
28
2.1k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
59k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.3k
Why Our Code Smells
bkeepers
PRO
335
57k
Building Your Own Lightsaber
phodgson
103
6.1k
Fashionably flexible responsive web design (full day workshop)
malarkey
405
66k
A Modern Web Designer's Workflow
chriscoyier
693
190k
Fantastic passwords and where to find them - at NoRuKo
philnash
50
2.9k
Transcript
ૄߦྻ ͋Δ͍ Jaccard ྨࣅΛߴͰܭࢉ͢Δํ๏ @na_o_ys
Agenda 1. ૄߦྻͷσʔλߏ 2. Python ͱܭࢉ 3. Python ͱૄߦྻ 4.
Jaccard ྨࣅ
1. ૄߦྻͷσʔλߏ
ૄߦྻͱ ΄ͱΜͲͷཁૉ͕ 0 Ͱ͋Δߦྻ
1. ૄߦྻͷσʔλߏ (1) • ௨ৗͷߦྻ Array • ૄߦྻΛ Array Ͱѻ͏ͱϝϞϦԋࢉແବ
• 0 ϕΫτϧಉ࢜ͷࢉͱ͔໌Β͔ʹແବ
1. ૄߦྻͷσʔλߏ (2) • Compressed Sparse Row (CSR) • CSR
ಉ࢜ͷՃࢉ, ߦྻੵ͕ߴ • ߦϕΫτϧͷऔΓग़͕͠ߴ • ྻϕΫτϧͷऔΓग़͕͠ • (wikipedia)
2. Python ͱܭࢉ
2. Python ͱܭࢉ • ख़ͨ͠ܭࢉϥΠϒϥϦ • NumPy, SciPy • Scikit-learn
ͱ͜ΖͰɺPython ͍ (DEMO)
Python ͍ • 5000 ഒ ࣮ߦ࣌ؒ 1ZUIPO NT Ұ෦/VN1Z NT
શ෦/VN1Z NT
Python-loop is Evil • ߦྻϧʔϓઈରʹॻ͍͍͚ͯͳ͍ • 1 ඵͰऴΘΔͣͷॲཧʹ 2 ͔͔࣌ؒΔ
• ߦϧʔϓ/ྻϧʔϓॻ͔ͳ͍ํ͕ྑ͍ • 1 ඵͰऴΘΔͣͷॲཧʹ 1 ͔͔Δ
3. Python ͱૄߦྻ
3. Python ͱૄߦྻ • scipy.sparse.csr_matrix
ޮతͳߦྻॲཧ • ߦϕΫτϧͷऔΓग़͠ • Ճࢉࢉ, ߦྻੵ • ෦දݱΛ numpy.ndarray ͱͯ͠อ࣋
• औΓग़ͯ͠ૢ࡞Ͱ͖Δ (NumPy ͷੈք Ͱ)
4. Jaccard ྨࣅ
4. Jaccard ྨࣅ • ϕΫτϧಉ࢜ͷྨࣅ • ڠௐϑΟϧλϦϯάͱ͔Ͱ͏ • ϢʔβAͱϢʔβBͲΕ͘Β͍ࣅ͍ͯΔ͔ Jaccard(a,
b) = a・b / (a・a + b・b - a・b)
ࣄͰඞཁʹͳͬͨ͜ͱ • ૄߦྻͷߦϕΫτϧಉ࢜ͷ Jaccard ྨࣅΛ ܭࢉ͍ͨ͠
DEMO
·ͱΊ
·ͱΊ • Python ͍ • ϥΠϒϥϦΛ͏·͘͏ඞཁ͕͋Δ • ϒϩάΛॻ͍ͨ • http://na-o-ys.github.io/others/
2015-11-07-sparse-vector- similarities.html