Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
疎行列と Jaccard 類似度の高速計算
Search
na-o-ys
March 29, 2017
Programming
1
630
疎行列と Jaccard 類似度の高速計算
na-o-ys
March 29, 2017
Tweet
Share
More Decks by na-o-ys
See All by na-o-ys
IoTと監視
naoys
1
790
RubyとJIT
naoys
0
160
将棋盤を画像認識したかった
naoys
0
1.5k
Rust で乗り換え案内
naoys
0
630
有理数集合の濃度
naoys
2
130
YARVの最適化について調べた
naoys
0
140
転職会議サービスのAWS移行記録
naoys
0
72
Anonymous Recursion in C++
naoys
0
420
入門AlphaGo
naoys
5
3.8k
Other Decks in Programming
See All in Programming
階層構造を表現するデータ構造とリファクタリング 〜1年で10倍成長したプロダクトの変化と課題〜
yuhisatoxxx
3
810
CSC305 Lecture 02
javiergs
PRO
1
260
Current States of Java Web Frameworks at JCConf 2025
kishida
0
510
2分台で1500examples完走!爆速CIを支える環境構築術 - Kaigi on Rails 2025
falcon8823
3
2.4k
Learn CPU architecture with Assembly
akkeylab
1
1.3k
Swift Concurrency - 状態監視の罠
objectiveaudio
2
290
AccessorySetupKitで実現するシームレスなペアリング体験 / Seamless pairing with AccessorySetupKit
nekowen
0
210
iOSDC.pdf
chronos2500
2
640
そのpreloadは必要?見過ごされたpreloadが技術的負債として爆発した日
mugitti9
2
2.6k
CSC305 Lecture 01
javiergs
PRO
1
380
dynamic!
moro
9
4.5k
パフォーマンスチューニングで Web 技術を深掘り直す
progfay
18
4.8k
Featured
See All Featured
Docker and Python
trallard
46
3.6k
The Invisible Side of Design
smashingmag
301
51k
Unsuck your backbone
ammeep
671
58k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
560
Reflections from 52 weeks, 52 projects
jeffersonlam
352
21k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
252
21k
The Illustrated Children's Guide to Kubernetes
chrisshort
48
51k
[RailsConf 2023] Rails as a piece of cake
palkan
57
5.9k
The Cost Of JavaScript in 2023
addyosmani
53
9k
Why You Should Never Use an ORM
jnunemaker
PRO
59
9.5k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
15k
How to Ace a Technical Interview
jacobian
280
23k
Transcript
ૄߦྻ ͋Δ͍ Jaccard ྨࣅΛߴͰܭࢉ͢Δํ๏ @na_o_ys
Agenda 1. ૄߦྻͷσʔλߏ 2. Python ͱܭࢉ 3. Python ͱૄߦྻ 4.
Jaccard ྨࣅ
1. ૄߦྻͷσʔλߏ
ૄߦྻͱ ΄ͱΜͲͷཁૉ͕ 0 Ͱ͋Δߦྻ
1. ૄߦྻͷσʔλߏ (1) • ௨ৗͷߦྻ Array • ૄߦྻΛ Array Ͱѻ͏ͱϝϞϦԋࢉແବ
• 0 ϕΫτϧಉ࢜ͷࢉͱ͔໌Β͔ʹແବ
1. ૄߦྻͷσʔλߏ (2) • Compressed Sparse Row (CSR) • CSR
ಉ࢜ͷՃࢉ, ߦྻੵ͕ߴ • ߦϕΫτϧͷऔΓग़͕͠ߴ • ྻϕΫτϧͷऔΓग़͕͠ • (wikipedia)
2. Python ͱܭࢉ
2. Python ͱܭࢉ • ख़ͨ͠ܭࢉϥΠϒϥϦ • NumPy, SciPy • Scikit-learn
ͱ͜ΖͰɺPython ͍ (DEMO)
Python ͍ • 5000 ഒ ࣮ߦ࣌ؒ 1ZUIPO NT Ұ෦/VN1Z NT
શ෦/VN1Z NT
Python-loop is Evil • ߦྻϧʔϓઈରʹॻ͍͍͚ͯͳ͍ • 1 ඵͰऴΘΔͣͷॲཧʹ 2 ͔͔࣌ؒΔ
• ߦϧʔϓ/ྻϧʔϓॻ͔ͳ͍ํ͕ྑ͍ • 1 ඵͰऴΘΔͣͷॲཧʹ 1 ͔͔Δ
3. Python ͱૄߦྻ
3. Python ͱૄߦྻ • scipy.sparse.csr_matrix
ޮతͳߦྻॲཧ • ߦϕΫτϧͷऔΓग़͠ • Ճࢉࢉ, ߦྻੵ • ෦දݱΛ numpy.ndarray ͱͯ͠อ࣋
• औΓग़ͯ͠ૢ࡞Ͱ͖Δ (NumPy ͷੈք Ͱ)
4. Jaccard ྨࣅ
4. Jaccard ྨࣅ • ϕΫτϧಉ࢜ͷྨࣅ • ڠௐϑΟϧλϦϯάͱ͔Ͱ͏ • ϢʔβAͱϢʔβBͲΕ͘Β͍ࣅ͍ͯΔ͔ Jaccard(a,
b) = a・b / (a・a + b・b - a・b)
ࣄͰඞཁʹͳͬͨ͜ͱ • ૄߦྻͷߦϕΫτϧಉ࢜ͷ Jaccard ྨࣅΛ ܭࢉ͍ͨ͠
DEMO
·ͱΊ
·ͱΊ • Python ͍ • ϥΠϒϥϦΛ͏·͘͏ඞཁ͕͋Δ • ϒϩάΛॻ͍ͨ • http://na-o-ys.github.io/others/
2015-11-07-sparse-vector- similarities.html