Sparse Matrices and Fast Computation of Jaccard Similarity
na-o-ys
March 29, 2017
Programming
Transcript
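Sparse matrices, or: how to compute Jaccard similarity fast @na_o_ys
Agenda 1. Sparse matrix data structures 2. Python and numerical computing 3. Python and sparse matrices 4. Jaccard similarity
1. Sparse matrix data structures
What is a sparse matrix? A matrix in which almost all elements are 0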
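1. Sparse matrix data structures (1) • An ordinary matrix is stored as an Array • Storing a sparse matrix as an Array wastes memory and computation • Arithmetic between zero vectors, for example, is obviously wasted work
1. Sparse matrix data structures (2) • Compressed Sparse Row (CSR) • Addition and matrix products between CSR matrices are fast • Extracting a row vector is fast • Extracting a column vector is slow • (wikipedia)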
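The CSR layout named on the slide can be sketched in plain Python to show why row extraction is cheap and column extraction is not. This is a minimal illustration, not the speaker's code: the helper names `dense_to_csr`, `get_row`, and `get_col` are hypothetical, and the array names `data`, `indices`, and `indptr` follow the convention commonly used for CSR.

```python
def dense_to_csr(rows):
    """Convert a list of equal-length rows into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)     # non-zero values, stored row by row
                indices.append(col)    # column index of each stored value
        indptr.append(len(data))       # offset where the next row starts
    return data, indices, indptr


def get_row(csr, i):
    """Row i is one contiguous slice of data/indices, so this is cheap."""
    data, indices, indptr = csr
    start, end = indptr[i], indptr[i + 1]
    return list(zip(indices[start:end], data[start:end]))


def get_col(csr, j):
    """Column j is scattered across all rows, so every stored entry is scanned."""
    data, indices, indptr = csr
    result = []
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == j:
                result.append((i, data[k]))
    return result


dense = [
    [0, 0, 3, 0],
    [1, 0, 0, 2],
    [0, 0, 0, 0],
]
csr = dense_to_csr(dense)
```

Only the three non-zero values are stored, and an all-zero row costs nothing beyond one repeated offset in `indptr`.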
2. Python and numerical computing
2. Python and numerical computing • Mature numerical libraries • NumPy, SciPy • Scikit-learn By the way, Python is slow (DEMO)
Python is slow • 5000x • Execution time in ms for three versions: Python, partly NumPy, all NumPy (values not preserved in the transcript)
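Python-loop is Evil • Never write a loop over the elements of a matrix • A job that should finish in 1 second takes 2 hours • Better not to write row loops or column loops either • A job that should finish in 1 second takes 1 [unit missing in transcript]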
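The three styles the slides contrast (element loop, row loop, no loop) can be sketched as follows. The demo itself is not in the transcript, so this is an assumed stand-in: the function names and the matrix size are made up for illustration, and no timings are claimed here. Wrapping each call in `timeit` is one way to measure the difference on a given machine.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((200, 300))
b = rng.random((200, 300))


def add_element_loop(a, b):
    """Pure Python double loop over every element (the style the slide warns against)."""
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i, j] = a[i, j] + b[i, j]
    return out


def add_row_loop(a, b):
    """Python loop over rows, with NumPy doing the work inside each row."""
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i] = a[i] + b[i]
    return out


def add_vectorized(a, b):
    """No Python-level loop: the whole operation runs inside NumPy."""
    return a + b


result_element = add_element_loop(a, b)
result_row = add_row_loop(a, b)
result_vec = add_vectorized(a, b)
```

All three produce the same array; only the number of interpreter-level iterations differs (60,000, 200, and 0 respectively for this size).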
3. Python and sparse matrices
3. Python and sparse matrices • scipy.sparse.csr_matrix
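Efficient matrix processing • Row vector extraction • Addition and other arithmetic, matrix products • The internal representation is held as numpy.ndarray • It can be pulled out and manipulated (in the NumPy world)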
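A short sketch of these points with `scipy.sparse.csr_matrix`: its `data`, `indices`, and `indptr` attributes are plain `numpy.ndarray` objects, so per-row quantities can be computed from them without a Python loop. The example matrix and the squared-norm calculation are illustrative assumptions, not taken from the talk.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([
    [0, 0, 3, 0],
    [1, 0, 0, 2],
    [0, 0, 0, 0],
])
m = csr_matrix(dense)

# The three CSR arrays are ordinary numpy.ndarray objects.
data, indices, indptr = m.data, m.indices, m.indptr

# Row extraction, element-wise addition, and matrix product all stay sparse.
row1 = m[1]
doubled = m + m
gram = m @ m.T

# Working "in the NumPy world": per-row squared norms computed directly
# from the internal arrays, with no Python-level loop over rows.
nnz_per_row = np.diff(indptr)
row_ids = np.repeat(np.arange(m.shape[0]), nnz_per_row)
squared_norms = np.bincount(row_ids, weights=data ** 2, minlength=m.shape[0])
```

`np.bincount` is used here rather than `np.add.reduceat` because it handles empty rows without special cases.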
4. Jaccard similarity
4. Jaccard similarity • A similarity measure between two vectors • Used in collaborative filtering and the like • How similar are user A and user B? Jaccard(a, b) = a・b / (a・a + b・b - a・b)
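The formula above translates directly into NumPy dot products. A minimal sketch: the function name `jaccard` and the two example users are hypothetical, and returning 0.0 when both vectors are all zeros is an assumption, since the slide does not define that case.

```python
import numpy as np


def jaccard(a, b):
    """Jaccard(a, b) = a.b / (a.a + b.b - a.b), as written on the slide."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ab = a @ b
    denom = a @ a + b @ b - ab
    if denom == 0.0:
        return 0.0  # assumption: two all-zero vectors are given similarity 0
    return ab / denom


# Hypothetical users: 1 means the user interacted with that item.
user_a = [1, 1, 0, 1, 0]
user_b = [1, 0, 0, 1, 1]
similarity = jaccard(user_a, user_b)  # 2 / (3 + 3 - 2) = 0.5
```

For 0/1 vectors this reduces to the set form: 2 shared items out of 4 items touched by either user.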
What I needed at work • To compute the Jaccard similarity between the row vectors of a sparse matrix
DEMO
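The demo is not part of the transcript. One possible way to compute the similarity for every pair of rows without Python loops is sketched below. It is an assumption, not necessarily the speaker's method: the function name `pairwise_jaccard` is hypothetical, and the pairwise dot products are converted to a dense array, which is only reasonable when the number of rows is modest.

```python
import numpy as np
from scipy.sparse import csr_matrix


def pairwise_jaccard(x):
    """Jaccard similarity between all pairs of rows of x (dense n-by-n result)."""
    x = csr_matrix(x, dtype=float)
    dots = (x @ x.T).toarray()      # dots[i, j] = row_i . row_j (sparse product)
    sq_norms = dots.diagonal()      # row_i . row_i
    denom = sq_norms[:, None] + sq_norms[None, :] - dots
    out = np.zeros_like(dots)
    # Assumption: pairs whose denominator is 0 (two empty rows) get similarity 0.
    np.divide(dots, denom, out=out, where=denom != 0)
    return out


users = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
])
sim = pairwise_jaccard(users)
```

Here `sim[0, 1]` is the similarity of the first two users, and the all-zero third row yields 0 against everything, including itself.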
Summary
Summary • Python is slow • You need to make good use of libraries • I wrote a blog post • http://na-o-ys.github.io/others/2015-11-07-sparse-vector-similarities.html