Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
疎行列と Jaccard 類似度の高速計算
Search
na-o-ys
March 29, 2017
Programming
1
650
疎行列と Jaccard 類似度の高速計算
na-o-ys
March 29, 2017
Tweet
Share
More Decks by na-o-ys
See All by na-o-ys
IoTと監視
naoys
1
810
RubyとJIT
naoys
0
170
将棋盤を画像認識したかった
naoys
0
1.6k
Rust で乗り換え案内
naoys
0
640
有理数集合の濃度
naoys
2
140
YARVの最適化について調べた
naoys
0
150
転職会議サービスのAWS移行記録
naoys
0
79
Anonymous Recursion in C++
naoys
0
430
入門AlphaGo
naoys
5
3.8k
Other Decks in Programming
See All in Programming
[AI Engineering Summit Tokyo 2025] LLMは計画業務のゲームチェンジャーか? 最適化業務における活⽤の可能性と限界
terryu16
2
280
Graviton と Nitro と私
maroon1st
0
160
Pythonではじめるオープンデータ分析〜書籍の紹介と書籍で紹介しきれなかった事例の紹介〜
welliving
3
770
JETLS.jl ─ A New Language Server for Julia
abap34
2
470
CSC307 Lecture 02
javiergs
PRO
1
760
AI Agent の開発と運用を支える Durable Execution #AgentsInProd
izumin5210
0
180
Deno Tunnel を使ってみた話
kamekyame
0
310
チームをチームにするEM
hitode909
0
440
Python札幌 LT資料
t3tra
7
1.1k
PostgreSQLで手軽にDuckDBを使う!DuckDB&pg_duckdb入門/osc25hi-duckdb
takahashiikki
0
240
GoLab2025 Recap
kuro_kurorrr
0
3.5k
0→1 フロントエンド開発 Tips🚀 #レバテックMeetup
bengo4com
0
480
Featured
See All Featured
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
How to audit for AI Accessibility on your Front & Back End
davetheseo
0
140
Designing for Timeless Needs
cassininazir
0
110
Odyssey Design
rkendrick25
PRO
0
460
The SEO identity crisis: Don't let AI make you average
varn
0
47
Unlocking the hidden potential of vector embeddings in international SEO
frankvandijk
0
140
How to Think Like a Performance Engineer
csswizardry
28
2.4k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
1k
The agentic SEO stack - context over prompts
schlessera
0
590
Thoughts on Productivity
jonyablonski
74
5k
The Limits of Empathy - UXLibs8
cassininazir
1
200
DBのスキルで生き残る技術 - AI時代におけるテーブル設計の勘所
soudai
PRO
61
48k
Transcript
ૄߦྻ ͋Δ͍ Jaccard ྨࣅΛߴͰܭࢉ͢Δํ๏ @na_o_ys
Agenda 1. ૄߦྻͷσʔλߏ 2. Python ͱܭࢉ 3. Python ͱૄߦྻ 4.
Jaccard ྨࣅ
1. ૄߦྻͷσʔλߏ
ૄߦྻͱ ΄ͱΜͲͷཁૉ͕ 0 Ͱ͋Δߦྻ
1. ૄߦྻͷσʔλߏ (1) • ௨ৗͷߦྻ Array • ૄߦྻΛ Array Ͱѻ͏ͱϝϞϦԋࢉແବ
• 0 ϕΫτϧಉ࢜ͷࢉͱ͔໌Β͔ʹແବ
1. ૄߦྻͷσʔλߏ (2) • Compressed Sparse Row (CSR) • CSR
ಉ࢜ͷՃࢉ, ߦྻੵ͕ߴ • ߦϕΫτϧͷऔΓग़͕͠ߴ • ྻϕΫτϧͷऔΓग़͕͠ • (wikipedia)
2. Python ͱܭࢉ
2. Python ͱܭࢉ • ख़ͨ͠ܭࢉϥΠϒϥϦ • NumPy, SciPy • Scikit-learn
ͱ͜ΖͰɺPython ͍ (DEMO)
Python ͍ • 5000 ഒ ࣮ߦ࣌ؒ 1ZUIPO NT Ұ෦/VN1Z NT
શ෦/VN1Z NT
Python-loop is Evil • ߦྻϧʔϓઈରʹॻ͍͍͚ͯͳ͍ • 1 ඵͰऴΘΔͣͷॲཧʹ 2 ͔͔࣌ؒΔ
• ߦϧʔϓ/ྻϧʔϓॻ͔ͳ͍ํ͕ྑ͍ • 1 ඵͰऴΘΔͣͷॲཧʹ 1 ͔͔Δ
3. Python ͱૄߦྻ
3. Python ͱૄߦྻ • scipy.sparse.csr_matrix
ޮతͳߦྻॲཧ • ߦϕΫτϧͷऔΓग़͠ • Ճࢉࢉ, ߦྻੵ • ෦දݱΛ numpy.ndarray ͱͯ͠อ࣋
• औΓग़ͯ͠ૢ࡞Ͱ͖Δ (NumPy ͷੈք Ͱ)
4. Jaccard ྨࣅ
4. Jaccard ྨࣅ • ϕΫτϧಉ࢜ͷྨࣅ • ڠௐϑΟϧλϦϯάͱ͔Ͱ͏ • ϢʔβAͱϢʔβBͲΕ͘Β͍ࣅ͍ͯΔ͔ Jaccard(a,
b) = a・b / (a・a + b・b - a・b)
ࣄͰඞཁʹͳͬͨ͜ͱ • ૄߦྻͷߦϕΫτϧಉ࢜ͷ Jaccard ྨࣅΛ ܭࢉ͍ͨ͠
DEMO
·ͱΊ
·ͱΊ • Python ͍ • ϥΠϒϥϦΛ͏·͘͏ඞཁ͕͋Δ • ϒϩάΛॻ͍ͨ • http://na-o-ys.github.io/others/
2015-11-07-sparse-vector- similarities.html