Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
知っておくと便利な Bloom Filter / Bloom Filter
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Yoshiaki Yoshida
October 14, 2016
Technology
2
6.8k
知っておくと便利な Bloom Filter / Bloom Filter
Blog /
https://kakakakakku.hatenablog.com/entry/2016/10/14/220055
Yoshiaki Yoshida
October 14, 2016
Tweet
Share
More Decks by Yoshiaki Yoshida
See All by Yoshiaki Yoshida
技術ブロガーを育てる!ブログメンタリングで何を教えているのか / Passion for Blog Mentoring
kakakakakku
8
38k
プログラミング初心者に教えるときは「身近な比喩」が重要なのだ! / Metaphor is Important for Beginner Programmer
kakakakakku
2
5.7k
プロジェクトの成功を支える ZenHub と モブプログラミング / ZenHub and Mob Programming
kakakakakku
1
5.9k
楽しく!アウトプットを習慣化しよう / Let's Enjoy Output
kakakakakku
3
7k
さぁ!今すぐプロジェクトリーダーに立候補しよう / Be a Project Leader
kakakakakku
3
10k
プロジェクトをリードする技術 (Kyash 社 再演) / Project Leading is Skill for Kyash
kakakakakku
4
2.2k
プロジェクトをリードする技術 / Project Leading is Skill
kakakakakku
45
52k
Mackerel で ECS をどこまでモニタリングできるのか / Monitoring ECS with Mackerel
kakakakakku
0
14k
[2018/01/30] Redash 初心者向けハンズオン / Redash Meetup #0.1
kakakakakku
0
2.5k
Other Decks in Technology
See All in Technology
Master Dataグループ紹介資料
sansan33
PRO
1
4.4k
Eight Engineering Unit 紹介資料
sansan33
PRO
1
6.9k
ブラックボックス観測に基づくAI支援のプロトコルのリバースエンジニアリングと再現~AIを用いたリバースエンジニアリング~ @ SECCON 14 電脳会議 / Reverse Engineering and Reproduction of an AI-Assisted Protocol Based on Black-Box Observation @ SECCON 14 DENNO-KAIGI
chibiegg
0
130
AWS Bedrock Guardrails / 機密情報の入力・出力をブロックする — Blocking Sensitive Information Input/Output
kazuhitonakayama
2
190
生成AI活用によるPRレビュー改善の歩み
lycorptech_jp
PRO
4
2k
型を書かないRuby開発への挑戦
riseshia
0
140
作るべきものと向き合う - ecspresso 8年間の開発史から学ぶ技術選定 / 技術選定con findy 2026
fujiwara3
6
1.8k
組織のSREを推進するためのPlatform EngineeringとEKS / Platform Engineering and EKS to drive SRE in your organization
chmikata
0
180
社内でAWS BuilderCards体験会を立ち上げ、得られた気づき / 20260225 Masaki Okuda
shift_evolve
PRO
1
160
「使いにくい」も「運用疲れ」も卒業する UIデザイナーとエンジニアが創る持続可能な内製開発
nrinetcom
PRO
1
770
類似画像検索モデルの開発ノウハウ
lycorptech_jp
PRO
0
410
AI が Approve する開発フロー / How AI Reviewers Accelerate Our Development
zaimy
1
260
Featured
See All Featured
Organizational Design Perspectives: An Ontology of Organizational Design Elements
kimpetersen
PRO
1
620
Applied NLP in the Age of Generative AI
inesmontani
PRO
4
2.1k
Fireside Chat
paigeccino
42
3.8k
Done Done
chrislema
186
16k
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
RailsConf 2023
tenderlove
30
1.4k
The Impact of AI in SEO - AI Overviews June 2024 Edition
aleyda
5
760
Exploring the relationship between traditional SERPs and Gen AI search
raygrieselhuber
PRO
2
3.7k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
4k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
The Curse of the Amulet
leimatthew05
1
9.5k
The Invisible Side of Design
smashingmag
302
51k
Transcript
͓ͬͯ͘ͱศརͳ Bloom Filter 2016-10-14 ࣾษڧձ @kakakakakku
Bloom Filter • 1970ʹߟҊ͞Εͨ • ߟҊऀ Burton Howard Bloom ࢯ
• ۭؒޮͷྑ͍֬తσʔλߏ • ཁૉ͕ू߹ͷதʹؚ·ΕΔ͔Λఆ͢ΔͨΊʹ͏ • σʔλྔʹґଘͤͣ O(k) ͷܭࢉྔͰߴʹఆͰ͖Δ
ू߹ ؚ·ΕΔ? ؚ·Εͳ͍? ؚ·ΕΔ? ؚ·Εͳ͍?
׆༻ྫΛΕ Bloom Filter Λ ͬͱۙʹײ͡ΒΕΔͣʂ
• Cassandra • Key Λݕࡧ͢Δͱ͖ͷ I/O Λݮ͢ΔͨΊ SSTable ʹ Bloom
Filter Λॻ͖ࠐΜͰ͍Δ • HBase • HFile ʹಛఆͷσʔλؚ͕·Ε͍ͯͳ͍͜ͱΛ ݕࡧ͢ΔͨΊʹ Bloom Filter Λ׆༻͍ͯ͠Δ Bloom Filter ׆༻ྫ 1
Bloom Filter ׆༻ྫ 2 • H2O • ແବͳαʔόϓογϡΛૹ৴͠ͳ͍ͨΊʹ ϒϥβΩϟογϡใͷ Bloom
Filter Λ ѹॖͯ͠ HTTP Ͱฦ͍ͯ͠Δ CASPER (Cache-Aware Server PushER) • Bitcoin • τϥϯβΫγϣϯͷݕࡧʹ׆༻͍ͯ͠Δ? ʢৄ֬͘͠ೝͰ͖ͯͳ͍ʣ
Bloom Filter ׆༻ྫ 3 • pixiv • ࡞ʹ͍ͨλάͱ pixiv ඦՊࣄయͷλάू߹ͷ
ଘࡏ֬ೝʹ׆༻͍ͯ͠Δ? http://inside.pixiv.net/entry/2014/07/22/132103
ʘ Bloom Filter ͷڍಈ ʗ
0 0 1 0 2 0 3 0 4 0
5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0 • m bit ͷྻΛ༻ҙ͢Δ • ࠓճ ྻۭؒ = m = 16 ͱ͢Δ • શͯ 0 ͰॳظԽ͓ͯ͘͠
0 0 1 0 2 0 3 0 4 0
5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0 • ҙͷϋογϡؔΛ༻ҙ͢Δ • ࠓճ k = 2 ݸͷؔΛ͏ • h1(key) = (key * 1) mod m • h2(key) = (key * 2) mod m
0 1 1 0 2 0 3 0 4 0
5 0 6 0 7 0 8 1 9 0 10 0 11 0 12 0 13 0 14 0 15 0 • key = 1000 ΛՃ͢Δ • h1(1000) = 8 • h2(1000) = 0 [ 1000 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 0 13 0 14 0 15 0 • key = 1001 ΛՃ͢Δ • h1(1001) = 9 • h2(1001) = 2 [ 1000, 1001 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • key = 1004 ΛՃ͢Δ • h1(1004) = 12 • h2(1004) = 8 • h1(1000) = 8 ͱॏෳ͍ͯ͠Δ • ϑϥά 1 ͷ··ʹ͢Δ [ 1000, 1001, 1004 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • Query : key = 1005 ଘࡏ͢Δ ? • h1(1005) = 13 • h2(1005) = 10 • h1(1005) = h2(1005) = 0 • ʮଘࡏ͠ͳ͍ʯͱஅݴͰ͖Δ [ 1000, 1001, 1004 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • Query : key = 1000 ଘࡏ͢Δ ? • h1(1000) = 8 • h2(1000) = 0 • h1(1000) = h2(1000) = 1 • ʮଘࡏ͢Δʯ͔͠Εͳ͍ • 1000 ࣮ࡍʹଘࡏ͢Δ [ 1000, 1001, 1004 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • Query : key = 1020 ଘࡏ͢Δ ? • h1(1020) = 12 • h2(1020) = 8 • h1(1000) = h2(1000) = 1 • ʮଘࡏ͢Δʯ͔͠Εͳ͍ • 1020 ࣮ࡍʹଘࡏ͠ͳ͍ [ 1000, 1001, 1004 ]
(ƅшƅ) Űō? ޡఆͯ͠Δ͚Ͳ…?
False Positive ِཅੑ False Negative ِӄੑ ʮଘࡏ͠ͳ͍ʯͱ͖ʹʮଘࡏ͢Δʯͱఆͯ͠͠·͏͜ͱ ʮଘࡏ͢Δʯͱ͖ʹʮଘࡏ͠ͳ͍ʯͱఆͯ͠͠·͏͜ͱ
False Positive ِཅੑ False Negative ِӄੑ ʮଘࡏ͠ͳ͍ʯͱ͖ʹʮଘࡏ͢Δʯͱఆͯ͠͠·͏͜ͱ ʮଘࡏ͢Δʯͱ͖ʹʮଘࡏ͠ͳ͍ʯͱఆͯ͠͠·͏͜ͱ ↑ Bloom
Filter ʹ False Positive ͷՄೳੑ͕͋Δ
False Positive ͷՄೳੑ • O(k) ͰߴʹఆͰ͖Δঈͱͯ͠ False Positive ͷՄೳੑ͕͋Δ •
Αͬͯ key = 1020 ͷΑ͏ʹ ʮଘࡏ͢Δʯͱޡݕͯ͠͠·͏߹͕͋Δ • ͨͩ͠ False Negative 100% ͋Γಘͳ͍
ϝϦοτ • ܭࢉྔ O(k) • ઢܗ୳ࡧͩͱ O(N) • ೋ୳ࡧͩͱ O(log
N) • ϋογϡςʔϒϧͳΒ O(1) ͩͬ͠ͱߴ? • k = 1 ͳΒ Bloom Filter O(1) ʹͳΔ • σʔλΛอ࣋͢Δඞཁ͕ͳۭؒ͘ޮ͕ྑ͍
ʘ Bloom Filter ཁૉ͕আͰ͖ͳ͍ ʗ
0 0 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 0 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • 1000 Λআ͢Δͱ • h1(1005) = 8 • h2(1005) = 0 • 1004 আ͞Εͯ͠·͏ !!! • h1(1005) = 12 • h2(1005) = 8 [ 1000, 1001, 1004 ]
ʘ ཁૉΛআ͢ΔͳΒ Counting Filter ʗ Bloom Filter Λ֦ுͨ͠ΞϧΰϦζϜ
0 1 1 0 2 0 3 0 4 0
5 0 6 0 7 0 8 1 9 0 10 0 11 0 12 0 13 0 14 0 15 0 • key = 1000 ΛՃ͢Δ • h1(1000) = 8 • h2(1000) = 0 [ 1000 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 0 13 0 14 0 15 0 • key = 1001 ΛՃ͢Δ • h1(1001) = 9 • h2(1001) = 2 [ 1000, 1001 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 2 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • key = 1004 ΛՃ͢Δ • h1(1004) = 12 • h2(1004) = 8 • ॏෳͨ͠ΒΠϯΫϦϝϯτ͢Δ [ 1000, 1001, 1004 ] ϏοτͰͳ͘ ΧϯλʔͰදݱ͢Δ͕ Bloom Filter ͱҟͳΔ
0 0 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • key = 1000 Λআ͢Δ • h1(1004) = 8 • h2(1004) = 0 • σΫϦϝϯτ͢Δ [ 1000, 1001, 1004 ]
ʘ False Positive ֬ ʗ
Bloom Filter ެࣜͰࢉग़ m : ྻۭؒ (bit) n : ొཁૉ
ৄ͘͠ Wikipedia ʹʂ https://ja.wikipedia.org/wiki/ϒϧʔϜϑΟϧλ False Positive Λ࠷খʹ͢Δ ࠷దͳϋογϡؔͷۙࣅ ࠷దͳ k Λͬͨ߹ͷ False Positive ֬
ʘ ৺ແ༻ ʗ ࠷దͳ k Λ͑ False Positive ΛݶΓͳ͘͘Ͱ͖Δ
ʘ ·ͱΊ ʗ False Positive ͷՄೳੑ͋Δ͠ ཁૉͷআͰ͖ͳ͍͚Ͳ τϨʔυΦϑΛ࠷େݶ׆༻ͯ͠ ߴ &
ۭؒޮͷྑ͍ॲཧ͕Ͱ͖Δʂ
ʘ ͓ͬͯ͘ͱศརͳ Bloom Filter ʗ