Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
知っておくと便利な Bloom Filter / Bloom Filter
Search
Yoshiaki Yoshida
October 14, 2016
Technology
2
6.3k
知っておくと便利な Bloom Filter / Bloom Filter
Blog /
https://kakakakakku.hatenablog.com/entry/2016/10/14/220055
Yoshiaki Yoshida
October 14, 2016
Tweet
Share
More Decks by Yoshiaki Yoshida
See All by Yoshiaki Yoshida
技術ブロガーを育てる!ブログメンタリングで何を教えているのか / Passion for Blog Mentoring
kakakakakku
8
36k
プログラミング初心者に教えるときは「身近な比喩」が重要なのだ! / Metaphor is Important for Beginner Programmer
kakakakakku
2
5.6k
プロジェクトの成功を支える ZenHub と モブプログラミング / ZenHub and Mob Programming
kakakakakku
1
5.7k
楽しく!アウトプットを習慣化しよう / Let's Enjoy Output
kakakakakku
3
6.7k
さぁ!今すぐプロジェクトリーダーに立候補しよう / Be a Project Leader
kakakakakku
3
9k
プロジェクトをリードする技術 (Kyash 社 再演) / Project Leading is Skill for Kyash
kakakakakku
4
2.1k
プロジェクトをリードする技術 / Project Leading is Skill
kakakakakku
43
46k
Mackerel で ECS をどこまでモニタリングできるのか / Monitoring ECS with Mackerel
kakakakakku
0
13k
[2018/01/30] Redash 初心者向けハンズオン / Redash Meetup #0.1
kakakakakku
0
2.3k
Other Decks in Technology
See All in Technology
JuliaTokaiとJuliaLangJaの紹介 for NGK2025S
antimon2
1
110
re:Invent2024 KeynoteのAmazon Q Developer考察
yusukeshimizu
1
150
今から、 今だからこそ始める Terraform で Azure 管理 / Managing Azure with Terraform: The Perfect Time to Start
nnstt1
0
240
商品レコメンドでのexplicit negative feedbackの活用
alpicola
1
350
Oracle Base Database Service 技術詳細
oracle4engineer
PRO
6
54k
.NET 最新アップデート ~ AI とクラウド時代のアプリモダナイゼーション
chack411
0
200
RubyでKubernetesプログラミング
sat
PRO
4
160
AWS re:Invent 2024 re:Cap Taipei (for Developer): New Launches that facilitate Developer Workflow and Continuous Innovation
dwchiang
0
160
iPadOS18でフローティングタブバーを解除してみた
sansantech
PRO
1
140
Docker Desktop で Docker を始めよう
zembutsu
PRO
0
170
深層学習と3Dキャプチャ・3Dモデル生成(土木学会応用力学委員会 応用数理・AIセミナー)
pfn
PRO
0
460
Amazon Route 53, 待ちに待った TLSAレコードのサポート開始
kenichinakamura
0
170
Featured
See All Featured
GraphQLとの向き合い方2022年版
quramy
44
13k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
7
570
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
3
180
The World Runs on Bad Software
bkeepers
PRO
66
11k
Learning to Love Humans: Emotional Interface Design
aarron
274
40k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
44
7k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
10
870
Art, The Web, and Tiny UX
lynnandtonic
298
20k
The Cult of Friendly URLs
andyhume
78
6.1k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
251
21k
How to Think Like a Performance Engineer
csswizardry
22
1.3k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.6k
Transcript
͓ͬͯ͘ͱศརͳ Bloom Filter 2016-10-14 ࣾษڧձ @kakakakakku
Bloom Filter • 1970ʹߟҊ͞Εͨ • ߟҊऀ Burton Howard Bloom ࢯ
• ۭؒޮͷྑ͍֬తσʔλߏ • ཁૉ͕ू߹ͷதʹؚ·ΕΔ͔Λఆ͢ΔͨΊʹ͏ • σʔλྔʹґଘͤͣ O(k) ͷܭࢉྔͰߴʹఆͰ͖Δ
ू߹ ؚ·ΕΔ? ؚ·Εͳ͍? ؚ·ΕΔ? ؚ·Εͳ͍?
׆༻ྫΛΕ Bloom Filter Λ ͬͱۙʹײ͡ΒΕΔͣʂ
• Cassandra • Key Λݕࡧ͢Δͱ͖ͷ I/O Λݮ͢ΔͨΊ SSTable ʹ Bloom
Filter Λॻ͖ࠐΜͰ͍Δ • HBase • HFile ʹಛఆͷσʔλؚ͕·Ε͍ͯͳ͍͜ͱΛ ݕࡧ͢ΔͨΊʹ Bloom Filter Λ׆༻͍ͯ͠Δ Bloom Filter ׆༻ྫ 1
Bloom Filter ׆༻ྫ 2 • H2O • ແବͳαʔόϓογϡΛૹ৴͠ͳ͍ͨΊʹ ϒϥβΩϟογϡใͷ Bloom
Filter Λ ѹॖͯ͠ HTTP Ͱฦ͍ͯ͠Δ CASPER (Cache-Aware Server PushER) • Bitcoin • τϥϯβΫγϣϯͷݕࡧʹ׆༻͍ͯ͠Δ? ʢৄ֬͘͠ೝͰ͖ͯͳ͍ʣ
Bloom Filter ׆༻ྫ 3 • pixiv • ࡞ʹ͍ͨλάͱ pixiv ඦՊࣄయͷλάू߹ͷ
ଘࡏ֬ೝʹ׆༻͍ͯ͠Δ? http://inside.pixiv.net/entry/2014/07/22/132103
ʘ Bloom Filter ͷڍಈ ʗ
0 0 1 0 2 0 3 0 4 0
5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0 • m bit ͷྻΛ༻ҙ͢Δ • ࠓճ ྻۭؒ = m = 16 ͱ͢Δ • શͯ 0 ͰॳظԽ͓ͯ͘͠
0 0 1 0 2 0 3 0 4 0
5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0 • ҙͷϋογϡؔΛ༻ҙ͢Δ • ࠓճ k = 2 ݸͷؔΛ͏ • h1(key) = (key * 1) mod m • h2(key) = (key * 2) mod m
0 1 1 0 2 0 3 0 4 0
5 0 6 0 7 0 8 1 9 0 10 0 11 0 12 0 13 0 14 0 15 0 • key = 1000 ΛՃ͢Δ • h1(1000) = 8 • h2(1000) = 0 [ 1000 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 0 13 0 14 0 15 0 • key = 1001 ΛՃ͢Δ • h1(1001) = 9 • h2(1001) = 2 [ 1000, 1001 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • key = 1004 ΛՃ͢Δ • h1(1004) = 12 • h2(1004) = 8 • h1(1000) = 8 ͱॏෳ͍ͯ͠Δ • ϑϥά 1 ͷ··ʹ͢Δ [ 1000, 1001, 1004 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • Query : key = 1005 ଘࡏ͢Δ ? • h1(1005) = 13 • h2(1005) = 10 • h1(1005) = h2(1005) = 0 • ʮଘࡏ͠ͳ͍ʯͱஅݴͰ͖Δ [ 1000, 1001, 1004 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • Query : key = 1000 ଘࡏ͢Δ ? • h1(1000) = 8 • h2(1000) = 0 • h1(1000) = h2(1000) = 1 • ʮଘࡏ͢Δʯ͔͠Εͳ͍ • 1000 ࣮ࡍʹଘࡏ͢Δ [ 1000, 1001, 1004 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • Query : key = 1020 ଘࡏ͢Δ ? • h1(1020) = 12 • h2(1020) = 8 • h1(1000) = h2(1000) = 1 • ʮଘࡏ͢Δʯ͔͠Εͳ͍ • 1020 ࣮ࡍʹଘࡏ͠ͳ͍ [ 1000, 1001, 1004 ]
(ƅшƅ) Űō? ޡఆͯ͠Δ͚Ͳ…?
False Positive ِཅੑ False Negative ِӄੑ ʮଘࡏ͠ͳ͍ʯͱ͖ʹʮଘࡏ͢Δʯͱఆͯ͠͠·͏͜ͱ ʮଘࡏ͢Δʯͱ͖ʹʮଘࡏ͠ͳ͍ʯͱఆͯ͠͠·͏͜ͱ
False Positive ِཅੑ False Negative ِӄੑ ʮଘࡏ͠ͳ͍ʯͱ͖ʹʮଘࡏ͢Δʯͱఆͯ͠͠·͏͜ͱ ʮଘࡏ͢Δʯͱ͖ʹʮଘࡏ͠ͳ͍ʯͱఆͯ͠͠·͏͜ͱ ↑ Bloom
Filter ʹ False Positive ͷՄೳੑ͕͋Δ
False Positive ͷՄೳੑ • O(k) ͰߴʹఆͰ͖Δঈͱͯ͠ False Positive ͷՄೳੑ͕͋Δ •
Αͬͯ key = 1020 ͷΑ͏ʹ ʮଘࡏ͢Δʯͱޡݕͯ͠͠·͏߹͕͋Δ • ͨͩ͠ False Negative 100% ͋Γಘͳ͍
ϝϦοτ • ܭࢉྔ O(k) • ઢܗ୳ࡧͩͱ O(N) • ೋ୳ࡧͩͱ O(log
N) • ϋογϡςʔϒϧͳΒ O(1) ͩͬ͠ͱߴ? • k = 1 ͳΒ Bloom Filter O(1) ʹͳΔ • σʔλΛอ࣋͢Δඞཁ͕ͳۭؒ͘ޮ͕ྑ͍
ʘ Bloom Filter ཁૉ͕আͰ͖ͳ͍ ʗ
0 0 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 0 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • 1000 Λআ͢Δͱ • h1(1005) = 8 • h2(1005) = 0 • 1004 আ͞Εͯ͠·͏ !!! • h1(1005) = 12 • h2(1005) = 8 [ 1000, 1001, 1004 ]
ʘ ཁૉΛআ͢ΔͳΒ Counting Filter ʗ Bloom Filter Λ֦ுͨ͠ΞϧΰϦζϜ
0 1 1 0 2 0 3 0 4 0
5 0 6 0 7 0 8 1 9 0 10 0 11 0 12 0 13 0 14 0 15 0 • key = 1000 ΛՃ͢Δ • h1(1000) = 8 • h2(1000) = 0 [ 1000 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 0 13 0 14 0 15 0 • key = 1001 ΛՃ͢Δ • h1(1001) = 9 • h2(1001) = 2 [ 1000, 1001 ]
0 1 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 2 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • key = 1004 ΛՃ͢Δ • h1(1004) = 12 • h2(1004) = 8 • ॏෳͨ͠ΒΠϯΫϦϝϯτ͢Δ [ 1000, 1001, 1004 ] ϏοτͰͳ͘ ΧϯλʔͰදݱ͢Δ͕ Bloom Filter ͱҟͳΔ
0 0 1 0 2 1 3 0 4 0
5 0 6 0 7 0 8 1 9 1 10 0 11 0 12 1 13 0 14 0 15 0 • key = 1000 Λআ͢Δ • h1(1004) = 8 • h2(1004) = 0 • σΫϦϝϯτ͢Δ [ 1000, 1001, 1004 ]
ʘ False Positive ֬ ʗ
Bloom Filter ެࣜͰࢉग़ m : ྻۭؒ (bit) n : ొཁૉ
ৄ͘͠ Wikipedia ʹʂ https://ja.wikipedia.org/wiki/ϒϧʔϜϑΟϧλ False Positive Λ࠷খʹ͢Δ ࠷దͳϋογϡؔͷۙࣅ ࠷దͳ k Λͬͨ߹ͷ False Positive ֬
ʘ ৺ແ༻ ʗ ࠷దͳ k Λ͑ False Positive ΛݶΓͳ͘͘Ͱ͖Δ
ʘ ·ͱΊ ʗ False Positive ͷՄೳੑ͋Δ͠ ཁૉͷআͰ͖ͳ͍͚Ͳ τϨʔυΦϑΛ࠷େݶ׆༻ͯ͠ ߴ &
ۭؒޮͷྑ͍ॲཧ͕Ͱ͖Δʂ
ʘ ͓ͬͯ͘ͱศརͳ Bloom Filter ʗ