Slide 1

Slide 1 text

Bloom Filter: Handy to Know
2016-10-14 internal study session @kakakakakku

Slide 2

Slide 2 text

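Bloom Filter
• Devised in 1970
• Inventor: Burton Howard Bloom
• A space-efficient probabilistic data structure
• Used to test whether an element is a member of a set
• Answers a query quickly in O(k) time (k: number of hash functions), regardless of how much data has been added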

Slide 3

Slide 3 text

[Figure: a set, with queries asking whether each element is contained in it or not]

Slide 4

Slide 4 text

Once you see some real-world uses, Bloom Filter should feel much more familiar!

Slide 5

Slide 5 text

Bloom Filter use case 1
• Cassandra
  • Writes a Bloom Filter into each SSTable to reduce I/O when looking up a key
• HBase
  • Uses a Bloom Filter to check quickly that an HFile does not contain a particular piece of data

Slide 6

Slide 6 text

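Bloom Filter use case 2
• H2O
  • To avoid sending unnecessary server pushes, it compresses a Bloom Filter of the browser's cache state and returns it over HTTP
  • CASPER (Cache-Aware Server PushER)
• Bitcoin
  • Used for looking up transactions? (I haven't been able to confirm the details)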

Slide 7

Slide 7 text

Bloom Filter use case 3
• pixiv
  • Used to check whether the tags attached to a work exist in the pixiv Encyclopedia's tag set? http://inside.pixiv.net/entry/2014/07/22/132103

Slide 8

Slide 8 text

＼ How a Bloom Filter Works ／

Slide 9

Slide 9 text

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
bit    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
• Prepare an array of m bits
• In this example, the array size is m = 16
• Initialize every bit to 0

Slide 10

Slide 10 text

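index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
bit    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
• Prepare k hash functions, each mapping a key to a position from 0 to m - 1 (any hash functions will do)
• In this example we use k = 2 functions
• h1(key) = (key * 1) mod m
• h2(key) = (key * 2) mod m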
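The two toy hash functions above translate directly into Python. This is a minimal sketch: `M`, `h1` and `h2` simply mirror the slide's m, h1 and h2. They are deliberately simple (h2 is just twice h1, modulo m); a real Bloom Filter would use independent, well-mixed hash functions.

```python
M = 16  # size of the bit array (m = 16, as on the slide)

def h1(key: int) -> int:
    """h1(key) = (key * 1) mod m"""
    return (key * 1) % M

def h2(key: int) -> int:
    """h2(key) = (key * 2) mod m"""
    return (key * 2) % M

bits = [0] * M  # the m-bit array, every bit initialized to 0

# Hash positions for the keys used in the walkthrough
for key in (1000, 1001, 1004, 1005, 1020):
    print(key, h1(key), h2(key))
```

Running it reproduces the positions used on the following slides, for example 1000 maps to 8 and 0.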

Slide 11

Slide 11 text

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
bit    1  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
• Add key = 1000
• h1(1000) = 8
• h2(1000) = 0
• Set bits 8 and 0 to 1
Set: [ 1000 ]

Slide 12

Slide 12 text

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
bit    1  0  1  0  0  0  0  0  1  1  0  0  0  0  0  0
• Add key = 1001
• h1(1001) = 9
• h2(1001) = 2
Set: [ 1000, 1001 ]

Slide 13

Slide 13 text

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
bit    1  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
• Add key = 1004
• h1(1004) = 12
• h2(1004) = 8
• Position 8 collides with h1(1000) = 8
• The bit simply stays 1
Set: [ 1000, 1001, 1004 ]
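The insertion steps on the last three slides can be sketched as follows. The helper name `add` is hypothetical (the slides do not name one); it sets the bit at each of the k = 2 hash positions, so a collision such as position 8 for 1004 changes nothing.

```python
M = 16  # number of bits (m)

def h1(key: int) -> int:
    return (key * 1) % M  # toy hash from the slides

def h2(key: int) -> int:
    return (key * 2) % M  # toy hash from the slides

def add(bits: list[int], key: int) -> None:
    """Set the bit at each hash position to 1.

    A bit that is already 1 (a collision) simply stays 1.
    """
    for pos in (h1(key), h2(key)):
        bits[pos] = 1

bits = [0] * M
for key in (1000, 1001, 1004):
    add(bits, key)

print([i for i, b in enumerate(bits) if b == 1])  # → [0, 2, 8, 9, 12]
```

Three keys set only five bits, because 1000 and 1004 share position 8.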

Slide 14

Slide 14 text

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
bit    1  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
• Query: does key = 1005 exist?
• h1(1005) = 13
• h2(1005) = 10
• The bits at positions 13 and 10 are both 0
• So we can say for certain that it "does not exist"
Set: [ 1000, 1001, 1004 ]

Slide 15

Slide 15 text

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
bit    1  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
• Query: does key = 1000 exist?
• h1(1000) = 8
• h2(1000) = 0
• The bits at positions 8 and 0 are both 1
• So it "might exist"
• 1000 really does exist
Set: [ 1000, 1001, 1004 ]

Slide 16

Slide 16 text

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
bit    1  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
• Query: does key = 1020 exist?
• h1(1020) = 12
• h2(1020) = 8
• The bits at positions 12 and 8 are both 1
• So it "might exist"
• But 1020 does not actually exist
Set: [ 1000, 1001, 1004 ]
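The three queries above can be reproduced with a membership check that returns True only when every hashed bit is 1. This is a sketch using the slides' toy hash functions; `add` and `might_contain` are hypothetical helper names.

```python
M = 16  # number of bits (m)

def h1(key: int) -> int:
    return (key * 1) % M  # toy hash from the slides

def h2(key: int) -> int:
    return (key * 2) % M  # toy hash from the slides

def add(bits: list[int], key: int) -> None:
    for pos in (h1(key), h2(key)):
        bits[pos] = 1

def might_contain(bits: list[int], key: int) -> bool:
    """False: definitely absent. True: possibly present (may be a False Positive)."""
    return all(bits[pos] == 1 for pos in (h1(key), h2(key)))

bits = [0] * M
for key in (1000, 1001, 1004):
    add(bits, key)

print(might_contain(bits, 1005))  # False: bits 13 and 10 are 0
print(might_contain(bits, 1000))  # True: 1000 really was added
print(might_contain(bits, 1020))  # True: a False Positive, bits 12 and 8 were set by 1004
```

The filter never stores the keys themselves, so it cannot tell 1020 apart from 1004: both hash to positions 12 and 8.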

Slide 17

Slide 17 text

(・ω・) Huh? Didn't it just get that wrong...?

Slide 18

Slide 18 text

False Positive: judging that an element "exists" when it does not
False Negative: judging that an element "does not exist" when it does

Slide 19

Slide 19 text

Of these two cases, a Bloom Filter can produce False Positives.

Slide 20

Slide 20 text

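The possibility of False Positives
• The price for answering quickly in O(k) is that False Positives can occur
• So, as with key = 1020, a key that was never added may be wrongly reported as "exists"
• However, a False Negative is impossible: every bit of an added key has been set to 1, so that key is always reported as "might exist"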

Slide 21

Slide 21 text

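Advantages
• Time complexity O(k), where k is fixed when the filter is built and does not grow with the amount of data N
  • Linear search: O(N)
  • Binary search: O(log N)
  • A hash table is O(1), so isn't it even faster?
  • With k = 1, a Bloom Filter is O(1) too
• It does not need to store the data itself, so it is space-efficient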

Slide 22

Slide 22 text

＼ A Bloom Filter cannot delete elements ／

Slide 23

Slide 23 text

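index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
bit    0  0  1  0  0  0  0  0  0  1  0  0  1  0  0  0
• Delete 1000 by resetting its bits to 0
• h1(1000) = 8
• h2(1000) = 0
• This deletes 1004 as well !!!
• h1(1004) = 12
• h2(1004) = 8
• Bit 8 was shared, so a query for 1004 now returns "does not exist": a False Negative
Set: [ 1001, 1004 ] (after deleting 1000)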
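The failure described above is easy to reproduce. This sketch (hypothetical helpers `add` and `might_contain`, toy hash functions from the slides) performs a naive deletion of 1000 by clearing its bits and then queries 1004.

```python
M = 16  # number of bits (m)

def h1(key: int) -> int:
    return (key * 1) % M  # toy hash from the slides

def h2(key: int) -> int:
    return (key * 2) % M  # toy hash from the slides

def add(bits: list[int], key: int) -> None:
    for pos in (h1(key), h2(key)):
        bits[pos] = 1

def might_contain(bits: list[int], key: int) -> bool:
    return all(bits[pos] == 1 for pos in (h1(key), h2(key)))

bits = [0] * M
for key in (1000, 1001, 1004):
    add(bits, key)

# Naive deletion: reset the bits of 1000 (positions 8 and 0) to 0
for pos in (h1(1000), h2(1000)):
    bits[pos] = 0

print([i for i, b in enumerate(bits) if b == 1])  # → [2, 9, 12]
print(might_contain(bits, 1004))  # False, although 1004 was added: a False Negative
```

A single bit cannot record how many keys set it, which is why clearing it is unsafe.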

Slide 24

Slide 24 text

＼ To delete elements, use a Counting Filter ／
An algorithm that extends the Bloom Filter (also called a Counting Bloom Filter)

Slide 25

Slide 25 text

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
count  1  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
• Add key = 1000
• h1(1000) = 8
• h2(1000) = 0
Set: [ 1000 ]

Slide 26

Slide 26 text

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
count  1  0  1  0  0  0  0  0  1  1  0  0  0  0  0  0
• Add key = 1001
• h1(1001) = 9
• h2(1001) = 2
Set: [ 1000, 1001 ]

Slide 27

Slide 27 text

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
count  1  0  1  0  0  0  0  0  2  1  0  0  1  0  0  0
• Add key = 1004
• h1(1004) = 12
• h2(1004) = 8
• On a collision, increment the counter
Set: [ 1000, 1001, 1004 ]
Unlike a Bloom Filter, each position holds a counter instead of a bit.

Slide 28

Slide 28 text

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
count  0  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
• Delete key = 1000
• h1(1000) = 8
• h2(1000) = 0
• Decrement the counters
• Counter 8 drops from 2 to 1, so 1004 is still found
Set: [ 1001, 1004 ]
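A minimal sketch of the Counting Filter walkthrough, reusing the slides' toy hash functions. The class name `CountingFilter` and its methods are hypothetical. Practical counting filters often use small fixed-width counters that can overflow; this sketch uses unbounded Python integers and ignores that issue.

```python
M = 16  # number of counters (m)

def h1(key: int) -> int:
    return (key * 1) % M  # toy hash from the slides

def h2(key: int) -> int:
    return (key * 2) % M  # toy hash from the slides

class CountingFilter:
    """Counting filter: each position holds a counter instead of a single bit."""

    def __init__(self) -> None:
        self.counts = [0] * M

    def _positions(self, key: int) -> tuple[int, int]:
        return (h1(key), h2(key))

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.counts[pos] += 1  # increment instead of setting a bit to 1

    def remove(self, key: int) -> None:
        # Only valid for a key that was actually added; removing a key that
        # was never added could corrupt the counters of other keys.
        for pos in self._positions(key):
            self.counts[pos] -= 1  # decrement instead of clearing a bit

    def might_contain(self, key: int) -> bool:
        return all(self.counts[pos] > 0 for pos in self._positions(key))

cf = CountingFilter()
for key in (1000, 1001, 1004):
    cf.add(key)
print(cf.counts[8])  # → 2 (shared by 1000 and 1004)

cf.remove(1000)
print(cf.counts[8])            # → 1
print(cf.might_contain(1004))  # → True: 1004 survives the deletion
print(cf.might_contain(1000))  # → False
```

The trade-off is space: each position now needs several bits instead of one.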

Slide 29

Slide 29 text

＼ False Positive probability ／

Slide 30

Slide 30 text

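Bloom Filter: computing it with formulas
m : size of the array (bits)
n : number of inserted elements
Approximate optimal number of hash functions, which minimizes False Positives:
$$k = \frac{m}{n} \ln 2$$
False Positive probability when the optimal k is used:
$$p \approx \left(\frac{1}{2}\right)^{k} \approx 0.6185^{m/n}$$
For details, see Wikipedia! https://ja.wikipedia.org/wiki/ブルームフィルタ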
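To connect these formulas to the toy example, the sketch below evaluates the standard approximation p ≈ (1 - e^(-kn/m))^k (from the usual Bloom Filter analysis; it is not shown on the slide) together with the optimal k. The function names are hypothetical.

```python
import math

def false_positive_rate(m: int, n: int, k: int) -> float:
    """Standard approximation p ≈ (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k

def optimal_k(m: int, n: int) -> float:
    """k = (m / n) ln 2, the value that minimizes the approximation above."""
    return (m / n) * math.log(2)

# Toy filter from the walkthrough: m = 16, n = 3, k = 2
print(round(false_positive_rate(16, 3, 2), 3))  # → 0.098

# Optimal k for the same filter, rounded to a whole number of hash functions
k_best = round(optimal_k(16, 3))  # (16 / 3) * ln 2 ≈ 3.70 → 4
print(k_best, round(false_positive_rate(16, 3, k_best), 3))
```

For the 16-bit, 3-key filter from the slides, moving from k = 2 to k = 4 lowers the estimate from about 9.8% to about 7.8%.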

Slide 31

Slide 31 text

＼ No need to worry ／
With the optimal k and enough bits per element (m/n), the False Positive rate can be made as low as you need
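The claim above follows from p ≈ 0.6185^(m/n): the rate falls exponentially as bits per element grow. This sketch uses the real-valued optimal k; in practice k must be a whole number, which makes the rate slightly higher than this ideal value. The function name is hypothetical.

```python
import math

def fp_rate_with_optimal_k(bits_per_element: float) -> float:
    """p ≈ (1/2)^k with k = (m/n) ln 2, i.e. about 0.6185^(m/n)."""
    k = bits_per_element * math.log(2)
    return 0.5 ** k

# Each extra bit per element multiplies the estimated rate by about 0.6185
for bpe in (4, 8, 10, 16, 20):
    print(f"m/n = {bpe:2d}: p ≈ {fp_rate_with_optimal_k(bpe):.5f}")
```

With about 10 bits per element the estimate is already under 1%.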

Slide 32

Slide 32 text

＼ Summary ／
False Positives are possible and elements cannot be deleted, but by making the most of this trade-off you get fast, space-efficient processing!

Slide 33

Slide 33 text

＼ Bloom Filter: Handy to Know ／