Bloom Filter: Handy to Know / Bloom Filter

Yoshiaki Yoshida

October 14, 2016

Transcript

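  1. Bloom Filter
     • Devised in 1970
     • Inventor: Burton Howard Bloom
     • A space-efficient probabilistic data structure
     • Used to test whether an element is contained in a set
     • Membership can be tested quickly, in O(k) time (k = number of hash functions), regardless of the amount of data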
  2. Bloom Filter use cases 1
     • Cassandra
       • Writes a Bloom Filter into each SSTable to reduce I/O when looking up a key
     • HBase
       • Uses a Bloom Filter to check that an HFile does not contain specific data
  3. Bloom Filter use cases 2
     • H2O
       • To avoid sending needless server pushes, a Bloom Filter of the browser cache state is compressed and returned over HTTP
       • CASPER (Cache-Aware Server PushER)
     • Bitcoin
       • Used for looking up transactions? (not confirmed in detail)
  4. Bloom Filter use cases 3
     • pixiv
       • Used to check whether tags attached to works exist in the tag set of the pixiv Encyclopedia?
         http://inside.pixiv.net/entry/2014/07/22/132103
  5. index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
     bit:    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
     • Prepare an array of m bits
     • Here the array size is m = 16
     • Initialize every bit to 0
  6. index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
     bit:    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
     • Prepare arbitrary hash functions
     • Here we use k = 2 functions
     • h1(key) = (key * 1) mod m
     • h2(key) = (key * 2) mod m
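The setup on the two slides above can be sketched in Python as follows. This is only a minimal illustration: the names `M`, `bits`, `h1` and `h2` mirror the slide notation, the key 21 is an arbitrary example value, and the multiply-and-mod functions are the toy hashes from the slide rather than anything suitable for real use.

```python
M = 16  # array size in bits (m on the slide)

# Bit array of m slots, all initialized to 0.
bits = [0] * M


def h1(key: int) -> int:
    """Toy hash from the slide: (key * 1) mod m."""
    return (key * 1) % M


def h2(key: int) -> int:
    """Toy hash from the slide: (key * 2) mod m."""
    return (key * 2) % M


# Each key maps to k = 2 positions in the array.
print(h1(21), h2(21))  # 5 10
```

Real implementations typically use well-mixed hash functions; with these toy functions, keys that differ by a multiple of 16 always collide.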
  7. index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
     bit:    1  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
     • Add key = 1000
     • h1(1000) = 8
     • h2(1000) = 0
     Keys added: [ 1000 ]
  8. index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
     bit:    1  0  1  0  0  0  0  0  1  1  0  0  0  0  0  0
     • Add key = 1001
     • h1(1001) = 9
     • h2(1001) = 2
     Keys added: [ 1000, 1001 ]
  9. index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
     bit:    1  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
     • Add key = 1004
     • h1(1004) = 12
     • h2(1004) = 8
     • This collides with h1(1000) = 8
     • The flag simply stays 1
     Keys added: [ 1000, 1001, 1004 ]
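The insertion steps above can be reproduced with a short sketch. The helper names `HASHES` and `add` are hypothetical and chosen for illustration; the hash functions and keys are the ones from the slides.

```python
M = 16  # array size in bits

# The two toy hash functions from the slides.
HASHES = (
    lambda key: (key * 1) % M,
    lambda key: (key * 2) % M,
)


def add(bits: list, key: int) -> None:
    """Set the bit at every hashed position; a bit that is already 1 stays 1."""
    for h in HASHES:
        bits[h(key)] = 1


bits = [0] * M
for key in (1000, 1001, 1004):
    add(bits, key)

# Positions that are now set.
set_positions = [i for i, b in enumerate(bits) if b == 1]
print(set_positions)  # [0, 2, 8, 9, 12]
```

Six positions were written (two per key) but only five bits are set, because h2(1004) and h1(1000) both land on index 8.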
  10. index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      bit:    1  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
      • Query: does key = 1005 exist?
      • h1(1005) = 13
      • h2(1005) = 10
      • The bits at positions 13 and 10 are both 0
      • We can state with certainty that it "does not exist"
      Keys added: [ 1000, 1001, 1004 ]
  11. index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      bit:    1  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
      • Query: does key = 1000 exist?
      • h1(1000) = 8
      • h2(1000) = 0
      • The bits at positions 8 and 0 are both 1
      • It "may exist"
      • 1000 really is in the set
      Keys added: [ 1000, 1001, 1004 ]
  12. index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      bit:    1  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
      • Query: does key = 1020 exist?
      • h1(1020) = 12
      • h2(1020) = 8
      • The bits at positions 12 and 8 are both 1
      • It "may exist"
      • 1020 is not actually in the set
      Keys added: [ 1000, 1001, 1004 ]
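The three queries above can be checked with a small sketch. The function name `might_contain` is a hypothetical choice that reflects the "may exist" semantics; the rest follows the slides.

```python
M = 16  # array size in bits

# The two toy hash functions from the slides.
HASHES = (
    lambda key: (key * 1) % M,
    lambda key: (key * 2) % M,
)


def add(bits: list, key: int) -> None:
    """Set the bit at every hashed position."""
    for h in HASHES:
        bits[h(key)] = 1


def might_contain(bits: list, key: int) -> bool:
    """False means definitely absent; True only means possibly present."""
    return all(bits[h(key)] == 1 for h in HASHES)


bits = [0] * M
for key in (1000, 1001, 1004):
    add(bits, key)

print(might_contain(bits, 1005))  # False: bits 13 and 10 are 0
print(might_contain(bits, 1000))  # True: 1000 was added
print(might_contain(bits, 1020))  # True: false positive, bits 12 and 8 were set by other keys
```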
  13. The possibility of false positives
      • The price of fast O(k) lookups is the possibility of false positives
      • So, as with key = 1020, a key may be wrongly reported as "existing"
      • However, false negatives can never occur
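  14. Advantages
      • Time complexity is O(k)
        • Linear search is O(N)
        • Binary search is O(log N)
        • A hash table is O(1), so isn't that even faster?
        • With k = 1, a Bloom Filter is O(1) as well
      • It does not need to hold the data itself, so it is space-efficient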
  15. index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      bit:    0  0  1  0  0  0  0  0  0  1  0  0  1  0  0  0
      • If 1000 is deleted (its bits are cleared)
        • h1(1000) = 8
        • h2(1000) = 0
      • 1004 is deleted as well !!!
        • h1(1004) = 12
        • h2(1004) = 8
      Keys added: [ 1000, 1001, 1004 ]
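The deletion problem above can be reproduced in a few lines. `naive_remove` is a hypothetical helper written only to show the failure; it is not a recommended operation.

```python
M = 16  # array size in bits

# The two toy hash functions from the slides.
HASHES = (
    lambda key: (key * 1) % M,
    lambda key: (key * 2) % M,
)


def add(bits: list, key: int) -> None:
    """Set the bit at every hashed position."""
    for h in HASHES:
        bits[h(key)] = 1


def might_contain(bits: list, key: int) -> bool:
    """False means definitely absent; True only means possibly present."""
    return all(bits[h(key)] == 1 for h in HASHES)


def naive_remove(bits: list, key: int) -> None:
    """Clear the key's bits. Unsafe: other keys may share those bits."""
    for h in HASHES:
        bits[h(key)] = 0


bits = [0] * M
for key in (1000, 1001, 1004):
    add(bits, key)

naive_remove(bits, 1000)  # clears bits 8 and 0

print(might_contain(bits, 1004))  # False: bit 8 was shared with 1000
print(might_contain(bits, 1001))  # True: bits 9 and 2 were untouched
```

Clearing a shared bit turns 1004 into a false negative, which breaks the guarantee that a plain Bloom filter otherwise provides.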
  16. index:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      counter:  1  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
      • Add key = 1000
      • h1(1000) = 8
      • h2(1000) = 0
      Keys added: [ 1000 ]
  17. index:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      counter:  1  0  1  0  0  0  0  0  1  1  0  0  0  0  0  0
      • Add key = 1001
      • h1(1001) = 9
      • h2(1001) = 2
      Keys added: [ 1000, 1001 ]
  18. index:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      counter:  1  0  1  0  0  0  0  0  2  1  0  0  1  0  0  0
      • Add key = 1004
      • h1(1004) = 12
      • h2(1004) = 8
      • On a collision, increment the slot
      • Unlike a Bloom Filter, each slot is a counter rather than a bit
      Keys added: [ 1000, 1001, 1004 ]
  19. index:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      counter:  0  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
      • Delete key = 1000
      • h1(1000) = 8
      • h2(1000) = 0
      • Decrement those slots
      [ 1000, 1001, 1004 ]
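The counter-based variant walked through above is commonly called a counting Bloom filter. A minimal sketch follows; the class and method names are hypothetical, and the hash functions are the toy ones from the slides.

```python
class CountingBloomFilter:
    """Sketch of a counter-per-slot Bloom filter that supports deletion."""

    def __init__(self, m: int = 16) -> None:
        self.m = m
        self.counters = [0] * m  # counters instead of single bits

    def _positions(self, key: int) -> list:
        # Toy hashes from the slides: (key * 1) mod m and (key * 2) mod m.
        return [(key * 1) % self.m, (key * 2) % self.m]

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.counters[pos] += 1  # increment, including on collisions

    def remove(self, key: int) -> None:
        # Only meaningful for keys that were actually added earlier.
        for pos in self._positions(key):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def might_contain(self, key: int) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(key))


cbf = CountingBloomFilter()
for key in (1000, 1001, 1004):
    cbf.add(key)
print(cbf.counters[8])  # 2: shared by 1000 and 1004

cbf.remove(1000)
print(cbf.counters[8], cbf.counters[0])  # 1 0
print(cbf.might_contain(1004))  # True: 1004 survives the deletion
print(cbf.might_contain(1000))  # False
```

The trade-off is space: every slot now needs several bits instead of one, and removing a key that was never added can still corrupt the counts.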
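  20. Bloom Filter: computing parameters with formulas
      • m: size of the array (bits)
      • n: number of registered elements
      • Approximate optimal number of hash functions k that minimizes false positives
      • False positive probability when the optimal k is used
      • See Wikipedia for details! https://ja.wikipedia.org/wiki/ブルームフィルタ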
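The transcript names these quantities without reproducing the formulas. As a reference sketch, the standard approximations for a Bloom filter with m bits, n elements and k hash functions are given below. They are assumed, not confirmed, to match what the slide showed, and they presume well-mixed, independent hash functions.

```latex
% False positive probability for given m, n, k
p \approx \left(1 - e^{-kn/m}\right)^{k}

% Number of hash functions that minimizes p
k_{\mathrm{opt}} = \frac{m}{n}\,\ln 2

% False positive probability when k = k_opt
p_{\min} \approx \left(\frac{1}{2}\right)^{k_{\mathrm{opt}}} \approx 0.6185^{\,m/n}
```

For the running example (m = 16, n = 3) this gives k_opt ≈ 3.7, so in practice one would round to 3 or 4 hash functions rather than the k = 2 used in the walkthrough.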