
Bloom Filter: Handy to Know / Bloom Filter

Yoshiaki Yoshida

October 14, 2016

Transcript

  1. Bloom Filter: Handy to Know
    2016-10-14
    Internal study session
    @kakakakakku

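  2. Bloom Filter
    • Devised in 1970
    • Inventor: Burton Howard Bloom
    • A space-efficient probabilistic data structure
    • Used to test whether an element is a member of a set
    • Membership is tested in O(k) time (k = number of hash functions), regardless of how much data is stored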

  3. Set
    In the set?
    Not in the set?
    In the set?
    Not in the set?

  4. Knowing some real-world uses
    should make the Bloom Filter
    feel much more familiar!

  5. Bloom Filter use cases 1
    • Cassandra
      • Writes a Bloom Filter into the SSTable to reduce I/O when looking up a key
    • HBase
      • Uses a Bloom Filter to check that an HFile does not contain a given piece of data

  6. Bloom Filter use cases 2
    • H2O
      • To avoid sending unnecessary server pushes, returns a compressed Bloom Filter of the browser's cache state over HTTP
      • CASPER (Cache-Aware Server PushER)
    • Bitcoin
      • Used for looking up transactions? (not confirmed in detail)

  7. Bloom Filter use cases 3
    • pixiv
      • Used to check whether tags attached to works exist in the tag set of the pixiv Encyclopedia (pixiv 百科事典)?
      http://inside.pixiv.net/entry/2014/07/22/132103

  8. \ How a Bloom Filter behaves /

  9. Bit array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    bit   |  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    • Prepare an array of m bits
    • Here the array size is m = 16
    • Initialize every bit to 0

  10. Bit array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    bit   |  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    • Prepare hash functions of your choice
    • Here we use k = 2 functions
    • h1(key) = (key * 1) mod m
    • h2(key) = (key * 2) mod m
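
The two toy hash functions on this slide can be sketched in Python as follows. The helper name `positions` is an assumption made for this sketch and does not appear in the deck; real Bloom Filters use far better-mixing hash functions than these.

```python
M = 16  # array size in bits (m = 16 on the slide)


def h1(key: int) -> int:
    """First hash function from the slide: (key * 1) mod m."""
    return (key * 1) % M


def h2(key: int) -> int:
    """Second hash function from the slide: (key * 2) mod m."""
    return (key * 2) % M


def positions(key: int) -> list:
    """Bit positions touched by one key (k = 2 positions).

    Hypothetical helper name, introduced only for this sketch.
    """
    return [h1(key), h2(key)]


print(positions(1000))  # [8, 0]
print(positions(1001))  # [9, 2]
```

Because h2 multiplies by 2, it can only ever hit even positions, which is one reason these functions are suitable for a walkthrough but not for production use.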

  11. Bit array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    bit   |  1  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
    • Add key = 1000
      • h1(1000) = 8
      • h2(1000) = 0
    [ 1000 ]

  12. Bit array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    bit   |  1  0  1  0  0  0  0  0  1  1  0  0  0  0  0  0
    • Add key = 1001
      • h1(1001) = 9
      • h2(1001) = 2
    [ 1000, 1001 ]

  13. Bit array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    bit   |  1  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
    • Add key = 1004
      • h1(1004) = 12
      • h2(1004) = 8
        • Collides with h1(1000) = 8
        • The flag simply stays 1
    [ 1000, 1001, 1004 ]
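
The three insertions walked through above can be reproduced with a minimal sketch. It assumes the slide's toy hash functions and a plain Python list as the bit array; the function names `positions` and `add` are chosen for this sketch only.

```python
M = 16  # number of bits, as on the slides


def positions(key):
    """Toy hash functions from the slides: h1 = key mod m, h2 = 2*key mod m."""
    return [(key * 1) % M, (key * 2) % M]


def add(bits, key):
    """Set every bit the key hashes to. A bit that is already 1 stays 1."""
    for pos in positions(key):
        bits[pos] = 1


bits = [0] * M
for key in (1000, 1001, 1004):
    add(bits, key)

# Indexes 0, 2, 8, 9 and 12 are now set, matching the slide.
print(bits)  # [1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0]
```

Note that the filter stores only the bits, not the keys themselves, so the list `[ 1000, 1001, 1004 ]` shown on the slides is for explanation only.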

  14. Bit array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    bit   |  1  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
    • Query: does key = 1005 exist?
      • h1(1005) = 13
      • h2(1005) = 10
      • bit[13] = bit[10] = 0
    • We can state with certainty that it "does not exist"
    [ 1000, 1001, 1004 ]

  15. Bit array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    bit   |  1  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
    • Query: does key = 1000 exist?
      • h1(1000) = 8
      • h2(1000) = 0
      • bit[8] = bit[0] = 1
    • It "may exist"
      • 1000 really is in the set
    [ 1000, 1001, 1004 ]

  16. Bit array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    bit   |  1  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
    • Query: does key = 1020 exist?
      • h1(1020) = 12
      • h2(1020) = 8
      • bit[12] = bit[8] = 1
    • It "may exist"
      • 1020 is not actually in the set
    [ 1000, 1001, 1004 ]
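
The three queries above (1005, 1000, 1020) can be checked with a short sketch. It rebuilds the same 16-bit filter with the slide's toy hash functions; `might_contain` is a name assumed for this sketch.

```python
M = 16  # number of bits, as on the slides


def positions(key):
    """Toy hash functions from the slides."""
    return [(key * 1) % M, (key * 2) % M]


def add(bits, key):
    for pos in positions(key):
        bits[pos] = 1


def might_contain(bits, key):
    """False means 'definitely absent'; True only means 'possibly present'."""
    return all(bits[pos] == 1 for pos in positions(key))


bits = [0] * M
for key in (1000, 1001, 1004):
    add(bits, key)

print(might_contain(bits, 1005))  # False: bits 13 and 10 are 0
print(might_contain(bits, 1000))  # True: really was added
print(might_contain(bits, 1020))  # True, although 1020 was never added
```

The last query hits bits 12 and 8, which were set by 1004 and 1000, so the filter cannot tell 1020 apart from a stored key.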

  17. (ﾟдﾟ) Huh?
    But that is a wrong answer...?

  18. False Positive (偽陽性)
    Reporting "exists" when the element does not exist
    False Negative (偽陰性)
    Reporting "does not exist" when the element exists

  19. False Positive (偽陽性)
    Reporting "exists" when the element does not exist
    False Negative (偽陰性)
    Reporting "does not exist" when the element exists

    A Bloom Filter can produce
    False Positives

  20. The possibility of False Positives
    • The price of fast O(k) membership tests is the possibility of False Positives
    • So, as with key = 1020, an element may be wrongly reported as "exists"
    • However, a False Negative is 100% impossible
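
Both claims on this slide can be checked by brute force on the 16-bit example. The sketch below assumes the toy hash functions from the earlier slides and scans an arbitrarily chosen key range (1000 to 1031); the variable names are illustrative only.

```python
M = 16  # number of bits, as on the slides
STORED = (1000, 1001, 1004)


def positions(key):
    """Toy hash functions from the slides."""
    return [(key * 1) % M, (key * 2) % M]


bits = [0] * M
for key in STORED:
    for pos in positions(key):
        bits[pos] = 1


def might_contain(key):
    return all(bits[pos] == 1 for pos in positions(key))


# A stored key always finds its own bits set, so this list stays empty.
false_negatives = [key for key in STORED if not might_contain(key)]

# Keys that were never added but still report "possibly present".
false_positives = [
    key for key in range(1000, 1032)
    if key not in STORED and might_contain(key)
]

print("false negatives:", false_negatives)
print("false positives:", false_positives)
```

With only 16 bits and such simple hash functions, several false positives show up in this small range, 1020 among them; a realistic filter uses a much larger array.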

  21. Advantages
    • O(k) time complexity
      • Linear search is O(N)
      • Binary search is O(log N)
      • A hash table is O(1), so isn't that even faster?
      • With k = 1, a Bloom Filter is O(1) as well
    • No need to store the data itself, so it is space-efficient

  22. \ Elements cannot be deleted from a Bloom Filter /

  23. Bit array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    bit   |  0  0  1  0  0  0  0  0  0  1  0  0  1  0  0  0
    • If we delete 1000
      • h1(1000) = 8
      • h2(1000) = 0
    • 1004 gets deleted as well !!!
      • h1(1004) = 12
      • h2(1004) = 8
    [ 1000, 1001, 1004 ]
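
The problem on this slide can be reproduced with a deliberately naive deletion that clears the key's bits. This is a sketch of what goes wrong, not a technique to use; `naive_remove` is a hypothetical name.

```python
M = 16  # number of bits, as on the slides


def positions(key):
    """Toy hash functions from the slides."""
    return [(key * 1) % M, (key * 2) % M]


def add(bits, key):
    for pos in positions(key):
        bits[pos] = 1


def naive_remove(bits, key):
    """Broken on purpose: clears bits that other keys may share."""
    for pos in positions(key):
        bits[pos] = 0


def might_contain(bits, key):
    return all(bits[pos] == 1 for pos in positions(key))


bits = [0] * M
for key in (1000, 1001, 1004):
    add(bits, key)

naive_remove(bits, 1000)          # clears bits 8 and 0
print(might_contain(bits, 1004))  # False: bit 8 was shared with 1000
print(might_contain(bits, 1001))  # True: 1001 shares no bit with 1000
```

Clearing a shared bit introduces a False Negative for 1004, which breaks the one guarantee a Bloom Filter provides.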

  24. \ To delete elements, use a Counting Filter /
    An algorithm that extends the Bloom Filter

  25. Counter array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    count |  1  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
    • Add key = 1000
      • h1(1000) = 8
      • h2(1000) = 0
    [ 1000 ]

  26. Counter array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    count |  1  0  1  0  0  0  0  0  1  1  0  0  0  0  0  0
    • Add key = 1001
      • h1(1001) = 9
      • h2(1001) = 2
    [ 1000, 1001 ]

  27. Counter array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    count |  1  0  1  0  0  0  0  0  2  1  0  0  1  0  0  0
    • Add key = 1004
      • h1(1004) = 12
      • h2(1004) = 8
        • On a collision, increment the counter
    [ 1000, 1001, 1004 ]
    Unlike a Bloom Filter,
    each slot is represented by a counter
    rather than a bit

  28. Counter array
    index |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    count |  0  0  1  0  0  0  0  0  1  1  0  0  1  0  0  0
    • Delete key = 1000
      • h1(1000) = 8
      • h2(1000) = 0
        • Decrement the counters
    [ 1000, 1001, 1004 ]
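
The add and delete steps on the Counting Filter slides can be sketched as a small class. The class and method names are assumptions for this sketch, the hash functions are the slide's toy ones, and the sketch assumes `remove` is only ever called for keys that were actually added.

```python
class CountingFilter:
    """Minimal counting filter: one integer counter per slot instead of one bit."""

    def __init__(self, m=16):
        self.m = m
        self.counts = [0] * m

    def _positions(self, key):
        # Toy hash functions from the slides.
        return [(key * 1) % self.m, (key * 2) % self.m]

    def add(self, key):
        for pos in self._positions(key):
            self.counts[pos] += 1  # increment, also on collisions

    def remove(self, key):
        # Assumption: the key was added before. Removing a key that was
        # never added would corrupt the counters.
        for pos in self._positions(key):
            self.counts[pos] -= 1

    def might_contain(self, key):
        return all(self.counts[pos] > 0 for pos in self._positions(key))


cf = CountingFilter()
for key in (1000, 1001, 1004):
    cf.add(key)
print(cf.counts[8])            # 2: shared by 1000 and 1004

cf.remove(1000)
print(cf.counts[8])            # 1: 1004 still holds this slot
print(cf.might_contain(1004))  # True
print(cf.might_contain(1000))  # False
```

The cost of supporting deletion is space: each slot now needs several bits for its counter instead of a single bit.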

  29. \ False Positive probability /

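  30. Bloom Filter: computed by formula
    m : size of the array (bits)
    n : number of registered elements
    Approximate optimal number of hash functions k
    that minimizes False Positives
    False Positive probability
    when the optimal k is used
    See Wikipedia for details!
    https://ja.wikipedia.org/wiki/ブルームフィルタ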
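
For reference, the standard textbook approximations for these two quantities can be written in the slide's notation (m, n, k) as below. They assume k independent, uniformly distributed hash functions and are given here as the commonly cited forms, not as a copy of the slide's own formulas.

```latex
% False Positive probability for a given k
p \approx \left(1 - e^{-kn/m}\right)^{k}

% Number of hash functions that minimizes p
k_{\mathrm{opt}} = \frac{m}{n}\,\ln 2 \approx 0.693\,\frac{m}{n}

% False Positive probability when k = k_opt
p_{\min} \approx \left(\frac{1}{2}\right)^{k_{\mathrm{opt}}} \approx 0.6185^{\,m/n}
```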

  31. \ No need to worry /
    With the optimal k (and enough bits per element),
    the False Positive rate can be made very low
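
To get a feel for how low the rate can go, the standard approximation p ≈ (1 - e^(-kn/m))^k and the optimal k ≈ (m/n) ln 2 can be evaluated directly. The function names below are assumptions for this sketch, and the formula assumes independent, uniform hash functions, which the toy h1 and h2 from the walkthrough are not.

```python
import math


def optimal_k(m: int, n: int) -> float:
    """Approximate number of hash functions that minimizes False Positives."""
    return (m / n) * math.log(2)


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Standard approximation (1 - e^(-kn/m))^k for m bits, n elements, k hashes."""
    return (1 - math.exp(-k * n / m)) ** k


# The walkthrough's setting: m = 16 bits, n = 3 elements, k = 2 functions.
print(false_positive_rate(16, 3, 2))

# More bits per element with a near-optimal k drive the rate down quickly.
n = 1000
for bits_per_element in (8, 10, 16):
    m = bits_per_element * n
    k = max(1, round(optimal_k(m, n)))
    print(bits_per_element, k, false_positive_rate(m, n, k))
```

At roughly 10 bits per element and k = 7, the approximation gives a False Positive rate a little under 1%.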

  32. \ Summary /
    False Positives are possible
    and elements cannot be deleted, but
    by making the most of that trade-off
    you get fast & space-efficient processing!

  33. \ Bloom Filter: Handy to Know /