A Look Into Bloom Filters

Fernando Mendes

October 07, 2016

Transcript

  1. 10.

    “A bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970 (…) a query returns either possibly in set or definitely not in set.” - Wikipedia, 2016
  2. 16.
  6. 68.

    $ ruby benchmark.rb
    ### V1
    Bloom filter size: 1024.
    Inserted values: 900.
    Tested values: 2048.
    Positive tests: 1532.
    False positives: 632.
    ### V2
    Bloom filter size: 1024.
    Inserted values: 900.
    Tested values: 2048.
    Positive tests: 1816.
    False positives: 916.
  8. 70.

    $ ruby benchmark.rb
    ### V1
    Bloom filter size: 1024.
    Inserted values: 900.
    Tested values: 2048.
    Positive tests: 1532.
    False positives: 632.
    ### V2
    Bloom filter size: 1024.
    Inserted values: 900. * 3 = 2700
    Tested values: 2048.
    Positive tests: 1816.
    False positives: 916.
  9. 71.

    $ ruby benchmark_v2.rb
    ### V1
    Bloom filter size: 1024.
    Inserted values: 300.
    Tested values: 2048.
    Positive tests: 729.
    False positives: 429.
    ### V2
    Bloom filter size: 1024.
    Inserted values: 300.
    Tested values: 2048.
    Positive tests: 627.
    False positives: 327.
  11. 75.

    calculating the optimal size & number of hash functions is a solved problem
    Things to consider:
  12. 76.

    calculating the optimal size & number of hash functions is a solved problem
    Things to consider:
    • false positive rate
    • expected number of items
  13. 79.
  14. 80.
  15. 81.
  16. 84.
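
The slides note that sizing a Bloom filter is a solved problem once you know the target false positive rate and the expected number of items. As a rough illustration, the sketch below applies the standard formulas m = -n ln(p) / (ln 2)^2 for the number of bits and k = (m / n) ln 2 for the number of hash functions, then builds a small filter with those values. It is not the code behind the talk's V1/V2 benchmarks: the `BloomFilter` class, its method names, the `item-N` keys, and the choice of deriving positions from a single SHA-256 digest via double hashing are all assumptions made for this example. Only the counts (900 inserted values, 2048 tested values) are borrowed from the benchmark slides.

```ruby
require "digest"

# Hypothetical sketch: names and hashing scheme are illustrative,
# not taken from the talk's benchmark code.
class BloomFilter
  attr_reader :size, :hash_count

  # Standard sizing formulas for n expected items and a target
  # false positive rate p:
  #   m = -n * ln(p) / (ln 2)^2   (bits)
  #   k = (m / n) * ln 2          (hash functions)
  def self.optimal_parameters(expected_items, false_positive_rate)
    m = (-expected_items * Math.log(false_positive_rate) / (Math.log(2)**2)).ceil
    k = [(m.to_f / expected_items * Math.log(2)).round, 1].max
    [m, k]
  end

  def initialize(size, hash_count)
    @size = size
    @hash_count = hash_count
    @bits = Array.new(size, false)
  end

  # Sets the bit at every position derived from the value.
  def add(value)
    positions(value).each { |i| @bits[i] = true }
    self
  end

  # false means "definitely not in set"; true means "possibly in set".
  def include?(value)
    positions(value).all? { |i| @bits[i] }
  end

  private

  # Double hashing (an assumption of this sketch): take two 64-bit
  # integers from one SHA-256 digest and combine them into k positions.
  def positions(value)
    h1, h2 = Digest::SHA256.digest(value.to_s).unpack("Q>2")
    (0...@hash_count).map { |i| (h1 + i * h2) % @size }
  end
end

# Size the filter for 900 items at a 1% target false positive rate.
m, k = BloomFilter.optimal_parameters(900, 0.01)
filter = BloomFilter.new(m, k)
(0...900).each { |i| filter.add("item-#{i}") }

# Probe 2048 values that were never inserted and count the false positives.
false_positives = (900...2948).count { |i| filter.include?("item-#{i}") }
puts "size=#{m} hashes=#{k} false positives=#{false_positives}/2048"
```

For 900 items and a 1% target, the formulas give roughly 8,600 bits and 7 hash functions, far more than the 1024 bits used in the benchmark slides, which is consistent with the high false positive counts shown there.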