A Look Into Bloom Filters

Fernando Mendes

October 07, 2016

Transcript

  1. “A bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970 (…) a query returns either possibly in set or definitely not in set.” - Wikipedia, 2016
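The "possibly in set" / "definitely not in set" behaviour quoted above can be sketched in a few lines of Ruby. This is only an illustration, not the code from the talk: the `BloomFilter` class, its method names, and the choice of salted SHA-256 digests as hash functions are all assumptions made for the example.

```ruby
require "digest"

# Minimal Bloom filter sketch. The class name, method names, and the
# hashing scheme are illustrative assumptions, not the talk's code.
class BloomFilter
  attr_reader :size, :hash_count

  def initialize(size:, hash_count:)
    @size = size
    @hash_count = hash_count
    @bits = Array.new(size, false)
  end

  # Set every bit position derived from the value.
  def add(value)
    positions(value).each { |i| @bits[i] = true }
    self
  end

  # true  => "possibly in set" (may be a false positive)
  # false => "definitely not in set"
  def include?(value)
    positions(value).all? { |i| @bits[i] }
  end

  private

  # Derive hash_count bit positions by salting a SHA-256 digest
  # with the index of the hash function.
  def positions(value)
    (0...@hash_count).map do |seed|
      Digest::SHA256.hexdigest("#{seed}:#{value}").to_i(16) % @size
    end
  end
end

filter = BloomFilter.new(size: 1024, hash_count: 3)
%w[apple banana cherry].each { |word| filter.add(word) }

puts filter.include?("banana") # inserted values are never reported as absent
puts filter.include?("durian") # usually false, but a false positive is possible
```

Because bits are only ever set and never cleared, an inserted value always passes the check. A value that was never inserted passes only if other values happened to set all of its bit positions.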
  5. $ ruby benchmark.rb
    ### V1
    Bloom filter size: 1024. Inserted values: 900. Tested values: 2048. Positive tests: 1532. False positives: 632.
    ### V2
    Bloom filter size: 1024. Inserted values: 900. Tested values: 2048. Positive tests: 1816. False positives: 916.
  7. $ ruby benchmark.rb
    ### V1
    Bloom filter size: 1024. Inserted values: 900. Tested values: 2048. Positive tests: 1532. False positives: 632.
    ### V2
    Bloom filter size: 1024. Inserted values: 900. (* 3 = 2700) Tested values: 2048. Positive tests: 1816. False positives: 916.
  8. $ ruby benchmark_v2.rb
    ### V1
    Bloom filter size: 1024. Inserted values: 300. Tested values: 2048. Positive tests: 729. False positives: 429.
    ### V2
    Bloom filter size: 1024. Inserted values: 300. Tested values: 2048. Positive tests: 627. False positives: 327.
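One way to read the two benchmark runs is through the standard estimate of a Bloom filter's false positive rate, (1 - e^(-kn/m))^k for m bits, n inserted values, and k hash functions. The sketch below applies it to the sizes shown in the output. The transcript does not say how V1 and V2 differ, so the hash counts used here (1 for V1, 3 for V2, the latter suggested by the "* 3 = 2700" annotation) are assumptions, and `false_positive_rate` is a hypothetical helper name.

```ruby
# Expected false positive rate of a Bloom filter with m bits,
# n inserted values, and k hash functions: (1 - e^(-k*n/m))^k.
def false_positive_rate(m:, n:, k:)
  (1 - Math.exp(-k * n.to_f / m))**k
end

# Assumption: V1 uses 1 hash function and V2 uses 3.
# The transcript does not confirm either number.
[900, 300].each do |n|
  [1, 3].each do |k|
    rate = false_positive_rate(m: 1024, n: n, k: k)
    puts format("m=1024 n=%d k=%d expected rate: %.3f", n, k, rate)
  end
end
```

Under these assumptions the estimate is higher for three hash functions than for one at 900 inserted values, and lower at 300, which is the same direction as the benchmark output above: extra hash functions only help while the bit array is not yet saturated.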
  11. calculating the optimal size & number of hash functions is a solved problem
    Things to consider:
    • false positive rate
    • expected number of items
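The two inputs listed above are enough to size a filter with the commonly cited formulas m = -n ln(p) / (ln 2)^2 for the number of bits and k = (m / n) ln 2 for the number of hash functions, where n is the expected number of items and p the target false positive rate. The Ruby sketch below is a hedged illustration of that calculation. The method name `optimal_parameters`, its keyword arguments, and the example inputs (300 items, 1% rate) are assumptions and do not come from the talk.

```ruby
# Standard Bloom filter sizing formulas:
#   m = -n * ln(p) / (ln 2)^2   number of bits
#   k = (m / n) * ln 2          number of hash functions
# The method and argument names are hypothetical, chosen for this example.
def optimal_parameters(expected_items:, false_positive_rate:)
  ln2 = Math.log(2)
  # Round the bit count up so the target rate is not exceeded.
  bits = (-expected_items * Math.log(false_positive_rate) / ln2**2).ceil
  # Round to the nearest whole hash function, with a minimum of one.
  hashes = [(bits.to_f / expected_items * ln2).round, 1].max
  { bits: bits, hashes: hashes }
end

# Example inputs are assumptions: 300 items at a 1% false positive rate.
params = optimal_parameters(expected_items: 300, false_positive_rate: 0.01)
puts "bits: #{params[:bits]}, hash functions: #{params[:hashes]}"
```

Rounding the bit count up and the hash count to the nearest integer is a design choice of this sketch; an implementation might round differently or pad the size to a byte or word boundary.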