Bloom Filters: A Look Into Ruby
Fernando Mendes
July 29, 2016
Transcript
BLOOM FILTERS or: that one time I was hella bored
Bloom Filters Or: How I Learned To Stop Procrastinating And Benchmark The Code
THE FiLTERiNG: A MASTERPIECE OF MODERN HORROR
2016: a space-efficient odyssey. An epic drama of boredom and exploration
BLOOM FILTERS or: that one time I was hella bored
“a bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970 (…) a query returns either "possibly in set" or "definitely not in set"” - Wikipedia, 2016
bloom filter
bloom filter, do you have the element 3?
bloom filter: yeah, probably
bloom filter, do you have the element 4?
bloom filter: I most certainly do not
bloom filter: I most certainly do not. “Why do people even like this thing?”
add ‘subvisual’
hash(‘subvisual’)
add ‘rubyconf’
hash(‘rubyconf’)
test ‘subvisual’
hash(‘subvisual’): all are 1?
test ‘subvisual’: true
test ‘office’
hash(‘office’): all are 1?
test ‘office’: false
test ‘mirrorconf’
hash(‘mirrorconf’): all are 1?
test ‘mirrorconf’: true
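The last slide above is a false positive: ‘mirrorconf’ was never added, yet every bit it hashes to is already set. A minimal sketch of the same effect follows. `ToyFilter`, its 8-bit size, and its two deliberately weak hash functions (byte sum and string length) are assumptions made up for illustration so the collision is easy to reproduce; they are not the hashes used in the talk.

```ruby
# Toy filter: 8 bits and two deliberately weak hash functions, so that a
# false positive is easy to reproduce. Both hashes are illustrative assumptions.
class ToyFilter
  SIZE = 8

  def initialize
    @bits = Array.new(SIZE, 0)
  end

  def add(str)
    indexes(str).each { |i| @bits[i] = 1 }
  end

  # true means "possibly in set", false means "definitely not in set"
  def test(str)
    indexes(str).all? { |i| @bits[i] == 1 }
  end

  private

  # Two positions per string: byte sum and length, both modulo SIZE.
  def indexes(str)
    [str.bytes.sum % SIZE, str.length % SIZE]
  end
end

toy = ToyFilter.new
toy.add("ab")

toy.test("ab")  # added, so all of its bits are set
toy.test("ba")  # never added, but same byte sum and length: a false positive
toy.test("abc") # at least one of its bits is still 0: definitely not in the set
```

Swapping the order of two letters changes neither the byte sum nor the length, which is why "ba" slips through here. Real hash functions make such collisions rarer, not impossible.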
test and add; play with hash functions; get to say smart stuff like “so I wrote this bloom filter”
diving into it with Ruby
module DumbFilter
end

module DumbFilter
  class Array
    def initialize
      @data = []
    end
  end
end

module DumbFilter
  class Array
    def add(str)
      @data << str
    end
  end
end

module DumbFilter
  class Array
    def test(str)
      @data.include? str
    end
  end
end
you don’t play with hash functions; sequential access; space wastefulness
module DumbFilter
  class Hash
    def initialize
      @data = {}
    end
  end
end

module DumbFilter
  class Hash
    def add(str)
      @data[str] = true
    end
  end
end

module DumbFilter
  class Hash
    def test(str)
      @data[str]
    end
  end
end
you kinda play with hash functions; instant access
“a bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970 (…) a query returns either "possibly in set" or "definitely not in set"” - Wikipedia, 2016
/peterc/bitarray
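# BitArray comes from the bitarray gem (/peterc/bitarray)
def initialize(size: 1024)
  @bits = BitArray.new(size)
  @fnv = FNV.new
  @size = size
end

def add(str)
  @bits[i(str)] = 1
end

# position of str in the bit array: its FNV-1a hash modulo the array size
def i(str)
  @fnv.fnv1a_64(str) % @size
end

def test(str)
  @bits[i(str)] == 1
end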
you do play with hash functions; instant access; space-efficient; small universe == more collisions
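To make “small universe == more collisions” concrete, here is a rough sketch that counts how many keys land on a bit that an earlier key already set, using a single hash function. `count_collisions` is a hypothetical helper, `Zlib.crc32` is swapped in for FNV-1a so the example needs only the standard library, and a plain Integer stands in for the BitArray.

```ruby
require "zlib"

# Counts how many of `keys` land on a bit that an earlier key already set,
# using one hash function over `size` bits.
# Zlib.crc32 is a stand-in (assumption); the slides use FNV-1a instead.
def count_collisions(keys, size)
  bits = 0 # an Integer used as a bit field, standing in for BitArray
  keys.count do |key|
    index = Zlib.crc32(key) % size
    taken = bits[index] == 1
    bits |= (1 << index)
    taken
  end
end

keys = (1..300).map { |n| "key-#{n}" }

small = count_collisions(keys, 64)     # only 64 bits: at least 236 collisions
large = count_collisions(keys, 65_536) # same keys, bigger universe: fewer collisions
```

With 300 keys and 64 bits, at least 236 keys must collide no matter how good the hash is, which is the problem the next version works around with several hash functions.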
def initialize(size: 1024, iterations: 3)
  @bits = BitArray.new(size)
  @size = size
  @seeds = seed(iterations)
end
def seed(nr)
  (1..nr).each_with_object([]) do |n, s|
    s << SecureRandom.hex(3).to_i(16)
  end
end

def hash(str, seed)
  MurmurHash3::V32.str_hash(str, seed)
end

def i(str)
  @seeds.map { |s| hash(str, s) % @size }
end

def add(str)
  set i(str)
end

def set(indexes)
  indexes.each { |i| @bits[i] = 1 }
end

def test(str)
  get i(str)
end

def get(indexes)
  indexes.all? { |i| @bits[i] == 1 }
end
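Put together, the fragments above might look roughly like the class below. This is a sketch under assumptions, not the code from the talk: `SketchFilter` is a made-up name, a plain Array of 0/1 replaces BitArray, and a seeded SHA-256 digest replaces MurmurHash3 so that it runs with the standard library alone.

```ruby
require "digest"
require "securerandom"

# Sketch of the multi-hash filter from the slides, using only the standard
# library. Assumptions: a plain Array of 0/1 replaces BitArray, and a seeded
# SHA-256 digest replaces MurmurHash3.
class SketchFilter
  def initialize(size: 1024, iterations: 3)
    @size  = size
    @bits  = Array.new(size, 0)
    @seeds = Array.new(iterations) { SecureRandom.hex(3).to_i(16) }
  end

  def add(str)
    indexes(str).each { |i| @bits[i] = 1 }
  end

  # true means "possibly in set", false means "definitely not in set"
  def test(str)
    indexes(str).all? { |i| @bits[i] == 1 }
  end

  private

  # One index per seed: each seed behaves like a separate hash function.
  def indexes(str)
    @seeds.map do |seed|
      Digest::SHA256.hexdigest("#{seed}:#{str}").to_i(16) % @size
    end
  end
end

filter = SketchFilter.new(size: 1024, iterations: 3)
%w[subvisual rubyconf].each { |word| filter.add(word) }

filter.test("subvisual") # => true, added words are never reported missing
filter.test("office")    # usually false; true here would be a false positive
```

An element only tests positive when all of its positions are set, so a single shared bit with another element is no longer enough to cause a false positive.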
demo (yes, yet another goddamned Rails blog app)
test-drive
5 million random inserts; probabilistic universe of 10 million; 5 million random accesses; /igrigorik/bloomfilter-rb
fnv is really slow; ruby string hashing is optimized; bloomfilter-rb uses C extensions
Collision counting: ruby’s hash is not probabilistic nor space-efficient; “what about bf_v2’s poor result?”
you do play with hash functions; instant access; space-efficient; small universe == more collisions
Collision counting: 1024 bits & 300 entries; optimal number of hash functions: m(bits)/n(entries) * ln(2)
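Plugging the slide's numbers into that formula is quick to check. `optimal_hash_count` is a hypothetical helper name used only for this sketch.

```ruby
# Optimal number of hash functions for m bits and n entries: (m / n) * ln(2).
def optimal_hash_count(bits, entries)
  (bits.to_f / entries) * Math.log(2)
end

k = optimal_hash_count(1024, 300)
k.round(2) # => 2.37
k.round    # => 2
```

The formula rarely yields a whole number, so in practice the count gets rounded to a nearby integer.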
in the field
Article tailoring - Quora & Medium; Type-ahead queries - Facebook; I/O filter - Apache HBase; Malicious URL check - bit.ly; Checking node communications in IoT sensors
BLOOM FILTERS or: that one time I was hella bored