Bloom Filters: A Look Into Ruby
Fernando Mendes
July 29, 2016
Programming
Transcript
BLOOM FILTERS or: that one time I was hella bored
Bloom Filters Or: How I Learned To Stop Procrastinating And Benchmark The Code
THE FiLTERiNG: A MASTERPIECE OF MODERN HORROR
2016: a space-efficient odyssey. An epic drama of boredom and exploration
“a bloom filter is a space-efficient probabilistic data structure, conceived
by Burton Howard Bloom in 1970 (…) a query returns either "possibly in set" or "definitely not in set"” - Wikipedia, 2016
bloom filter
Q: do you have the element 3?
A: yeah, probably
Q: do you have the element 4?
A: I most certainly do not
“Why do people even like this thing?”
add ‘subvisual’ → hash(‘subvisual’) → set those bits to 1
add ‘rubyconf’ → hash(‘rubyconf’) → set those bits to 1
test ‘subvisual’ → hash(‘subvisual’) → all are 1? → true
test ‘office’ → hash(‘office’) → all are 1? → false
test ‘mirrorconf’ → hash(‘mirrorconf’) → all are 1? → true (never added: a false positive)
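The add/test walk-through above fits in a few lines of Ruby. This is a toy sketch, not the talk's code. `WalkthroughFilter` and `positions` are hypothetical names. The 64 bits live in a plain Integer, and the k = 3 bit positions are carved out of a single SHA-256 digest, which is an assumed stand-in for independent hash functions.

```ruby
require "digest"

# Hypothetical toy filter mirroring the slides: m bits, k positions per string.
class WalkthroughFilter
  def initialize(m: 64, k: 3)
    @m = m
    @k = k     # assumption: k <= 8, since a SHA-256 digest holds 8 x 4 bytes
    @bits = 0  # an Integer used as a bit field: bit i set <=> position i is 1
  end

  # k positions, each a 4-byte big-endian slice of the digest reduced mod m
  def positions(str)
    Digest::SHA256.digest(str).unpack("N#{@k}").map { |n| n % @m }
  end

  # add: set every position to 1
  def add(str)
    positions(str).each { |i| @bits |= (1 << i) }
  end

  # test: "possibly in set" only if all positions are 1, else "definitely not"
  def test(str)
    positions(str).all? { |i| @bits[i] == 1 }
  end
end

filter = WalkthroughFilter.new
filter.add("subvisual")
filter.add("rubyconf")
filter.test("subvisual") # true: every bit it needs was set by add
filter.test("office")    # false unless all three of its bits were set by other words
```

Testing a word that was never added, such as `mirrorconf`, can still return true when its three positions were all set by other words. That is the false positive from the slides. A false negative cannot happen, because `add` never clears a bit.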
- test and add
- play with hash functions
- get to say smart stuff like “so I wrote this bloom filter”
diving into it with Ruby
module DumbFilter
  class Array
    def initialize
      @data = []
    end

    def add(str)
      @data << str
    end

    def test(str)
      @data.include? str
    end
  end
end
- you don’t play with hash functions
- sequential access
- space wastefulness
module DumbFilter
  class Hash
    def initialize
      @data = {}
    end

    def add(str)
      @data[str] = true
    end

    # key? returns false (rather than nil) for strings that were never added
    def test(str)
      @data.key?(str)
    end
  end
end
- you kinda play with hash functions
- instant access
“a bloom filter is a space-efficient probabilistic data structure, conceived
by Burton Howard Bloom in 1970 (…) a query returns either "possibly in set" or "definitely not in set"” - Wikipedia, 2016
/peterc/bitarray
def initialize(size: 1024)
  @bits = BitArray.new(size)
  @fnv = FNV.new
  @size = size
end

def add(str)
  @bits[i(str)] = 1
end

def i(str)
  @fnv.fnv1a_64(str) % @size
end

def test(str)
  @bits[i(str)] == 1
end
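The single-hash version above needs the bitarray and fnv gems. The following is a gem-free sketch under two assumptions. `TinyBitArray` is a hypothetical stand-in for `BitArray`, packing eight bits per byte of a binary String. `FNV1a.hash64` is a hand-written 64-bit FNV-1a standing in for the gem's `fnv1a_64`.

```ruby
# Sketch of the single-hash filter with no gems (assumptions noted above).

# Packed bit storage: one bit per slot, eight slots per byte.
class TinyBitArray
  def initialize(size)
    @bytes = "\0".b * ((size + 7) / 8)
  end

  def [](i)
    (@bytes.getbyte(i / 8) >> (i % 8)) & 1
  end

  def []=(i, value)
    byte = @bytes.getbyte(i / 8)
    mask = 1 << (i % 8)
    @bytes.setbyte(i / 8, value == 1 ? byte | mask : byte & ~mask)
  end
end

# FNV-1a, 64-bit: XOR in each byte, then multiply by the FNV prime mod 2**64.
module FNV1a
  OFFSET_BASIS = 0xcbf29ce484222325
  PRIME = 0x100000001b3
  MASK = 0xffff_ffff_ffff_ffff

  def self.hash64(str)
    str.each_byte.reduce(OFFSET_BASIS) { |h, byte| ((h ^ byte) * PRIME) & MASK }
  end
end

class SingleHashFilter
  def initialize(size: 1024)
    @bits = TinyBitArray.new(size)
    @size = size
  end

  def add(str)
    @bits[i(str)] = 1
  end

  def test(str)
    @bits[i(str)] == 1
  end

  private

  # One hash function, so each string maps to exactly one bit.
  def i(str)
    FNV1a.hash64(str) % @size
  end
end
```

With one hash function each string claims a single bit. That is why a small universe (1024 bits by default) fills up and produces collisions quickly.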
- you do play with hash functions
- instant access
- space-efficient
- small universe == more collisions
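The "small universe == more collisions" point can be measured directly. This is a minimal sketch, not the talk's benchmark. It assumes `Zlib.crc32` as the single hash function and keeps the bits in an Array of booleans for readability. `measured_fp_rate` is a hypothetical helper. It probes only strings that were never added, so every true answer is a false positive. For one hash function the expected rate is roughly 1 - e^(-n/m): about 32% at m = 256 and about 1% at m = 8192 when n = 100.

```ruby
require "zlib"

# Fraction of never-added strings that a single-hash filter of m bits,
# holding n members, wrongly reports as "possibly in set".
def measured_fp_rate(m:, n:, probes: 10_000)
  bits = Array.new(m, false)
  n.times { |k| bits[Zlib.crc32("member-#{k}") % m] = true }
  hits = probes.times.count { |k| bits[Zlib.crc32("outsider-#{k}") % m] }
  hits.fdiv(probes)
end

[256, 1024, 8192].each do |m|
  rate = measured_fp_rate(m: m, n: 100)
  puts format("m=%5d bits, n=100: %.1f%% false positives", m, rate * 100)
end
```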
def initialize(size: 1024, iterations: 3)
  @bits = BitArray.new(size)
  @size = size
  @seeds = seed(iterations)
end
def seed(nr)
  # nr random 24-bit seeds (3 random bytes each), one per hash function;
  # needs require "securerandom"
  Array.new(nr) { SecureRandom.hex(3).to_i(16) }
end
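# Named seeded_hash rather than hash: redefining Object#hash with two
# arguments breaks any Hash or Set that tries to hash the filter itself.
def seeded_hash(str, seed)
  MurmurHash3::V32.str_hash(str, seed)
end

# one index per seed, i.e. one per hash function
def i(str)
  @seeds.map { |s| seeded_hash(str, s) % @size }
end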
def add(str)
  set i(str)
end

def set(indexes)
  indexes.each { |i| @bits[i] = 1 }
end
def test(str)
  get i(str)
end

def get(indexes)
  indexes.all? { |i| @bits[i] == 1 }
end
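The k-hash version above relies on the murmurhash3 and bitarray gems and on random per-instance seeds. The following gem-free sketch uses a different technique, named plainly. It replaces the seeded MurmurHash3 calls with double hashing (Kirsch and Mitzenmacher), where index i is h1 + i * h2 mod m. It takes h1 and h2 from the two 64-bit halves of an MD5 digest and stores the bits in an Integer. `DoubleHashFilter` is a hypothetical name.

```ruby
require "digest"

# Sketch: k indexes from two base hashes (double hashing), no gems needed.
class DoubleHashFilter
  def initialize(size: 1024, iterations: 3)
    @size = size
    @iterations = iterations
    @bits = 0 # Integer used as a size-bit field
  end

  def add(str)
    indexes(str).each { |i| @bits |= (1 << i) }
  end

  def test(str)
    indexes(str).all? { |i| @bits[i] == 1 }
  end

  private

  # index_i = (h1 + i * h2) mod size, for i in 0...iterations
  def indexes(str)
    h1, h2 = Digest::MD5.digest(str).unpack("Q<Q<")
    h2 |= 1 # odd step: with a power-of-two size the k indexes are distinct
    Array.new(@iterations) { |i| (h1 + i * h2) % @size }
  end
end

filter = DoubleHashFilter.new
%w[subvisual rubyconf].each { |w| filter.add(w) }
filter.test("subvisual") # true
```

One design note on the original: because each instance draws random seeds, two filters built from the same words end up with different bits. A filter can only be saved, reloaded, or merged together with its `@seeds`. Deriving every index from the string alone, as above, sidesteps that.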
demo (yes, yet another goddamned Rails blog app)
test-drive
- 5 million random inserts
- probabilistic universe of 10 million
- 5 million random accesses
- /igrigorik/bloomfilter-rb
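The test-drive setup above can be expressed as a small harness. This is a sketch, not the talk's benchmark script. `test_drive` and `HashSet` are hypothetical names, and the sizes are scaled down from 5 million/10 million so it runs in seconds. It accepts any object that responds to `add` and `test`. `positives` counts lookups answered true: for an exact set these are real hits, and for a Bloom filter they also include false positives. Timings depend on the machine.

```ruby
require "benchmark"

# N random inserts, then N random lookups, drawn from a universe of integers.
def test_drive(filter, inserts:, accesses:, universe:, seed: 42)
  rng = Random.new(seed)
  insert_s = Benchmark.realtime do
    inserts.times { filter.add(rng.rand(universe).to_s) }
  end
  positives = 0
  access_s = Benchmark.realtime do
    accesses.times { positives += 1 if filter.test(rng.rand(universe).to_s) }
  end
  { insert_s: insert_s, access_s: access_s, positives: positives }
end

# Exact baseline in the spirit of DumbFilter::Hash.
class HashSet
  def initialize
    @data = {}
  end

  def add(str)
    @data[str] = true
  end

  def test(str)
    @data.key?(str)
  end
end

p test_drive(HashSet.new, inserts: 50_000, accesses: 50_000, universe: 100_000)
```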
- fnv is really slow
- Ruby string hashing is optimized
- bloomfilter-rb uses C extensions
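The "fnv is slow, Ruby string hashing is optimized" observation can be checked with a quick timing. This is a sketch assuming a pure-Ruby FNV-1a like the one below: `fnv1a_64` is hand-written here, not taken from the fnv gem. It does one interpreted multiply-and-mask per byte, while MRI's `String#hash` runs in C. One caveat: `String#hash` is seeded randomly per process, so its values change between runs. That is fine for an in-memory filter but not for one saved to disk.

```ruby
require "benchmark"

# Pure-Ruby FNV-1a (64-bit): every byte goes through interpreted Integer math.
def fnv1a_64(str)
  str.each_byte.reduce(0xcbf29ce484222325) do |h, b|
    ((h ^ b) * 0x100000001b3) & 0xffff_ffff_ffff_ffff
  end
end

words = Array.new(100_000) { |i| "word-#{i}" }

fnv_s  = Benchmark.realtime { words.each { |w| fnv1a_64(w) } }
core_s = Benchmark.realtime { words.each(&:hash) }

puts format("fnv1a_64 in Ruby: %.3fs, String#hash: %.3fs", fnv_s, core_s)
```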
Collision counting
- Ruby’s Hash is neither probabilistic nor space-efficient
“What about bf_v2’s poor result?”
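The space side of "Ruby’s Hash is neither probabilistic nor space-efficient" can be roughly quantified. This is a sketch under assumptions. `ObjectSpace.memsize_of` (MRI) reports each object's own allocation, so the key strings are added on top of the Hash. The Bloom filter budget of about 10 bits per entry is a common rule of thumb for roughly a 1% false-positive rate, not a figure from the talk.

```ruby
require "objspace"

entries = Array.new(10_000) { |i| "entry-#{i}" }

# Exact membership: the Hash stores every key (plus per-entry bookkeeping).
exact = {}
entries.each { |e| exact[e] = true }
hash_bytes = ObjectSpace.memsize_of(exact) +
             exact.keys.sum { |k| ObjectSpace.memsize_of(k) }

# A Bloom filter's size is fixed up front and stores no keys at all.
bits_per_entry = 10 # assumption: rule-of-thumb budget, roughly 1% false positives
bloom_bytes = entries.size * bits_per_entry / 8

puts "Hash: ~#{hash_bytes} bytes, Bloom filter: #{bloom_bytes} bytes"
```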
- you do play with hash functions
- instant access
- space-efficient
- small universe == more collisions
Collision counting: 1024 bits & 300 entries
optimal number of hash functions: $k = \frac{m}{n} \ln 2$, with m the number of bits and n the number of entries
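Plugging the slide's numbers into the formula gives k = (1024 / 300) · ln 2 ≈ 2.37. The sketch below computes that value and also applies the standard false-positive approximation p ≈ (1 - e^(-kn/m))^k. The helper names `optimal_k` and `expected_fp_rate` are hypothetical.

```ruby
# Standard Bloom filter sizing formulas (m bits, n entries, k hash functions):
#   optimal k = (m / n) * ln 2
#   false-positive rate p ~ (1 - e^(-k*n/m))^k
def optimal_k(m, n)
  (m.to_f / n) * Math.log(2)
end

def expected_fp_rate(m, n, k)
  (1 - Math.exp(-k.to_f * n / m))**k
end

m = 1024
n = 300
puts format("optimal k = %.2f", optimal_k(m, n)) # ~2.37, so 2 or 3 in practice
(1..4).each do |k|
  puts format("k=%d: p ~ %.1f%%", k, expected_fp_rate(m, n, k) * 100)
end
```

By these formulas, p stays around 20% at k = 2 or 3. So 300 entries in 1024 bits is simply an overfull universe, whatever the number of hash functions.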
in the field
- Article tailoring: Quora & Medium
- Type-ahead queries: Facebook
- I/O filter: Apache HBase
- Malicious URL check: bit.ly
- Checking node communications in IoT sensors