Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Bloom Filters: A Look Into Ruby
Search
Fernando Mendes
July 29, 2016
Programming
0
100
Bloom Filters: A Look Into Ruby
Fernando Mendes
July 29, 2016
Tweet
Share
More Decks by Fernando Mendes
See All by Fernando Mendes
you. and the morals of technology
fribmendes
1
120
Knee-Deep Into P2P: A Tale of Fail (PWL Porto)
fribmendes
0
53
Knee-Deep Into P2P: A Tale of Fail (ElixirConf EU 2018 version)
fribmendes
0
140
Knee-Deep Into P2P: A Tale of Fail (non-Elixir)
fribmendes
0
150
A Look Into Bloom Filters
fribmendes
0
340
Programming WTF: HTML & CSS
fribmendes
4
150
Ruby: A (pointless) Workshop
fribmendes
1
160
Elixir: A Talk For College Students
fribmendes
0
160
Riding Rails
fribmendes
0
100
Other Decks in Programming
See All in Programming
Package Traits
ikesyo
2
220
Vue.jsでiOSアプリを作る方法
hal_spidernight
0
120
Swiftコンパイラ超入門+async関数の仕組み
shiz
0
190
個人アプリを2年ぶりにアプデしたから褒めて / I just updated my personal app, praise me!
lovee
0
280
毎日13時間もかかるバッチ処理をたった3日で60%短縮するためにやったこと
sho_ssk_
1
670
Beyond ORM
77web
11
1.6k
盆栽転じて家具となる / Bonsai and Furnitures
aereal
0
2.1k
EC2からECSへ 念願のコンテナ移行と巨大レガシーPHPアプリケーションの再構築
sumiyae
3
620
バックエンドのためのアプリ内課金入門 (サブスク編)
qnighy
1
200
Lookerは可視化だけじゃない。UIコンポーネントもあるんだ!
ymd65536
1
140
ゼロからの、レトロゲームエンジンの作り方
tokujiros
3
1.1k
watsonx.ai Dojo #6 継続的なAIアプリ開発と展開
oniak3ibm
PRO
0
250
Featured
See All Featured
The Pragmatic Product Professional
lauravandoore
32
6.4k
Making the Leap to Tech Lead
cromwellryan
133
9k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
6
210
The Straight Up "How To Draw Better" Workshop
denniskardys
232
140k
Facilitating Awesome Meetings
lara
51
6.2k
Raft: Consensus for Rubyists
vanstee
137
6.7k
Product Roadmaps are Hard
iamctodd
PRO
50
11k
Embracing the Ebb and Flow
colly
84
4.5k
KATA
mclloyd
29
14k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
7
590
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.4k
Documentation Writing (for coders)
carmenintech
67
4.6k
Transcript
B L O O M F I LT E R
S or: that one time I was hella bored
Bloom Filters Or: How I Learned To Stop Procrastinating And
Benchmark The Code
THE A MASTERPIECE OF MODERN HORROR FiLTERiNG
2016: a space-efficient odyssey An epic drama of boredom and
exploration
B L O O M F I LT E R
S or: that one time I was hella bored
“a bloom filter is a space-efficient probabilistic data structure, conceived
by Burton Howard Bloom in 1970 (…) a query returns either "possibly in set" or "definitely not in set"” - Wikipedia, 2016
bloom filter
bloom filter do you have the element 3?
bloom filter yeah, probably
bloom filter do you have the element 4?
bloom filter I most certainly do not
bloom filter I most certainly do not “Why do people
even like this thing?”
add ‘subvisual’
hash(‘subvisual’)
add ‘rubyconf’
hash(‘rubyconf’)
test ‘subvisual’
hash(‘subvisual’) all are 1?
test ‘subvisual’ true
test ‘office’
all are 1? hash(‘office’)
test ‘office’ false
test ‘mirrorconf’
hash(‘mirrorconf’) all are 1?
test ‘mirrorconf’ true
test and add play with hash functions get to say
smart stuff like “so I wrote this bloom filter”
diving into it with Ruby
module DumbFilter end
module DumbFilter class Array def initialize @data = [] end
end end
module DumbFilter class Array def add(str) @data << str end
end end
module DumbFilter class Array def test(str) @data.include? str end end
end
you don’t play with hash functions sequential access space wastefulness
module DumbFilter class Hash def initialize @data = {} end
end end
module DumbFilter class Hash def add(str) @data[str] = true end
end end
module DumbFilter class Hash def test(str) @data[str] end end end
you kinda play with hash functions instant access
“a bloom filter is a space-efficient probabilistic data structure, conceived
by Burton Howard Bloom in 1970 (…) a query returns either "possibly in set" or "definitely not in set"” - Wikipedia, 2016
/peterc/bitarray
def initialize(size: 1024) @bits = BitArray.new(size) @fnv = FNV.new @size
= size end
def add(str) @bits[i(str)] = 1 end def i(str) @fnv.fnv1a_64(str) %
@size end
def test(str) @bits[i(str)] == 1 end
you do play with hash functions instant access space-efficient small
universe == more collisions
def initialize(size: 1024, iterations: 3) @bits = BitArray.new(size) @size =
size @seeds = seed(iterations) end
def initialize(size: 1024, iterations: 3) @bits = BitArray.new(size) @size =
size @seeds = seed(iterations) end
def initialize(size: 1024, iterations: 3) @bits = BitArray.new(size) @size =
size @seeds = seed(iterations) end
def seed(nr) (1..nr).each_with_object([]) do |n, s| s << SecureRandom.hex(3).to_i(16) end
end
def hash(str, seed) MurmurHash3::V32.str_hash(str, seed) end
def i(str) @seeds.map { |s| hash(str, s) % @size }
end
def add(str) set i(str) end def set(indexes) indexes.each { |i|
@bits[i] = 1 } end
def test(str) get i(str) end def get(indexes) indexes.all? { |i|
@bits[i] == 1 } end
demo (yes, yet another goddamned Rails blog app)
None
None
test-drive
5 million random inserts probabilistic universe of 10 million 5
million random accesses /igrigorik/bloomfilter-rb
fnv is really slow ruby string hashing is optimized bloomfilter-rb
uses C extensions
Collision counting ruby’s hash is not probabilistic nor space-efficient “what
about bf_v2’s poor result?”
you do play with hash functions instant access space-efficient small
universe == more collisions
Collision counting: 1024 bits & 300 entries m(bits)/n(entries) * ln(2)
optimal number of hash functions:
in the field
Article tailoring - Quora & Medium Type-ahead queries — Facebook
I/O Filter — Apache HBase Malicious URL Check — bit.ly Checking node communications in IoT sensors
B L O O M F I LT E R
S or: that one time I was hella bored