Bloom Filters: A Look Into Ruby
Fernando Mendes
July 29, 2016
Programming
Transcript
BLOOM FILTERS
or: that one time I was hella bored
Bloom Filters
Or: How I Learned To Stop Procrastinating And Benchmark The Code
THE FiLTERiNG
A MASTERPIECE OF MODERN HORROR
2016: a space-efficient odyssey
An epic drama of boredom and exploration
BLOOM FILTERS
or: that one time I was hella bored
“a bloom filter is a space-efficient probabilistic data structure, conceived
by Burton Howard Bloom in 1970 (…) a query returns either "possibly in set" or "definitely not in set"” - Wikipedia, 2016
do you have the element 3?
bloom filter: yeah, probably
do you have the element 4?
bloom filter: I most certainly do not
“Why do people even like this thing?”
add ‘subvisual’: hash(‘subvisual’)
add ‘rubyconf’: hash(‘rubyconf’)
test ‘subvisual’: hash(‘subvisual’), all are 1? true
test ‘office’: hash(‘office’), all are 1? false
test ‘mirrorconf’: hash(‘mirrorconf’), all are 1? true
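The add/test walkthrough above can be replayed in a few lines of Ruby. This is only a sketch: the bit positions in `POSITIONS` are invented for illustration (they stand in for whatever the hash functions on the slides produced), and the 16-bit array size is likewise an assumption.

```ruby
# Hypothetical bit positions standing in for real hash-function output.
# They are made up for this sketch and are not taken from the slides.
POSITIONS = {
  'subvisual'  => [1, 4, 9],
  'rubyconf'   => [3, 9, 12],
  'office'     => [4, 7, 12],
  'mirrorconf' => [1, 3, 12]
}.freeze

bits = Array.new(16, 0)

# add: set every bit the word maps to
add_word = ->(word) { POSITIONS.fetch(word).each { |i| bits[i] = 1 } }
# test: the word is "possibly in set" only if all of its bits are 1
check_word = ->(word) { POSITIONS.fetch(word).all? { |i| bits[i] == 1 } }

add_word.call('subvisual')
add_word.call('rubyconf')

RESULTS = %w[subvisual office mirrorconf].to_h { |w| [w, check_word.call(w)] }
puts RESULTS
```

With these made-up positions, ‘office’ is rejected because bit 7 was never set, while ‘mirrorconf’ comes back true even though it was never added: its three bits were all set by the other two words, which is the false positive the slides illustrate.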
test and add
play with hash functions
get to say smart stuff like “so I wrote this bloom filter”
diving into it with Ruby
module DumbFilter
end

module DumbFilter
  class Array
    def initialize
      @data = []
    end
  end
end

module DumbFilter
  class Array
    def add(str)
      @data << str
    end
  end
end

module DumbFilter
  class Array
    def test(str)
      @data.include? str
    end
  end
end
you don’t play with hash functions
sequential access
space wastefulness
module DumbFilter
  class Hash
    def initialize
      @data = {}
    end
  end
end

module DumbFilter
  class Hash
    def add(str)
      @data[str] = true
    end
  end
end

module DumbFilter
  class Hash
    def test(str)
      @data[str]
    end
  end
end
you kinda play with hash functions
instant access
“a bloom filter is a space-efficient probabilistic data structure, conceived
by Burton Howard Bloom in 1970 (…) a query returns either "possibly in set" or "definitely not in set"” - Wikipedia, 2016
/peterc/bitarray
def initialize(size: 1024)
  @bits = BitArray.new(size)
  @fnv = FNV.new
  @size = size
end

def add(str)
  @bits[i(str)] = 1
end

def i(str)
  @fnv.fnv1a_64(str) % @size
end

def test(str)
  @bits[i(str)] == 1
end
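For readers who want to try the single-hash version without installing gems, here is a rough stand-in. It is an assumption-laden sketch, not the deck's code: the class name `SingleHashFilter` is hypothetical, a plain Integer replaces `BitArray`, and FNV-1a (64-bit) is written out by hand instead of calling the FNV gem.

```ruby
# Gem-free sketch of a one-hash filter. SingleHashFilter is a hypothetical
# name; an Integer is used as the bit array and FNV-1a 64 is hand-rolled.
class SingleHashFilter
  FNV_OFFSET = 0xcbf29ce484222325 # FNV-1a 64-bit offset basis
  FNV_PRIME  = 0x100000001b3      # FNV 64-bit prime
  MASK_64    = 0xffffffffffffffff # keep the running hash within 64 bits

  def initialize(size: 1024)
    @size = size
    @bits = 0
  end

  # Set the single bit the string hashes to.
  def add(str)
    @bits |= 1 << index(str)
  end

  # True means "possibly in set", false means "definitely not in set".
  def test(str)
    @bits[index(str)] == 1
  end

  private

  def index(str)
    fnv1a_64(str) % @size
  end

  # FNV-1a: XOR each byte into the hash, then multiply by the prime.
  def fnv1a_64(str)
    str.each_byte.reduce(FNV_OFFSET) do |hash, byte|
      ((hash ^ byte) * FNV_PRIME) & MASK_64
    end
  end
end

filter = SingleHashFilter.new
filter.add('subvisual')
puts filter.test('subvisual')
```

Because each string maps to exactly one bit, any two strings that land on the same index are indistinguishable, which is where the collisions in a small universe come from.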
you do play with hash functions
instant access
space-efficient
small universe == more collisions
def initialize(size: 1024, iterations: 3)
  @bits = BitArray.new(size)
  @size = size
  @seeds = seed(iterations)
end
def seed(nr)
  (1..nr).each_with_object([]) do |n, s|
    s << SecureRandom.hex(3).to_i(16)
  end
end

def hash(str, seed)
  MurmurHash3::V32.str_hash(str, seed)
end

def i(str)
  @seeds.map { |s| hash(str, s) % @size }
end

def add(str)
  set i(str)
end

def set(indexes)
  indexes.each { |i| @bits[i] = 1 }
end

def test(str)
  get i(str)
end

def get(indexes)
  indexes.all? { |i| @bits[i] == 1 }
end
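Pieced together, the seeded version might look something like the class below. This is a sketch under stated assumptions: `SeededFilter` is a hypothetical name, a plain Ruby Array replaces `BitArray`, and a seed-prefixed MD5 digest from the standard library is swapped in for `MurmurHash3::V32.str_hash` so that it runs without gems.

```ruby
require 'digest'
require 'securerandom'

# Hypothetical, gem-free take on the multi-hash filter. MD5 over
# "seed:string" is a stand-in for the seeded MurmurHash3 call.
class SeededFilter
  def initialize(size: 1024, iterations: 3)
    @size  = size
    @bits  = Array.new(size, 0)
    # One random 24-bit seed per hash function, as in seed(nr) above.
    @seeds = Array.new(iterations) { SecureRandom.random_number(2**24) }
  end

  # Set one bit per seed.
  def add(str)
    indexes(str).each { |i| @bits[i] = 1 }
  end

  # "Possibly in set" only when every seeded index is set.
  def test(str)
    indexes(str).all? { |i| @bits[i] == 1 }
  end

  private

  def indexes(str)
    @seeds.map do |seed|
      Digest::MD5.hexdigest("#{seed}:#{str}").to_i(16) % @size
    end
  end
end

filter = SeededFilter.new
filter.add('subvisual')
puts filter.test('subvisual')
```

A cryptographic digest is much slower than MurmurHash3, so this variant is only meant for reading and experimenting, not for benchmarks.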
demo (yes, yet another goddamned Rails blog app)
test-drive
5 million random inserts
probabilistic universe of 10 million
5 million random accesses
/igrigorik/bloomfilter-rb
fnv is really slow
ruby string hashing is optimized
bloomfilter-rb uses C extensions
Collision counting
ruby’s hash is not probabilistic nor space-efficient
“what about bf_v2’s poor result?”
you do play with hash functions
instant access
space-efficient
small universe == more collisions
Collision counting: 1024 bits & 300 entries
optimal number of hash functions: m (bits) / n (entries) * ln(2)
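The formula is easy to plug numbers into. The helper names below are hypothetical, and the false-positive estimate `(1 - e^(-k*n/m))^k` is the standard textbook approximation rather than something shown on the slides.

```ruby
# Optimal number of hash functions: k = (m / n) * ln(2)
def optimal_hash_count(bits, entries)
  (bits.to_f / entries) * Math.log(2)
end

# Standard approximation of the false-positive rate for k hash functions
# (an addition for context, not taken from the deck).
def false_positive_rate(bits, entries, hashes)
  (1 - Math.exp(-hashes * entries.to_f / bits))**hashes
end

k = optimal_hash_count(1024, 300)
puts format('optimal k: %.2f', k)
puts format('estimated false-positive rate with k = 2: %.3f', false_positive_rate(1024, 300, 2))
puts format('estimated false-positive rate with k = 3: %.3f', false_positive_rate(1024, 300, 3))
```

For 1024 bits and 300 entries the optimum comes out at roughly 2.4, so two or three hash functions are about as good as this filter size allows, with an estimated false-positive rate of around 20% either way.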
in the field
Article tailoring - Quora & Medium
Type-ahead queries - Facebook
I/O Filter - Apache HBase
Malicious URL Check - bit.ly
Checking node communications in IoT sensors
BLOOM FILTERS
or: that one time I was hella bored