Bloom Filters: A Look Into Ruby
Fernando Mendes
July 29, 2016
Transcript
BLOOM FILTERS
or: that one time I was hella bored
Bloom Filters Or: How I Learned To Stop Procrastinating And Benchmark The Code
THE FiLTERiNG: A MASTERPIECE OF MODERN HORROR
2016: a space-efficient odyssey. An epic drama of boredom and exploration
“a bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970 (…) a query returns either "possibly in set" or "definitely not in set"” - Wikipedia, 2016
bloom filter
Q: do you have the element 3?
A: yeah, probably
Q: do you have the element 4?
A: I most certainly do not
“Why do people even like this thing?”
add ‘subvisual’: hash(‘subvisual’)
add ‘rubyconf’: hash(‘rubyconf’)
test ‘subvisual’: hash(‘subvisual’), all are 1? true
test ‘office’: hash(‘office’), all are 1? false
test ‘mirrorconf’: hash(‘mirrorconf’), all are 1? true (a false positive: ‘mirrorconf’ was never added)
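The add and test steps above can be sketched in a few lines of Ruby. This is a minimal sketch and not the code from the talk: the 16-bit array, the `positions` helper, the `member?` name and the seed-prefixed MD5 hashing are all assumptions, chosen only so the example runs with the standard library.

```ruby
require "digest/md5"

SIZE   = 16                  # number of bits (assumed, tiny so the array stays readable)
HASHES = 3                   # number of hash functions (assumed)
BITS   = Array.new(SIZE, 0)  # the filter itself: one slot per bit

# Hypothetical helper: derive HASHES bit positions for a string by
# prefixing it with a different seed before hashing it with MD5.
def positions(str)
  (0...HASHES).map { |seed| Digest::MD5.hexdigest("#{seed}:#{str}").to_i(16) % SIZE }
end

# add: set every position the hash functions point at
def add(str)
  positions(str).each { |i| BITS[i] = 1 }
end

# test: "possibly in set" only if all positions are 1
def member?(str)
  positions(str).all? { |i| BITS[i] == 1 }
end

add("subvisual")
add("rubyconf")

puts BITS.join              # the bit array after two inserts
puts member?("subvisual")   # true: an added element is never reported missing
puts member?("office")      # false unless all of its positions happen to be set already
```

A `true` for a string that was never added is exactly the false positive shown for ‘mirrorconf’; a `false` is always trustworthy.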
test and add
play with hash functions
get to say smart stuff like “so I wrote this bloom filter”
diving into it with Ruby
module DumbFilter
  class Array
    def initialize
      @data = []
    end

    def add(str)
      @data << str
    end

    def test(str)
      @data.include? str
    end
  end
end
you don’t play with hash functions
sequential access
space wastefulness
module DumbFilter
  class Hash
    def initialize
      @data = {}
    end

    def add(str)
      @data[str] = true
    end

    def test(str)
      @data[str]
    end
  end
end
you kinda play with hash functions
instant access
/peterc/bitarray
def initialize(size: 1024)
  @bits = BitArray.new(size)
  @fnv = FNV.new
  @size = size
end

def add(str)
  @bits[i(str)] = 1
end

def i(str)
  @fnv.fnv1a_64(str) % @size
end

def test(str)
  @bits[i(str)] == 1
end
you do play with hash functions
instant access
space-efficient
small universe == more collisions
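The single-hash filter above depends on the bitarray gem and an FNV library. To try the idea without installing anything, here is a hedged, standard-library-only sketch: the `SingleHashFilter` class name, the hand-written `fnv1a_64` function and the use of a plain Integer as the bit field are assumptions, not the talk's code.

```ruby
# Stand-in for the slides' FNV object: a hand-written 64-bit FNV-1a.
FNV_OFFSET = 0xcbf29ce484222325
FNV_PRIME  = 0x100000001b3
MASK_64    = 0xffffffffffffffff

def fnv1a_64(str)
  str.each_byte.reduce(FNV_OFFSET) do |hash, byte|
    ((hash ^ byte) * FNV_PRIME) & MASK_64
  end
end

# Hypothetical class name; an Integer plays the role of the BitArray.
class SingleHashFilter
  def initialize(size: 1024)
    @size = size
    @bits = 0
  end

  def add(str)
    @bits |= (1 << index(str))
  end

  def test(str)
    @bits[index(str)] == 1 # Integer#[] reads a single bit
  end

  private

  def index(str)
    fnv1a_64(str) % @size
  end
end

filter = SingleHashFilter.new(size: 8) # tiny universe to provoke collisions
filter.add("subvisual")
puts filter.test("subvisual") # true

# With only 8 bits, some strings that were never added land on the same bit.
false_positives = ("a".."z").count { |s| filter.test(s) }
puts "#{false_positives} of 26 unseen strings reported as present"
```

Shrinking `size` makes the last line report more hits, which is the "small universe == more collisions" trade-off in practice.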
def initialize(size: 1024, iterations: 3)
  @bits = BitArray.new(size)
  @size = size
  @seeds = seed(iterations)
end
def seed(nr)
  (1..nr).each_with_object([]) do |n, s|
    s << SecureRandom.hex(3).to_i(16)
  end
end

def hash(str, seed)
  MurmurHash3::V32.str_hash(str, seed)
end

def i(str)
  @seeds.map { |s| hash(str, s) % @size }
end

def add(str)
  set i(str)
end

def set(indexes)
  indexes.each { |i| @bits[i] = 1 }
end

def test(str)
  get i(str)
end

def get(indexes)
  indexes.all? { |i| @bits[i] == 1 }
end
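Put together, the seeded methods above form a complete filter. The sketch below assembles them into one class that runs on the standard library alone. It is an approximation of the slides rather than a copy: the `SketchFilter` name is hypothetical, a plain Array of 0/1 stands in for `BitArray`, and a seed-prefixed MD5 digest stands in for `MurmurHash3::V32.str_hash`.

```ruby
require "digest/md5"
require "securerandom"

# Hypothetical class name assembling the methods from the slides.
class SketchFilter
  def initialize(size: 1024, iterations: 3)
    @bits  = Array.new(size, 0) # stand-in for BitArray.new(size)
    @size  = size
    # One random 24-bit seed per hash function, like SecureRandom.hex(3).to_i(16)
    @seeds = Array.new(iterations) { SecureRandom.random_number(2**24) }
  end

  def add(str)
    indexes(str).each { |i| @bits[i] = 1 }
  end

  def test(str)
    indexes(str).all? { |i| @bits[i] == 1 }
  end

  private

  # Assumed stand-in for the seeded MurmurHash3 call: prefix the seed, hash with MD5.
  def hash_with_seed(str, seed)
    Digest::MD5.hexdigest("#{seed}:#{str}").to_i(16)
  end

  def indexes(str)
    @seeds.map { |seed| hash_with_seed(str, seed) % @size }
  end
end

filter = SketchFilter.new
%w[subvisual rubyconf].each { |word| filter.add(word) }
puts filter.test("subvisual") # true
puts filter.test("office")    # almost certainly false: at most 6 of 1024 bits are set
```

MD5 is far slower than MurmurHash3; it is used here only because it ships with Ruby.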
demo (yes, yet another goddamned Rails blog app)
test-drive
5 million random inserts
probabilistic universe of 10 million
5 million random accesses
/igrigorik/bloomfilter-rb
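The test-drive setup can be approximated with Ruby's built-in Benchmark module. This is a scaled-down sketch under stated assumptions, not the benchmark from the talk: the `bench` harness, the fixed random seed and the `HashFilter` baseline (a copy of the Hash-backed filter shown earlier) are illustrative, and the counts are reduced so it finishes quickly.

```ruby
require "benchmark"

INSERTS  = 50_000       # the talk used 5 million
UNIVERSE = INSERTS * 2  # the talk used a universe of 10 million

# Hypothetical harness: times INSERTS random adds followed by
# INSERTS random tests against anything that responds to add/test.
def bench(filter, rng: Random.new(42))
  Benchmark.realtime do
    INSERTS.times { filter.add(rng.rand(UNIVERSE).to_s) }
    INSERTS.times { filter.test(rng.rand(UNIVERSE).to_s) }
  end
end

# Hash-backed filter used here as the baseline.
class HashFilter
  def initialize
    @data = {}
  end

  def add(str)
    @data[str] = true
  end

  def test(str)
    @data[str]
  end
end

seconds = bench(HashFilter.new)
puts format("Hash-backed filter: %.3fs", seconds)
```

Any other filter with the same `add`/`test` interface can be passed to `bench` for a side-by-side comparison.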
fnv is really slow
ruby string hashing is optimized
bloomfilter-rb uses C extensions
Collision counting
ruby’s hash is not probabilistic nor space-efficient
“what about bf_v2’s poor result?”
you do play with hash functions
instant access
space-efficient
small universe == more collisions
Collision counting: 1024 bits & 300 entries
optimal number of hash functions: m (bits) / n (entries) * ln(2)
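Plugging the slide's numbers into that formula gives a concrete value. The sketch below computes it and, as an addition that is not in the slides, also evaluates the standard approximation for the false positive rate, (1 - e^(-k * n / m))^k, for a few values of k. The method names are hypothetical.

```ruby
M = 1024 # bits
N = 300  # entries

# Optimal number of hash functions: (m / n) * ln(2)
def optimal_hashes(m, n)
  (m.to_f / n) * Math.log(2)
end

# Standard approximation of the false positive rate with k hash functions:
# (1 - e^(-k * n / m))^k
def false_positive_rate(m, n, k)
  (1 - Math.exp(-k.to_f * n / m))**k
end

puts optimal_hashes(M, N).round(2) # 2.37

(1..4).each do |k|
  puts format("k=%d: %.3f", k, false_positive_rate(M, N, k))
end
```

Since the optimum falls between 2 and 3, both k=2 and k=3 land near a 0.20 false positive rate under this approximation, while k=1 and k=4 do worse.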
in the field
Article tailoring: Quora & Medium
Type-ahead queries: Facebook
I/O Filter: Apache HBase
Malicious URL Check: bit.ly
Checking node communications in IoT sensors