Slide 1
Slide 1 text
HyperLogLog in 15 minutes @mudge
Slide 2
Slide 2 text
“Cardinality of a set”
Slide 3
Slide 3 text
animals = Set.new
=> #<Set: {}>
animals << "dog"
=> #<Set: {"dog"}>
animals << "dog"
=> #<Set: {"dog"}>
animals << "cat"
=> #<Set: {"dog", "cat"}>
animals.size
=> 2
Slide 4
Slide 4 text
What do we need to count the number of unique elements exactly?
Slide 5
Slide 5 text
No content
Slide 6
Slide 6 text
No content
Slide 7
Slide 7 text
Flipping a coin
Slide 8
Slide 8 text
No content
Slide 9
Slide 9 text
No content
Slide 10
Slide 10 text
1
Slide 11
Slide 11 text
1
Slide 12
Slide 12 text
1
Slide 13
Slide 13 text
1
Slide 14
Slide 14 text
1
Slide 15
Slide 15 text
1 2
Slide 16
Slide 16 text
2
Slide 17
Slide 17 text
2
Slide 18
Slide 18 text
2 5
Slide 19
Slide 19 text
P(0) = ?
Slide 20
Slide 20 text
$P(0) = \frac{1}{2}$
Slide 21
Slide 21 text
$P(0) = \frac{1}{2}$
$P(1) = {?}$
Slide 22
Slide 22 text
$P(0) = \frac{1}{2}$
$P(1) = \frac{1}{4}$
Slide 23
Slide 23 text
$P(0) = \frac{1}{2}$
$P(1) = \frac{1}{4}$
$P(2) = \frac{1}{8}$
Slide 24
Slide 24 text
$P(0) = \frac{1}{2^1} = \frac{1}{2}$
$P(1) = \frac{1}{2^2} = \frac{1}{4}$
$P(2) = \frac{1}{2^3} = \frac{1}{8}$
$\dots$
$P(n) = \frac{1}{2^{n+1}}$
Slide 25
Slide 25 text
If our highest score is 5 then we can guess $2^6$ runs
$P(n) = \frac{1}{2^{n+1}}$
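The coin-flipping argument can be sketched in Ruby. This is only an illustration, and it assumes a run's score is the number of heads flipped before the first tails (the slides show the game as pictures, so that reading is a guess). The names `score`, `probability`, `estimated_runs` and `play` are hypothetical.

```ruby
# Score of one run: heads flipped before the first tails (assumed rule).
def score(flips)
  flips.take_while { |flip| flip == :heads }.length
end

# P(n) = 1 / 2^(n + 1), kept exact with Rational.
def probability(n)
  Rational(1, 2**(n + 1))
end

# A highest score of n suggests roughly 2^(n + 1) runs were played.
def estimated_runs(highest_score)
  2**(highest_score + 1)
end

# Simulate one run: keep flipping while the coin lands heads (1).
def play(random)
  result = 0
  result += 1 while random.rand(2) == 1
  result
end

puts probability(2)    # 1/8
puts estimated_runs(5) # 64
```

Playing many runs with `play(Random.new)` and feeding the highest score to `estimated_runs` gives a very rough guess, since a single lucky run can throw it off by a large factor.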
Slide 26
Slide 26 text
What’s this got to do with estimating the cardinality of a set?
Slide 27
Slide 27 text
"dog"
Slide 28
Slide 28 text
"dog"
Slide 29
Slide 29 text
1 "dog"
Slide 30
Slide 30 text
1 "cat"
Slide 31
Slide 31 text
1 "cat"
Slide 32
Slide 32 text
1 "cat" 3
Slide 33
Slide 33 text
1 "dog"
Slide 34
Slide 34 text
1 "dog" 0 1 1 0 0 1
Slide 35
Slide 35 text
1 "cat" 3
Slide 36
Slide 36 text
1 "cat" 3 0 0 0 1 1 0
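The "dog" and "cat" slides can be sketched in Ruby: hash each value to a string of bits and use the number of leading zeros as its score. This is a sketch under assumptions: the talk does not name a hash function, so MD5 and the 32-bit width are placeholders, and `leading_zeros`, `hash_bits` and `highest_score` are hypothetical names.

```ruby
require "digest"

# Number of zeros before the first 1 in a bit string.
def leading_zeros(bits)
  bits.index("1") || bits.length
end

# Hash a value to a fixed-width bit string (MD5 is an assumption here).
def hash_bits(value, width = 32)
  Digest::MD5.hexdigest(value.to_s).to_i(16).to_s(2).rjust(128, "0")[0, width]
end

# Highest score across all values, as in the coin-flipping game.
def highest_score(values)
  values.map { |value| leading_zeros(hash_bits(value)) }.max
end

puts leading_zeros("011001") # bits shown for "dog" => 1
puts leading_zeros("000110") # bits shown for "cat" => 3
```

Because the same value always hashes to the same bits, adding "dog" twice cannot raise the highest score, which is what makes the score depend on unique elements only.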
Slide 37
Slide 37 text
$E := \alpha_m m^2 Z$
Slide 38
Slide 38 text
> PFADD tweets 1 2 3 4 5 6
(integer) 1
> PFCOUNT tweets
(integer) 6
Slide 39
Slide 39 text
No content
Slide 40
Slide 40 text
~fin~ @mudge https://mudge.name
Slide 41
Slide 41 text
0110101101010
Slide 42
Slide 42 text
1 0110101101010
Slide 43
Slide 43 text
1 0110101101010 0100001010100
Slide 44
Slide 44 text
1 3 0110101101010 0100001010100
Slide 45
Slide 45 text
1 3 0110101101010 0100001010100 0011101101010
Slide 46
Slide 46 text
1 3 1 0110101101010 0100001010100 0011101101010
Slide 47
Slide 47 text
1 3 1 0110101101010 0100001010100 0011101101010 0110011010101
Slide 48
Slide 48 text
1 3 1 0110101101010 0100001010100 0011101101010 0110011010101 2
Slide 49
Slide 49 text
$E := \alpha_m m^2 Z$
Slide 50
Slide 50 text
$E := \alpha_m \times m \times mZ$
Slide 51
Slide 51 text
$E := \alpha_m \times m \times mZ$
Slide 52
Slide 52 text
$E := \alpha_m \times m \times mZ$
Slide 53
Slide 53 text
$E := \alpha_m \times m \times mZ$
Slide 54
Slide 54 text
$\alpha_{16} = 0.673$; $\alpha_{32} = 0.697$; $\alpha_m = 0.7213 / (1 + 1.079/m)$ for $m \geq 128$
Slide 55
Slide 55 text
$\frac{x_1 + x_2 + \dots + x_n}{n}$
Arithmetic mean
Slide 56
Slide 56 text
$\frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$
Harmonic mean
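A small Ruby comparison of the two means may help show why the harmonic mean is used: one unusually large value pulls the arithmetic mean up sharply but barely moves the harmonic mean. The method names and sample values are made up for illustration.

```ruby
# Sum of the values divided by how many there are.
def arithmetic_mean(values)
  values.sum.to_f / values.length
end

# Number of values divided by the sum of their reciprocals.
def harmonic_mean(values)
  values.length / values.sum { |value| 1.0 / value }
end

estimates = [2, 2, 2, 64] # three modest values and one outlier

puts arithmetic_mean(estimates) # 17.5
puts harmonic_mean(estimates)   # roughly 2.64
```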
Slide 57
Slide 57 text
$mZ := \frac{m}{2^{-M[1]} + 2^{-M[2]} + \dots + 2^{-M[m]}}$
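Putting the pieces together, the estimate E = α_m × m × mZ might be computed as below. This is a minimal sketch, not how Redis implements it: `registers` stands for the array M of per-bucket scores, the method names are hypothetical, and the constants for m = 64 and the 1.079 term are taken from the original HyperLogLog paper rather than from the slides. Corrections for very small or very large counts are left out.

```ruby
# Bias-correction constant alpha_m for m registers.
def alpha(m)
  case m
  when 16 then 0.673
  when 32 then 0.697
  when 64 then 0.709 # from the HyperLogLog paper, not shown on the slide
  else
    raise ArgumentError, "m must be 16, 32, 64 or at least 128" if m < 128

    0.7213 / (1 + 1.079 / m)
  end
end

# mZ: m divided by the sum of 2^(-M[j]), a harmonic mean of the 2^M[j].
def m_z(registers)
  registers.length / registers.sum { |score| 2.0**(-score) }
end

# E := alpha_m * m * mZ
def estimate(registers)
  m = registers.length
  alpha(m) * m * m_z(registers)
end

puts estimate(Array.new(16, 1)) # about 21.5
```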