Slide 1

Slide 1 text

HyperLogLog in 15 minutes @mudge

Slide 2

Slide 2 text

“Cardinality of a set”

Slide 3

Slide 3 text

animals = Set.new
=> #<Set: {}>
animals << "dog"
=> #<Set: {"dog"}>
animals << "dog"
=> #<Set: {"dog"}>
animals << "cat"
=> #<Set: {"dog", "cat"}>
animals.size
=> 2

Slide 4

Slide 4 text

What do we need to count the number of unique elements exactly?

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Flipping a coin

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

1

Slide 11

Slide 11 text

1

Slide 12

Slide 12 text

1

Slide 13

Slide 13 text

1

Slide 14

Slide 14 text

1

Slide 15

Slide 15 text

1 2

Slide 16

Slide 16 text

2

Slide 17

Slide 17 text

2

Slide 18

Slide 18 text

2 5
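The coin-flipping game on these slides can be sketched in Ruby. This assumes a run's score is the number of heads flipped before the first tails, which fits the P(0) = 1/2 slides that follow. `run_score` and the `:heads`/`:tails` symbols are hypothetical names for illustration.

```ruby
# Score one run: flip a fair coin until it lands tails and count the heads
# that came first. `flip` is any callable returning :heads or :tails, so a
# seeded RNG or a scripted sequence can be plugged in.
def run_score(flip)
  score = 0
  score += 1 while flip.call == :heads
  score
end

rng = Random.new(42)
coin = -> { rng.rand(2).zero? ? :heads : :tails }

scores = Array.new(5) { run_score(coin) }
puts "scores: #{scores.inspect}, highest: #{scores.max}"
```

Passing the coin in as a lambda keeps the scoring rule separate from the source of randomness, which makes the rule easy to check with a fixed sequence of flips.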

Slide 19

Slide 19 text

P(0) = ?

Slide 20

Slide 20 text

$P(0) = \frac{1}{2}$

Slide 21

Slide 21 text

$P(0) = \frac{1}{2}$, $P(1) = ?$

Slide 22

Slide 22 text

$P(0) = \frac{1}{2}$, $P(1) = \frac{1}{4}$

Slide 23

Slide 23 text

$P(0) = \frac{1}{2}$, $P(1) = \frac{1}{4}$, $P(2) = \frac{1}{8}$

Slide 24

Slide 24 text

$P(0) = \frac{1}{2^1} = \frac{1}{2}$, $P(1) = \frac{1}{2^2} = \frac{1}{4}$, $P(2) = \frac{1}{2^3} = \frac{1}{8}$, …, $P(n) = \frac{1}{2^{n+1}}$
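The pattern can be checked with exact fractions: a score of exactly n needs n heads followed by one tails, so each of the n + 1 independent flips contributes a factor of 1/2. A minimal Ruby sketch using the built-in `Rational` type; `probability_of_score` is a hypothetical name.

```ruby
# Probability that a run scores exactly n: n heads, then one tails,
# each flip independent with probability 1/2.
def probability_of_score(n)
  Rational(1, 2**(n + 1))
end

(0..3).each do |n|
  puts "P(#{n}) = #{probability_of_score(n)}"
end

# The probabilities of all scores form a geometric series summing to 1;
# the partial sum up to n = 19 is exactly 1 - 1/2**20.
partial = (0..19).sum { |n| probability_of_score(n) }
puts partial == 1 - Rational(1, 2**20)
```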

Slide 25

Slide 25 text

If our highest score is 5, then we can guess about $2^{5+1} = 2^6 = 64$ runs, since $P(n) = \frac{1}{2^{n+1}}$
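The guess inverts P(n): a score of n turns up about once every 2^(n+1) runs. A rough Ruby experiment, assuming a run's score is the number of heads before the first tails and that `rng.rand(2) == 0` counts as heads; `estimate_runs` and `run_score` are hypothetical names.

```ruby
# Guess how many runs were played from the highest score seen:
# a score of n has probability 1 / 2**(n + 1), so expect ~2**(n + 1) runs.
def estimate_runs(highest_score)
  2**(highest_score + 1)
end

# Score one run: count heads (rand(2) == 0) before the first tails.
def run_score(rng)
  score = 0
  score += 1 while rng.rand(2).zero?
  score
end

rng = Random.new(2024)
[10, 100, 1000].each do |runs|
  highest = Array.new(runs) { run_score(rng) }.max
  puts "#{runs} runs: highest score #{highest}, guess #{estimate_runs(highest)}"
end
```

A single highest score is a noisy guess, and it can only ever be a power of two.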

Slide 26

Slide 26 text

What’s this got to do with estimating the cardinality of a set?

Slide 27

Slide 27 text

"dog"

Slide 28

Slide 28 text

"dog"

Slide 29

Slide 29 text

1 "dog"

Slide 30

Slide 30 text

1 "cat"

Slide 31

Slide 31 text

1 "cat"

Slide 32

Slide 32 text

1 "cat" 3

Slide 33

Slide 33 text

1 "dog"

Slide 34

Slide 34 text

1 "dog" 0 1 1 0 0 1

Slide 35

Slide 35 text

1 "cat" 3

Slide 36

Slide 36 text

1 "cat" 3 0 0 0 1 1 0
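Hashing is what turns each element into a fair run of coin flips: a good hash's bits look random, and the same element always produces the same bits, so adding "dog" twice cannot raise the score. A minimal Ruby sketch; SHA-1 from the standard `digest` library and a 32-bit width are assumptions (any well-mixed hash would do), so the bits it prints will not match the illustrative ones on the slides. `hash_bits` and `leading_zeros` are hypothetical names.

```ruby
require "digest/sha1"

# First `width` bits of the SHA-1 digest of a value, as a "0"/"1" string.
def hash_bits(value, width = 32)
  Digest::SHA1.digest(value.to_s).unpack1("B*")[0, width]
end

# Score = number of leading zeros, the analogue of heads before the first tails.
def leading_zeros(bits)
  bits.index("1") || bits.length
end

%w[dog cat dog].each do |animal|
  bits = hash_bits(animal)
  puts "#{animal}: #{bits} -> #{leading_zeros(bits)}"
end
```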

Slide 37

Slide 37 text

$E := \alpha_m m^2 Z$

Slide 38

Slide 38 text

> PFADD tweets 1 2 3 4 5 6
(integer) 1
> PFCOUNT tweets
(integer) 6

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

~fin~ @mudge https://mudge.name

Slide 41

Slide 41 text

0110101101010

Slide 42

Slide 42 text

1 0110101101010

Slide 43

Slide 43 text

1 0110101101010 0100001010100

Slide 44

Slide 44 text

1 3 0110101101010 0100001010100

Slide 45

Slide 45 text

1 3 0110101101010 0100001010100 0011101101010

Slide 46

Slide 46 text

1 3 1 0110101101010 0100001010100 0011101101010

Slide 47

Slide 47 text

1 3 1 0110101101010 0100001010100 0011101101010 0110011010101

Slide 48

Slide 48 text

1 3 1 0110101101010 0100001010100 0011101101010 0110011010101 2
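One way to read these slides: some bits of each hash pick a register, the remaining bits are scored, and each register keeps its maximum. A Ruby sketch, assuming the first 4 bits choose one of 16 registers and the rank is the number of leading zeros in the remaining bits plus one (the ρ convention of the HyperLogLog paper). Under that assumption the four strings shown score 1, 3, 1 and 2. `INDEX_BITS` and `split_hash` are hypothetical names.

```ruby
INDEX_BITS = 4 # assumed: the first 4 bits choose one of 2**4 = 16 registers

# Split a bit string into [register index, rank], where rank is the
# position of the first 1 in the remaining bits (leading zeros + 1).
def split_hash(bits)
  index = bits[0, INDEX_BITS].to_i(2)
  rest = bits[INDEX_BITS..-1]
  rank = (rest.index("1") || rest.length) + 1
  [index, rank]
end

registers = Array.new(2**INDEX_BITS, 0)
%w[0110101101010 0100001010100 0011101101010 0110011010101].each do |bits|
  index, rank = split_hash(bits)
  registers[index] = [registers[index], rank].max # keep the highest rank
  puts "#{bits}: register #{index}, rank #{rank}"
end
p registers
```

With this split the first and last strings land in the same register (6), which keeps only the larger rank, 2.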

Slide 49

Slide 49 text

$E := \alpha_m m^2 Z$

Slide 50

Slide 50 text

$E := \alpha_m \times m \times mZ$

Slide 51

Slide 51 text

$E := \alpha_m \times m \times mZ$

Slide 52

Slide 52 text

$E := \alpha_m \times m \times mZ$

Slide 53

Slide 53 text

$E := \alpha_m \times m \times mZ$

Slide 54

Slide 54 text

$\alpha_{16} = 0.673$; $\alpha_{32} = 0.697$; $\alpha_{64} = 0.709$; $\alpha_m = 0.7213/(1 + 1.079/m)$ for $m \ge 128$
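α_m is a bias-correction constant that depends only on the number of registers m. A small Ruby helper using the constants from the HyperLogLog paper (Flajolet et al., 2007), including α_64 = 0.709; `alpha` is a hypothetical name.

```ruby
# Bias-correction constant alpha_m for m registers (m a power of two, m >= 16),
# using the constants from the HyperLogLog paper.
def alpha(m)
  case m
  when 16 then 0.673
  when 32 then 0.697
  when 64 then 0.709
  else
    raise ArgumentError, "m must be 16, 32, 64 or >= 128" if m < 128
    0.7213 / (1 + 1.079 / m)
  end
end

[16, 32, 64, 128, 1024, 16_384].each { |m| puts format("alpha(%d) = %.4f", m, alpha(m)) }
```

As m grows, α_m approaches 0.7213.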

Slide 55

Slide 55 text

Arithmetic mean: $\frac{x_1 + x_2 + \dots + x_n}{n}$

Slide 56

Slide 56 text

Harmonic mean: $\frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$
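The harmonic mean matters because per-register guesses are noisy: one unusually large value dominates an arithmetic mean, while the harmonic mean stays close to the typical values. A quick Ruby comparison; the numbers are made up for illustration.

```ruby
def arithmetic_mean(xs)
  xs.sum.fdiv(xs.size)
end

def harmonic_mean(xs)
  xs.size / xs.sum { |x| 1.0 / x }
end

# Three typical guesses and one outlier (made-up numbers).
estimates = [8, 8, 8, 1024]
puts arithmetic_mean(estimates) # => 262.0
puts harmonic_mean(estimates)   # about 10.64
```

In HyperLogLog the values being averaged are the per-register guesses $2^{M[j]}$, and their harmonic mean is the $mZ$ term on the next slide.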

Slide 57

Slide 57 text

$mZ := \frac{m}{2^{-M[1]} + 2^{-M[2]} + \dots + 2^{-M[m]}}$
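Putting the pieces together: $mZ$ is the harmonic mean of the per-register guesses $2^{M[j]}$, so $E = \alpha_m \times m \times mZ$ is m times that mean, scaled by α_m. A minimal Ruby sketch of the raw estimate, assuming m ≥ 128 registers, SHA-1 hashing and 32-bit hashes split into index bits plus a rank. It omits the small- and large-range corrections from the HyperLogLog paper, and `TinyHLL` is a hypothetical name, not a library class.

```ruby
require "digest/sha1"

# Hypothetical minimal HyperLogLog: raw estimate only, no range corrections.
class TinyHLL
  attr_reader :registers

  def initialize(index_bits = 10)
    raise ArgumentError, "this sketch assumes m >= 128" if index_bits < 7
    @index_bits = index_bits
    @m = 2**index_bits
    @registers = Array.new(@m, 0)
  end

  # Hash the value to 32 bits (SHA-1, an assumption), pick a register from
  # the first index_bits bits, and keep the highest rank seen there, where
  # rank = leading zeros in the remaining bits + 1.
  def add(value)
    bits = Digest::SHA1.digest(value.to_s).unpack1("B32")
    index = bits[0, @index_bits].to_i(2)
    rest = bits[@index_bits..-1]
    rank = (rest.index("1") || rest.length) + 1
    @registers[index] = rank if rank > @registers[index]
    self
  end

  # E := alpha_m * m * mZ, where mZ = m / sum(2**-M[j]) is the harmonic
  # mean of the per-register guesses 2**M[j].
  def estimate
    alpha = 0.7213 / (1 + 1.079 / @m)
    m_z = @m / @registers.sum { |r| 2.0**-r }
    alpha * @m * m_z
  end
end

hll = TinyHLL.new
10_000.times { |i| hll.add("item-#{i}") }
before = hll.registers.dup
3.times { hll.add("item-0") } # a repeated element cannot raise any register
puts hll.registers == before
puts hll.estimate.round
```

With the default 1,024 registers, an empty sketch reports roughly 738 rather than 0, which is why the paper switches to linear counting for small estimates.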