HyperLogLog in 15 minutes

Paul Mucur
November 28, 2018

A brief explanation of the HyperLogLog algorithm for estimating the cardinality of large sets, given at the Drover Ruby Meetup on Wednesday, 28th November 2018.

Transcript

  1. animals = Set.new  # => #<Set: {}>
     animals << "dog"   # => #<Set: {"dog"}>
     animals << "dog"   # => #<Set: {"dog"}>
     animals << "cat"   # => #<Set: {"dog", "cat"}>
     animals.size       # => 2
  2. to 10. Image-only slides; the transcript preserves only the counters 1, 1, 1, 1, 1, "1 2", 2, 2 and "2 5".

  11. $P(0) = \frac{1}{2^1} = \frac{1}{2}$, $P(1) = \frac{1}{2^2} = \frac{1}{4}$, $P(2) = \frac{1}{2^3} = \frac{1}{8}$, …, $P(n) = \frac{1}{2^{n+1}}$
  12. > PFADD tweets 1 2 3 4 5 6
      (integer) 1
      > PFCOUNT tweets
      (integer) 6
  13. Harmonic mean: $\frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$
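Slide 11 lists $P(n) = \frac{1}{2^{n+1}}$ without saying what it measures. A natural reading, and an assumption here, is the chance that a uniformly random bit string (such as a hash) starts with exactly n zero bits followed by a one, just as n tails before the first head in coin flips. The Ruby sketch that follows checks that reading empirically; the helper names `leading_zeros`, `expected_probability` and `simulate`, the 32-bit width and the fixed seed are hypothetical choices for illustration.

```ruby
# Assumption: P(n) is the probability that a random BITS-wide value
# begins with exactly n zero bits and then a one bit.
BITS = 32

# Number of zero bits before the first 1 bit in a BITS-wide integer
# (returns BITS for 0, where every bit is zero).
def leading_zeros(value, bits = BITS)
  bits - value.bit_length
end

# The formula from the slide: P(n) = 1 / 2**(n + 1).
def expected_probability(n)
  1.0 / 2**(n + 1)
end

# Tally leading-zero counts over `trials` random values from a seeded RNG.
def simulate(trials, rng = Random.new(2018))
  counts = Hash.new(0)
  trials.times { counts[leading_zeros(rng.rand(2**BITS))] += 1 }
  counts
end

trials = 100_000
counts = simulate(trials)
(0..4).each do |n|
  printf("P(%d): expected %.4f, observed %.4f\n",
         n, expected_probability(n), counts[n].fdiv(trials))
end
```

Each extra leading zero halves the probability, which is why seeing a long run of zeros hints that many distinct values must have been hashed.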
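Slide 13 ends on the harmonic mean. HyperLogLog combines its per-register estimates $2^{M[j]}$ this way: the raw estimate $\alpha_m m^2 / \sum_j 2^{-M[j]}$ equals $\alpha_m \cdot m$ times their harmonic mean, which keeps a single register that happened to see an unusually long run of zeros from dominating the result. A minimal Ruby sketch comparing the two means; the sample values are made up for illustration.

```ruby
# Harmonic mean: n / (1/x1 + 1/x2 + ... + 1/xn).
def harmonic_mean(values)
  values.size / values.sum { |x| 1.0 / x }
end

# Ordinary arithmetic mean, for comparison.
def arithmetic_mean(values)
  values.sum.fdiv(values.size)
end

# Hypothetical per-register estimates with one large outlier.
estimates = [2, 4, 4, 8, 1024]
puts arithmetic_mean(estimates) # => 208.4, dragged up by the outlier
puts harmonic_mean(estimates)   # roughly 4.44, close to the typical values
```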
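Slide 12 relies on Redis's built-in HyperLogLog commands, `PFADD` and `PFCOUNT`. To show how the pieces fit together without a Redis server, here is a toy Ruby version of the estimator described in the original HyperLogLog paper by Flajolet et al.: the first `precision` bits of a hash pick a register, each register keeps the largest "leading zeros + 1" it has seen in the remaining bits, and the count is the bias-corrected harmonic-mean estimate, with linear counting used for small cardinalities. The class name `ToyHyperLogLog`, the use of the first 64 bits of SHA-1 as the hash and the default of 2^10 registers are assumptions; Redis's own implementation differs in its details.

```ruby
require "digest/sha1"

# Toy HyperLogLog for illustration only (assumes precision >= 4).
# Hash: first 64 bits of SHA-1 (an assumption, not what Redis uses).
class ToyHyperLogLog
  HASH_BITS = 64

  def initialize(precision = 10)
    @precision = precision
    @m = 1 << precision              # number of registers
    @registers = Array.new(@m, 0)
  end

  # Roughly analogous to PFADD: fold one element into the registers.
  def add(element)
    hash = Digest::SHA1.digest(element.to_s).unpack1("Q>")
    rest_bits = HASH_BITS - @precision
    index = hash >> rest_bits                    # leading bits choose a register
    rest = hash & ((1 << rest_bits) - 1)         # remaining bits
    rank = rest_bits - rest.bit_length + 1       # leading zeros + 1
    @registers[index] = rank if rank > @registers[index]
    self
  end

  # Roughly analogous to PFCOUNT: estimate the number of distinct elements.
  def count
    raw = alpha * @m * @m / @registers.sum { |r| 2.0**(-r) }
    empty = @registers.count(0)
    estimate =
      if raw <= 2.5 * @m && empty.positive?
        @m * Math.log(@m.to_f / empty)           # small-range correction
      else
        raw
      end
    estimate.round
  end

  private

  # Bias-correction constant alpha_m from the HyperLogLog paper.
  def alpha
    case @m
    when 16 then 0.673
    when 32 then 0.697
    when 64 then 0.709
    else 0.7213 / (1 + 1.079 / @m)
    end
  end
end

hll = ToyHyperLogLog.new
(1..6).each { |tweet| hll.add(tweet) }
hll.add(3)                                       # duplicates leave the registers unchanged
puts hll.count                                   # an estimate of the 6 distinct tweets
```

With 1,024 registers the expected standard error is about $1.04/\sqrt{1024}$, roughly 3%, so unlike `Set#size` from slide 1 the result is an estimate, but the memory used stays fixed at 1,024 small registers however many items are added.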