HyperLogLog in 15 minutes

Paul Mucur
November 28, 2018


A brief explanation of the HyperLogLog algorithm for estimating the cardinality of large sets given at the Drover Ruby Meetup on Wednesday, 28th November 2018.


Transcript

Slide 3

    require "set"        # not needed on Ruby 3.2+, where Set is loaded automatically

    animals = Set.new    # => #<Set: {}>
    animals << "dog"     # => #<Set: {"dog"}>
    animals << "dog"     # => #<Set: {"dog"}>
    animals << "cat"     # => #<Set: {"dog", "cat"}>
    animals.size         # => 2
Slide 24

$$P(0) = \frac{1}{2^1} = \frac{1}{2}, \quad P(1) = \frac{1}{2^2} = \frac{1}{4}, \quad P(2) = \frac{1}{2^3} = \frac{1}{8}, \quad \ldots, \quad P(n) = \frac{1}{2^{n+1}}$$
Slide 38

    > PFADD tweets 1 2 3 4 5 6
    (integer) 1
    > PFCOUNT tweets
    (integer) 6
Slide 56: Harmonic mean

$$\frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$$
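The slide giving P(n) = 1/2^(n+1) is the heart of the trick. Reading P(n) as the probability that a uniformly random hash starts with exactly n zero bits before its first 1 (our interpretation; the slide shows only the formula), each extra zero halves the odds, so a run of k leading zeros hints that roughly 2^k distinct values have been hashed. The Ruby sketch below checks the formula empirically. It assumes SHA-256 truncated to 64 bits as the hash; `hash64` and `leading_zeros` are hypothetical helper names, not part of any library.

```ruby
require "digest"

# Hypothetical helper: map any value to a 64-bit unsigned integer.
# Assumption: SHA-256 truncated to 64 bits stands in for a fast hash function.
def hash64(value)
  Digest::SHA256.digest(value.to_s).unpack1("Q>")
end

# Number of zero bits before the first 1 in a 64-bit integer (64 if x is 0).
def leading_zeros(x, bits = 64)
  bits - x.bit_length
end

# Tally how many hashed inputs start with exactly n zero bits.
samples = 100_000
counts = Hash.new(0)
samples.times { |i| counts[leading_zeros(hash64(i))] += 1 }

(0..4).each do |n|
  observed = counts[n].fdiv(samples)
  expected = 1.0 / 2**(n + 1)
  puts format("P(%d): observed %.4f, expected %.4f", n, observed, expected)
end

# Crude single-counter estimate: if the longest run of leading zeros seen is k,
# guess that roughly 2**k distinct values were hashed. Very noisy on its own.
max_zeros = counts.keys.max
puts "longest run: #{max_zeros}, crude estimate: #{2**max_zeros}"
```

A single longest run swings by a factor of two or more from one stream to the next, which is why HyperLogLog keeps many such counters and combines them.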
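The harmonic mean on the final slide matters because per-counter estimates of the form 2^k are heavily skewed: one counter that happens to see an unusually long run of zeros can swamp an arithmetic mean, while the harmonic mean is dominated by the small values. The sketch below compares the two on a made-up list of estimates (illustrative numbers, not from the talk).

```ruby
# Harmonic mean: n divided by the sum of the reciprocals.
def harmonic_mean(values)
  values.size / values.sum { |x| 1.0 / x }
end

def arithmetic_mean(values)
  values.sum.fdiv(values.size)
end

# Hypothetical per-counter estimates (2**k for each counter's longest run);
# one counter saw an unusually long run of zeros.
estimates = [16, 16, 32, 16, 1024]

puts arithmetic_mean(estimates)          # prints 220.8
puts harmonic_mean(estimates).round(2)   # prints 22.76
```

The single outlier drags the arithmetic mean up by an order of magnitude, while the harmonic mean stays close to the typical counter.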
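Putting the pieces together, here is a minimal HyperLogLog in Ruby that loosely mirrors the PFADD/PFCOUNT slide. It follows the estimator from Flajolet et al.'s 2007 HyperLogLog paper: the first p bits of each hash pick one of m = 2^p registers, each register M[j] keeps the largest rank seen (leading zeros in the remaining bits, plus one), and the count is α_m · m times the harmonic mean of the 2^M[j] values:

$$E = \alpha_m \, m^2 \left( \sum_{j=1}^{m} 2^{-M[j]} \right)^{-1}, \qquad \alpha_m \approx \frac{0.7213}{1 + 1.079/m} \quad (m \ge 128)$$

The class name `TinyHyperLogLog`, the SHA-256-based hash and p = 14 (16,384 registers, the register count Redis documents for its own HyperLogLogs) are assumptions for illustration; Redis's real implementation differs in its hash function, register encoding and estimator details.

```ruby
require "digest"

# Hypothetical minimal HyperLogLog (not Redis's implementation).
# Assumptions: SHA-256 truncated to 64 bits as the hash, p = 14 index bits,
# no large-range correction (unnecessary with a 64-bit hash).
class TinyHyperLogLog
  HASH_BITS = 64

  def initialize(p = 14)
    @p = p
    @m = 1 << p                      # number of registers
    @registers = Array.new(@m, 0)    # M[j], all empty to start
  end

  # Adds a value; returns true if a register changed (loosely like PFADD's 1 or 0).
  def add(value)
    x = Digest::SHA256.digest(value.to_s).unpack1("Q>")
    width = HASH_BITS - @p
    index = x >> width                   # first p bits pick a register
    rest = x & ((1 << width) - 1)        # remaining bits
    rank = width - rest.bit_length + 1   # leading zeros in rest, plus one
    return false if rank <= @registers[index]

    @registers[index] = rank
    true
  end

  # Estimates the number of distinct values added so far.
  def count
    alpha = 0.7213 / (1 + 1.079 / @m)    # bias constant, valid for m >= 128
    estimate = alpha * @m * @m / @registers.sum { |r| 2.0**(-r) }
    zeros = @registers.count(0)
    # Small-range correction: use linear counting while many registers are empty.
    if estimate <= 2.5 * @m && zeros.positive?
      estimate = @m * Math.log(@m.to_f / zeros)
    end
    estimate.round
  end
end

tweets = TinyHyperLogLog.new
[1, 2, 3, 4, 5, 6].each { |id| tweets.add(id) }
puts tweets.count
```

With only a handful of items almost every register is still zero, so the linear-counting correction takes over and is close to exact. For large streams the paper gives a standard error of about 1.04/√m, roughly 0.8% for p = 14, using a fixed 16,384 registers however many items are added.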