HyperLogLog in 15 minutes

Paul Mucur
November 28, 2018

A brief explanation of the HyperLogLog algorithm for estimating the cardinality of large sets given at the Drover Ruby Meetup on Wednesday, 28th November 2018.

Transcript

  1. HyperLogLog in 15 minutes @mudge

  2. “Cardinality of a set”

  3. animals = Set.new
    => #<Set: {}>
    animals << "dog"
    => #<Set: {"dog"}>
    animals << "dog"
    => #<Set: {"dog"}>
    animals << "cat"
    => #<Set: {"dog", "cat"}>
    animals.size
    => 2
  4. What do we need to count the number of unique elements exactly?
  5. None
  6. None
  7. Flipping a coin

  8. None
  9. None
  10. 1

  11. 1

  12. 1

  13. 1

  14. 1

  15. 1 2

  16. 2

  17. 2

  18. 2 5

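  19. $P(0) = ?$

  20. $P(0) = \frac{1}{2}$

  21. $P(0) = \frac{1}{2}$, $P(1) = ?$

  22. $P(0) = \frac{1}{2}$, $P(1) = \frac{1}{4}$

  23. $P(0) = \frac{1}{2}$, $P(1) = \frac{1}{4}$, $P(2) = \frac{1}{8}$

  24. $P(0) = \frac{1}{2^1} = \frac{1}{2}$, $P(1) = \frac{1}{2^2} = \frac{1}{4}$, $P(2) = \frac{1}{2^3} = \frac{1}{8}$, $\dots$, $P(n) = \frac{1}{2^{n+1}}$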
  25. If our highest score is 5 then we can guess $2^6$ runs: $P(n) = \frac{1}{2^{n+1}}$
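The coin-flipping argument on slides 19 to 25 can be tried out with a small simulation. This is a minimal sketch, assuming a run's score is the number of consecutive heads before the first tails; the slides do not spell out the rule, and the names `probability_of_score`, `guess_runs` and `play_run` are hypothetical.

```ruby
# Probability that a single run scores exactly n: P(n) = 1 / 2^(n + 1).
def probability_of_score(n)
  Rational(1, 2**(n + 1))
end

# Guess how many runs were played from the highest score seen,
# e.g. a highest score of 5 suggests about 2^6 runs.
def guess_runs(highest_score)
  2**(highest_score + 1)
end

# Play one run: keep flipping while the coin comes up heads (1)
# and return how many heads were seen before the first tails (0).
# Assumption: this is the scoring rule used in the talk.
def play_run(rng)
  score = 0
  score += 1 while rng.rand(2) == 1
  score
end

rng = Random.new(42)
highest = Array.new(64) { play_run(rng) }.max
puts "highest score: #{highest}, guessed runs: #{guess_runs(highest)}"
```

A single highest score is a noisy signal, so the guess can be off by a large factor for any one experiment. That noise is what the later slides on multiple registers and averaging address.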
  26. What’s this got to do with estimating the cardinality of a set?
  27. "dog"

  28. "dog"

  29. 1 "dog"

  30. 1 "cat"

  31. 1 "cat"

  32. 1 "cat" 3

  33. 1 "dog"

  34. 1 "dog" 0 1 1 0 0 1

  35. 1 "cat" 3

  36. 1 "cat" 3 0 0 0 1 1 0
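Slides 27 to 36 appear to turn each element into a string of bits and score it by its leading zeros, just like a run of coin flips. The sketch below illustrates that idea under some assumptions: SHA-1 truncated to 32 bits stands in for whatever hash function the talk used, so the scores it gives for "dog" and "cat" will not match the slides. The helper names are hypothetical.

```ruby
require "digest"

# Number of leading zero bits when `bits` is written with `width` binary digits.
def leading_zeros(bits, width)
  width - bits.bit_length
end

# Hash a string to a 32-bit unsigned integer.
# Assumption: the first four bytes of a SHA-1 digest are used purely
# for illustration; the talk does not name a hash function.
def hash32(value)
  Digest::SHA1.digest(value).unpack1("N")
end

# The element's "score": the leading zeros of its 32-bit hash.
def score(value)
  leading_zeros(hash32(value), 32)
end

# The six-bit patterns shown on slides 34 and 36.
puts leading_zeros(0b011001, 6) # => 1
puts leading_zeros(0b000110, 6) # => 3

# The same element always hashes to the same bits, so duplicates
# can never raise the highest score.
puts score("dog") == score("dog") # => true
```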

  37. $E := \alpha_m m^2 Z$

  38. > PFADD tweets 1 2 3 4 5 6
    (integer) 1
    > PFCOUNT tweets
    (integer) 6
  39. None
  40. ~fin~ @mudge https://mudge.name

  41. 0110101101010

  42. 1 0110101101010

  43. 1 0110101101010 0100001010100

  44. 1 3 0110101101010 0100001010100

  45. 1 3 0110101101010 0100001010100 0011101101010

  46. 1 3 1 0110101101010 0100001010100 0011101101010

  47. 1 3 1 0110101101010 0100001010100 0011101101010 0110011010101

  48. 1 3 1 0110101101010 0100001010100 0011101101010 0110011010101 2

  49. $E := \alpha_m m^2 Z$

  50. $E := \alpha_m \times m \times mZ$

  51. $E := \alpha_m \times m \times mZ$

  52. $E := \alpha_m \times m \times mZ$

  53. $E := \alpha_m \times m \times mZ$

  54. $\alpha_{16} = 0.673$; $\alpha_{32} = 0.697$; $\alpha_m = 0.7213 / (1 + 1.079/m)$ for $m \geq 128$
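The bias-correction constants on slide 54 can be written as a small lookup. This is a sketch with a hypothetical method name, `alpha`; it uses the constant 1.079 as published in the HyperLogLog paper and only covers the register counts listed on the slide.

```ruby
# Bias-correction constant alpha_m for m registers.
# Only the cases listed on the slide are handled; anything else raises.
def alpha(m)
  case m
  when 16 then 0.673
  when 32 then 0.697
  else
    raise ArgumentError, "no constant listed for m = #{m}" if m < 128

    0.7213 / (1 + 1.079 / m)
  end
end

puts alpha(16)   # => 0.673
puts alpha(1024)
```

As m grows, the correction term 1.079/m shrinks and the constant approaches 0.7213.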
  55. Arithmetic mean: $\frac{x_1 + x_2 + \dots + x_n}{n}$

  56. Harmonic mean: $\frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$
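The two means on slides 55 and 56 are easy to compare side by side. The sketch below uses hypothetical helper names and made-up sample values to show how differently each mean reacts to one very large value.

```ruby
# Arithmetic mean: the sum of the values divided by how many there are.
def arithmetic_mean(xs)
  xs.sum.to_f / xs.size
end

# Harmonic mean: the number of values divided by the sum of their reciprocals.
def harmonic_mean(xs)
  xs.size / xs.sum { |x| 1.0 / x }
end

guesses = [2, 2, 2, 1024] # three similar guesses and one outlier
puts arithmetic_mean(guesses) # => 257.5
puts harmonic_mean(guesses)
```

With these values the harmonic mean stays below 3, while the arithmetic mean is dragged up to 257.5 by the single outlier. That robustness to unusually high guesses is a plausible reading of why the estimator is built on a harmonic mean.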
  57. $mZ := \frac{m}{2^{-M[1]} + 2^{-M[2]} + \dots + 2^{-M[m]}}$
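Putting the pieces together, the estimate $E := \alpha_m m^2 Z$ can be sketched in a few lines of Ruby. This is an illustrative sketch under stated assumptions, not the implementation used by Redis or shown in the talk. The class name `HyperLogLogSketch` is hypothetical, SHA-1 truncated to 32 bits is an assumed hash, and the first bits of the hash are assumed to pick the register. Each register stores the position of the first 1 bit in the remaining bits (the leading-zero score plus one), which is the convention the published formula expects. The small-range and large-range corrections from the paper are left out.

```ruby
require "digest"

class HyperLogLogSketch
  HASH_BITS = 32

  # index_bits picks the number of registers: m = 2^index_bits.
  # This sketch only supports m >= 128, where the alpha_m formula applies.
  def initialize(index_bits = 10)
    raise ArgumentError, "index_bits must be between 7 and 16" unless index_bits.between?(7, 16)

    @index_bits = index_bits
    @m = 1 << index_bits
    @registers = Array.new(@m, 0)
  end

  def add(value)
    hash = Digest::SHA1.digest(value.to_s).unpack1("N") # 32-bit unsigned integer
    rest_bits = HASH_BITS - @index_bits
    index = hash >> rest_bits               # first bits choose the register M[j]
    rest = hash & ((1 << rest_bits) - 1)    # remaining bits are "the coin flips"
    rank = rest_bits - rest.bit_length + 1  # leading zeros + 1
    @registers[index] = rank if rank > @registers[index]
    self
  end

  # E = alpha_m * m * mZ, where mZ is the harmonic-mean term
  # m / (2^-M[1] + 2^-M[2] + ... + 2^-M[m]).
  def count
    alpha = 0.7213 / (1 + 1.079 / @m)
    m_z = @m / @registers.sum { |r| 2.0**(-r) }
    alpha * @m * m_z
  end
end

sketch = HyperLogLogSketch.new
10_000.times { |i| sketch.add("tweet-#{i}") }
puts sketch.count.round
```

With 1,024 registers the paper's standard error of roughly 1.04 / √m works out to about 3%, so the printed estimate should land near 10,000 without the sketch ever storing the elements themselves.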