Distinct Query using HyperLogLog

hama_du
October 05, 2018

Getting a feel for how HyperLogLog works, using Distinct Query as an example

Transcript

  1. Memory-efficient Distinct Query with HyperLogLog (SDD study session @ r-n-i, 2018/10/05)

  2. Distinct Query

  3. Query examples

    • Distinct(['A', 'B']) = 2 • Distinct(['A', 'B', 'C', 'A', 'C']) = 3
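  4. [Implementation] Put everything into a Set and take its size (the usual way)

  5. Problems with putting everything into a Set • Painful when the data sequence is large • Memory use grows with the length of the sequence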
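
The Set-based approach from slides 4 and 5 can be sketched in Python roughly as follows. The function name `distinct_exact` is an illustrative assumption and does not appear in the slides.

```python
def distinct_exact(items):
    """Exact distinct count: put every item into a set and take its size."""
    seen = set()
    for item in items:
        seen.add(item)  # the set keeps one entry per distinct item
    return len(seen)


# The two query examples from slide 3.
print(distinct_exact(["A", "B"]))                 # 2
print(distinct_exact(["A", "B", "C", "A", "C"]))  # 3
```

The answer is exact, but the set has to hold every distinct value it has seen, which is the memory problem slide 5 points out.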

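  6. Query example: a concrete case • Given a sequence of user IDs, the number of unique users is

  7. The number of unique users is… 35915!!

  8. The number of unique users is… 35915!!

  9. The number of unique users is… 35915!! Do we really need this?

  10. Sometimes the exact value is not that important

  11. Estimation using hash values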

  12. Computing hash values

    hash(AB) = 0x36f… = 0011 0110 1111 … hash(CD) = 0xc90… = 1100 1001 0000 … hash(EF) = 0x01e… = 0000 0001 1110 …
  13. How many 0s are at the front?

    zero(hash(AB)) = zero(0011 0110 1111…) = 2 zero(hash(CD)) = zero(1100 1001 0000…) = 0 zero(hash(EF)) = zero(0000 0001 1110…) = 7
  14. Take the maximum of these

    D = max( zero(hash(AB)), zero(hash(CD)), zero(hash(EF)) ) = max(2, 0, 7) = 7
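
The leading-zero count and the maximum from slides 12 to 14 can be reproduced with a short Python sketch. It assumes only the 12 bits that are visible on the slide (the real hashes are longer and are elided there), and it borrows the name `zero` from the slide notation.

```python
WIDTH = 12  # assumption: work only with the 12 bits shown on the slide


def zero(h, width=WIDTH):
    """Number of leading 0 bits of h when written with `width` bits."""
    return width - h.bit_length()


# Visible hash prefixes from slide 12.
hashes = {"AB": 0x36F, "CD": 0xC90, "EF": 0x01E}

zeros = {key: zero(h) for key, h in hashes.items()}
D = max(zeros.values())

print(zeros)  # {'AB': 2, 'CD': 0, 'EF': 7}
print(D)      # 7
```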
  15. Conversely…

  16. Suppose we only know the maximum: D = 7

  17. That means… D = max(?, ?, …, 7, …, ?, ?)

    zero(hash(?)) = zero(0000 0001 …) = 7
  18. That means… D = max(?, ?, …, 7, …, ?, ?)

    zero(hash(?)) = zero(0000 0001 …) = 7, which is pretty rare!
  19. How rare? D = max(?, ?, …, 7, …, ?, ?)

    zero(hash(?)) = zero(0000 0001 …) = 7, probability 1/2^7 = 1/128
  20. How many unique hashes have we seen? D = max(?, ?, …, 7, …, ?, ?)

    1/2^7 = 1/128, so about 128 on average?
  21. We can roughly estimate the number of distinct elements (hashes)
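
The reasoning on slides 15 to 21 can be turned into a very rough estimator: remember only D, the largest leading-zero count seen, and guess that about 2^D distinct hashes went by. The sketch below is one way to write that down. The choice of SHA-256 truncated to 64 bits and the names `hash64`, `leading_zeros` and `rough_distinct_estimate` are assumptions for illustration, not something the slides specify.

```python
import hashlib

HASH_BITS = 64  # assumption: use the first 64 bits of a SHA-256 digest


def hash64(item):
    """Map an item to a 64-bit integer hash."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def leading_zeros(h, width=HASH_BITS):
    """Number of leading 0 bits of h when written with `width` bits."""
    return width - h.bit_length()


def rough_distinct_estimate(items):
    """Track only D, the largest leading-zero count, and guess 2**D."""
    d = -1  # -1 means "nothing seen yet"
    for item in items:
        d = max(d, leading_zeros(hash64(item)))
    return 0 if d < 0 else 2 ** d


users = [f"user{i}" for i in range(1000)]
print(rough_distinct_estimate(users))
print(rough_distinct_estimate(users + users))  # same value: duplicates hash identically
```

The state is a single integer, but the guess is always a power of two and a single unlucky hash can throw it off badly.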

  22. HyperLogLog

  23. Bucket by the trailing bits of the hash: hash(AB) = 0x36f… = 0011 0110 … 1010

    D: indices 0 1 9 10 11 14 15, values 1 1 0 2 0 0 0 … …
  24. Update with the larger value! hash(AB) = 0x36f… = 0011 0110 … 1010

    D: indices 0 1 9 10 11 14 15, values 2 1 0 2 0 0 0 … …
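  25. Estimating the number of elements • On average, how rare is the state of D? • Harmonic mean!

  26. Estimating the number of elements, D: 1 1 2 4

    C × 4 × 4 / (1/2^2 + 1/2^1 + 1/2^1 + 1/2^4) (density per bucket)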
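
The arithmetic on slide 26 can be checked with a few lines of Python. The slide does not give a value for the constant C, so it is left as a parameter here and set to 1.0 purely as a placeholder. The function name `estimate_from_registers` is likewise an assumption.

```python
def estimate_from_registers(registers, c=1.0):
    """C x (number of buckets) x (harmonic mean of 2**D over the buckets)."""
    m = len(registers)
    harmonic_mean = m / sum(2.0 ** -d for d in registers)
    return c * m * harmonic_mean


# Bucket values in the order of the denominators on the slide: 2, 1, 1, 4.
print(estimate_from_registers([2, 1, 1, 4]))  # about 12.19 when C = 1
```

The harmonic mean is used because it is dominated by the small values, so one bucket with an unusually large D (here the 4) does not inflate the estimate the way an arithmetic mean of 2^D would.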
  27. At small scale, the error tends to be larger

  28. Space complexity (memory usage) • Memory used: just an array of integers!
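
Putting the pieces together, a minimal HyperLogLog-style counter might look like the sketch below. It is not the speaker's code. It assumes SHA-256 truncated to 64 bits as the hash, uses the trailing `p` bits as the bucket index, and stores leading-zero counts as on the slides. The constant is assumed to be C = 2 × alpha_m, with alpha_m = 0.7213 / (1 + 1.079 / m) from the HyperLogLog papers (given there for m >= 128); the factor 2 compensates for storing the zero count rather than zero count + 1. The small- and large-range corrections described in the referenced paper are omitted.

```python
import hashlib


class TinyHyperLogLog:
    """Minimal HyperLogLog-style counter (illustrative sketch, no corrections)."""

    def __init__(self, p=10):
        if p < 7:
            raise ValueError("the alpha_m formula used here assumes m >= 128")
        self.p = p                     # trailing hash bits used as the bucket index
        self.m = 1 << p                # number of buckets
        self.registers = [0] * self.m  # the only state: one small integer per bucket

    @staticmethod
    def _hash64(item):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        h = self._hash64(item)
        index = h & (self.m - 1)  # trailing p bits pick the bucket
        rest = h >> self.p        # remaining 64 - p bits
        zeros = (64 - self.p) - rest.bit_length()  # leading zeros of the rest
        if zeros > self.registers[index]:
            self.registers[index] = zeros  # keep the larger value

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        c = 2 * alpha  # assumption: see the note above on the factor 2
        harmonic_mean = self.m / sum(2.0 ** -d for d in self.registers)
        return c * self.m * harmonic_mean


hll = TinyHyperLogLog(p=10)
for i in range(10000):
    hll.add(f"user{i}")
    hll.add(f"user{i}")  # duplicates never change a register
print(round(hll.estimate()))
```

With p = 10 the whole state is 1024 small integers regardless of how many items are added. Because empty buckets are indistinguishable from buckets whose best value is 0, this sketch is noticeably biased when few items have been added, in line with the caveat on slide 27.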

  29. References • HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm