Distinct Query using HyperLogLog

hama_du
October 05, 2018

Understanding the intuition behind HyperLogLog, using the distinct query as an example

Transcript

  1. Memory-efficient Distinct Query
    with HyperLogLog
    SDD study group @ r-n-i
    2018/10/05

  2. Distinct Query

  3. Query examples
    • Distinct(['A', 'B']) = 2
    • Distinct(['A', 'B', 'C', 'A', 'C']) = 3

  4. [Implementation]
    Put everything into a Set and take its size
    The usual approach
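
The usual approach from this slide can be sketched in a few lines of Python. The function name `distinct` is chosen here to mirror the slide's `Distinct(...)` notation and is not taken from any library.

```python
def distinct(items):
    """Exact distinct count: put every element into a set and take its size."""
    return len(set(items))


print(distinct(['A', 'B']))                 # 2
print(distinct(['A', 'B', 'C', 'A', 'C']))  # 3
```

The set keeps one entry per distinct element, which is what makes this approach memory-hungry for long inputs.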

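  5. Problems with putting everything into a Set
    • Painful when the data sequence is large
    • Eats memory in proportion to the length (worst case: every element is distinct)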

  6. Query example - a concrete case
    • For a column of user IDs, the number of unique users is

  7. The number of unique users is…
    35915!!

  8. The number of unique users is…
    35915!!

  9. The number of unique users is…
    35915!!
    Do we really need this?

  10. Sometimes the exact value
    isn't all that important

  11. Estimation using hash values

  12. Computing hash values
    hash(AB) = 0x36f… = 0011 0110 1111 …
    hash(CD) = 0xc90… = 1100 1001 0000 …
    hash(EF) = 0x01e… = 0000 0001 1110 …

  13. How many leading 0s are there?
    zero(hash(AB))
    = zero(0011 0110 1111…)
    = 2
    zero(hash(CD))
    = zero(1100 1001 0000…)
    = 0
    zero(hash(EF))
    = zero(0000 0001 1110…)
    = 7
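
A minimal sketch of the `zero` function from this slide. It treats only the 12 bits shown on the slide as the whole hash, which is an assumption made for illustration (the real hashes are longer and elided with "…").

```python
def zero(value: int, width: int = 12) -> int:
    """Count the leading zero bits of `value` viewed as a `width`-bit number."""
    return width - value.bit_length()


# The three 12-bit prefixes shown on the slide.
print(zero(0x36F))  # 0011 0110 1111 -> 2
print(zero(0xC90))  # 1100 1001 0000 -> 0
print(zero(0x01E))  # 0000 0001 1110 -> 7
```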

  14. Take the maximum of these
    D = max(
      zero(hash(AB)),
      zero(hash(CD)),
      zero(hash(EF))
    )
    = max(2, 0, 7)
    = 7

  15. Conversely…

  16. Suppose we only know the maximum
    D = 7

  17. In other words…
    D = max(?, ?, …, 7, …, ?, ?)
    zero(hash(?))
    = zero(0000 0001 …)
    = 7

  18. In other words…
    D = max(?, ?, …, 7, …, ?, ?)
    zero(hash(?))
    = zero(0000 0001 …)
    = 7
    Pretty rare!

  19. How rare?
    D = max(?, ?, …, 7, …, ?, ?)
    zero(hash(?))
    = zero(0000 0001 …)
    = 7
    1/2^7 = 1/128

  20. How many unique hashes have we seen?
    D = max(?, ?, …, 7, …, ?, ?)
    1/2^7 = 1/128
    About 128 on average?

  21. We can roughly guess
    the number of distinct elements (hashes)
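
The reasoning of slides 14 to 21 can be sketched as a single-register estimator: remember only the maximum leading-zero count D and guess roughly 2^D distinct elements. The choice of SHA-256 truncated to 32 bits and the names `hash32` and `rough_distinct` are assumptions made for this sketch, not part of the slides.

```python
import hashlib

HASH_BITS = 32  # assumed hash width for this sketch


def hash32(item: str) -> int:
    """Map an item to a 32-bit integer using the first 4 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:4], "big")


def rough_distinct(items) -> int:
    """Rough guess of the distinct count from the maximum leading-zero count."""
    d = 0
    for item in items:
        zeros = HASH_BITS - hash32(item).bit_length()
        d = max(d, zeros)  # only this one small integer is kept
    return 2 ** d


users = [f"user-{i}" for i in range(1000)]
# Duplicates hash to the same value, so they cannot change the maximum.
print(rough_distinct(users) == rough_distinct(users * 3))  # True
```

The guess is always a power of two and depends on a single lucky or unlucky hash, which is why the slides go on to split the hashes across many buckets.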

  22. HyperLogLog

  23. Distribute by the trailing bits of the hash
    hash(AB) = 0x36f… = 0011 0110 … 1010
    D: [figure: array of buckets indexed 0, 1, …, 9, 10, 11, …, 14, 15; cell values 1, 1, 0, 2, 0, 0, 0]

  24. Update with the larger value!
    hash(AB) = 0x36f… = 0011 0110 … 1010
    D: [figure: array of buckets indexed 0, 1, …, 9, 10, 11, …, 14, 15; cell values 2, 1, 0, 2, 0, 0, 0]
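
The bucket update on slides 23 and 24 might look like the sketch below. It assumes 16 buckets selected by the last 4 bits and a 16-bit hash. The value `0x36FA` is a made-up hash that merely starts with 0011 0110 and ends with 1010, since the slide elides the middle bits. Following the slides, the register stores the leading-zero count itself.

```python
BUCKET_BITS = 4   # the last 4 bits choose one of 16 buckets
HASH_BITS = 16    # assumed hash width for this toy example


def update(registers: list, hash_value: int) -> None:
    """Route the hash to a bucket by its trailing bits and keep the larger value."""
    bucket = hash_value & ((1 << BUCKET_BITS) - 1)
    rest = hash_value >> BUCKET_BITS               # bits left after removing the bucket bits
    zeros = (HASH_BITS - BUCKET_BITS) - rest.bit_length()
    registers[bucket] = max(registers[bucket], zeros)


registers = [0] * 16
registers[0b1010] = 1          # bucket 10 currently holds 1
update(registers, 0x36FA)      # trailing bits 1010 -> bucket 10, 2 leading zeros
print(registers[0b1010])       # 2
```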

  25. Estimating the number of elements
    • How rare is the state of D on average?
    • Harmonic mean!

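  26. Estimating the number of elements
    [figure: four buckets holding the values 1, 1, 2, 4]
    $$C \times 4 \times \underbrace{\frac{4}{\frac{1}{2^2} + \frac{1}{2^1} + \frac{1}{2^1} + \frac{1}{2^4}}}_{\text{cardinality per bucket}}$$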
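
The harmonic-mean estimate on this slide can be checked numerically. In the sketch below the parameter `c` stands for the slide's constant C, whose value the slide does not give, so the default of 1.0 is only a placeholder. The register values 2, 1, 1, 4 are read off the exponents in the slide's formula.

```python
def estimate(registers, c: float = 1.0) -> float:
    """c * m * (harmonic mean of 2^D over the m buckets)."""
    m = len(registers)
    per_bucket = m / sum(2.0 ** -d for d in registers)  # harmonic mean of 2^D
    return c * m * per_bucket


# 4 * 4 / (1/4 + 1/2 + 1/2 + 1/16) = 16 / 1.3125
print(round(estimate([2, 1, 1, 4]), 4))  # 12.1905
```

The harmonic mean is dominated by the small terms, so a single bucket with an unusually large value (here the 4) pulls the estimate up far less than it would under an arithmetic mean.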

  27. The error tends to be larger when the scale is small

  28. Space complexity (memory usage)
    • Memory used: just an array of integers!
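
To make the memory claim concrete, here is a toy HyperLogLog-style counter whose only state is a fixed-size `bytearray`. It is a sketch under stated assumptions, not the implementation from the slides or the referenced paper: the class name `TinyHLL`, the use of SHA-1 truncated to 64 bits, and the default of 1024 buckets are all choices made here. Unlike the slides, it stores the leading-zero count plus one and uses the commonly quoted constant 0.7213 / (1 + 1.079 / m), which applies for m >= 128. It omits the small-range and large-range corrections.

```python
import hashlib


class TinyHLL:
    """Toy HyperLogLog-style counter; the only state is one small integer array."""

    def __init__(self, bucket_bits: int = 10) -> None:
        self.bucket_bits = bucket_bits
        self.m = 1 << bucket_bits             # number of buckets
        self.registers = bytearray(self.m)    # one byte per bucket

    def add(self, item: str) -> None:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")           # 64-bit hash
        bucket = h & (self.m - 1)                       # trailing bits pick the bucket
        rest = h >> self.bucket_bits                    # remaining leading bits
        width = 64 - self.bucket_bits
        rank = width - rest.bit_length() + 1            # leading zeros + 1
        self.registers[bucket] = max(self.registers[bucket], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)           # assumes m >= 128
        return alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)


hll = TinyHLL()
for i in range(100_000):
    hll.add(f"user-{i % 50_000}")      # 50,000 distinct IDs, each seen twice
print(len(hll.registers))              # 1024 bytes of state, however long the input
print(hll.estimate())                  # rough estimate of the distinct count
```

The register array stays at 1024 bytes no matter how many items are added, whereas a set would hold one entry per distinct ID.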

  29. References
    • HyperLogLog in Practice: Algorithmic
    Engineering of a State of The Art Cardinality
    Estimation Algorithm