Distinct Query using HyperLogLog
hama_du
October 05, 2018
Science
Understanding the intuition behind HyperLogLog, using Distinct Query as the example
Transcript
Memory-efficient Distinct Query with HyperLogLog. SDD study group @r-n-i, 2018/10/05
Distinct Query
Query examples • Distinct(['A', 'B']) = 2 • Distinct(['A', 'B', 'C', 'A', 'C']) = 3
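[Implementation] Put everything into a Set and take its size: the usual approach
Problems with putting everything into a Set • Painful when the data sequence is large • Uses memory in proportion to its size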
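The usual approach above can be sketched in a few lines of Python. The function name `distinct_exact` is made up for illustration; the point is only that the set has to hold every unique element, so its memory grows with the answer.

```python
def distinct_exact(items):
    """Exact distinct count: remember every element seen so far."""
    seen = set()
    for item in items:
        seen.add(item)  # memory grows with the number of unique items
    return len(seen)


print(distinct_exact(["A", "B"]))
print(distinct_exact(["A", "B", "C", "A", "C"]))
```

The two calls reproduce the query examples from the slides (2 and 3).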
Query example: a concrete case • For a sequence of user IDs, count the unique users
Unique users… there were 35915!! Do we need this exact figure?
Sometimes the exact value is not that important
Estimation using Hash
Compute the hashes
hash(AB) = 0x36f… = 0011 0110 1111 …
hash(CD) = 0xc90… = 1100 1001 0000 …
hash(EF) = 0x01e… = 0000 0001 1110 …
How many 0s are at the front?
zero(hash(AB)) = zero(0011 0110 1111…) = 2
zero(hash(CD)) = zero(1100 1001 0000…) = 0
zero(hash(EF)) = zero(0000 0001 1110…) = 7
Take the maximum of these
D = max(zero(hash(AB)), zero(hash(CD)), zero(hash(EF))) = max(2, 0, 7) = 7
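As a minimal sketch of the leading-zero step, assuming a 32-bit hash taken from SHA-256 (the slides do not say which hash function is used, and `hash32` and `max_zero` are hypothetical helper names):

```python
import hashlib

HASH_BITS = 32  # assumed hash width for this sketch


def hash32(item: str) -> int:
    """Hypothetical helper: the first 32 bits of SHA-256 as an integer."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def zero(value: int, bits: int = HASH_BITS) -> int:
    """Number of leading 0 bits when value is written with `bits` binary digits."""
    return bits - value.bit_length()


def max_zero(items) -> int:
    """D: the largest leading-zero count over all hashed items."""
    return max(zero(hash32(item)) for item in items)


# The 12-bit prefixes shown on the slides: 0x36f, 0xc90, 0x01e
slide_prefixes = [0x36F, 0xC90, 0x01E]
print([zero(p, bits=12) for p in slide_prefixes])       # [2, 0, 7]
print(max(zero(p, bits=12) for p in slide_prefixes))    # 7
```

The actual hash values of "AB", "CD" and "EF" under SHA-256 will differ from the illustrative values on the slides, so the slide prefixes are checked directly.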
Conversely…
Suppose we only know the maximum: D = 7
That means… D = max(?, ?, …, 7, …, ?, ?), so some element had zero(hash(?)) = zero(0000 0001 …) = 7. Quite rare!
How rare? A hash starts with seven 0s with probability 1/2^7 = 1/128
How many unique hashes have we seen? 1/2^7 = 1/128, so about 128 on average?
The number of distinct elements (hashes) can be roughly estimated
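The reasoning above (a maximum of D suggests roughly 2^D distinct hashes) can be sketched as follows. This assumes a 64-bit hash taken from SHA-256, and `rough_distinct` is a made-up name; it is a single-maximum estimate, not HyperLogLog itself.

```python
import hashlib


def zero(item: str, bits: int = 64) -> int:
    """Leading 0 bits of an assumed `bits`-wide hash of item (SHA-256 prefix)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    h = int.from_bytes(digest[: bits // 8], "big")
    return bits - h.bit_length()


def rough_distinct(items) -> int:
    """Very rough estimate: if the rarest hash has D leading zeros, guess 2**D."""
    d = max(zero(item) for item in items)
    return 2 ** d


user_ids = [f"user-{i}" for i in range(1000)]
print(rough_distinct(user_ids))
print(rough_distinct(user_ids * 3))  # duplicates do not change the maximum
```

Because the result is always a power of two and depends on a single lucky hash, the estimate is very noisy, which is what motivates splitting the hashes into buckets.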
HyperLogLog
Split into buckets by the tail of the Hash
hash(AB) = 0x36f… = 0011 0110 … 1010
bucket: 0 1 … 9 10 11 … 14 15
D:      1 1 … 0 2 0 … 0 0
Update with the larger value!
hash(AB) = 0x36f… = 0011 0110 … 1010
bucket: 0 1 … 9 10 11 … 14 15
D:      2 1 … 0 2 0 … 0 0
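A minimal sketch of the bucket update, assuming a 32-bit hash whose last 4 bits choose one of 16 buckets and whose remaining 28 bits supply the leading-zero count. The value `0x36F0000A` is made up: it only matches the prefix 0x36f and the tail 1010 visible on the slide, since the middle bits are elided there.

```python
HASH_BITS = 32      # assumed hash width
BUCKET_BITS = 4     # 2**4 = 16 buckets, indexed 0..15
NUM_BUCKETS = 1 << BUCKET_BITS


def update(D, h):
    """Route hash h to a bucket by its tail and keep the larger zero count."""
    bucket = h & (NUM_BUCKETS - 1)           # last 4 bits, e.g. 0b1010 = 10
    rest = h >> BUCKET_BITS                  # remaining 28 bits
    zeros = (HASH_BITS - BUCKET_BITS) - rest.bit_length()
    D[bucket] = max(D[bucket], zeros)        # update with the larger value
    return bucket


D = [0] * NUM_BUCKETS
print(update(D, 0x36F0000A), D[10])  # bucket 10 now holds 2
print(update(D, 0xC900000A), D[10])  # same bucket, 0 leading zeros: stays 2
```

Each element touches exactly one bucket, so the array D is the entire state of the sketch.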
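Estimating the number of elements • How rare is the state of D on average? • Harmonic mean!
Estimating the number of elements, with four buckets holding the values 1, 1, 2, 4:
$$E = C \times 4 \times \frac{4}{\frac{1}{2^{2}} + \frac{1}{2^{1}} + \frac{1}{2^{1}} + \frac{1}{2^{4}}}$$
The fraction is the density per bucket (the harmonic mean over the four buckets); multiplying by 4 scales it up to all buckets.
Error is on the larger side when the scale is small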
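The harmonic-mean estimate can be sketched as below. `raw_estimate` follows the slide's formula with the constant C passed in as `c`. The fuller `hll_estimate` rests on assumptions not stated on the slides: a 64-bit SHA-256 prefix as the hash, 2^10 buckets, registers storing the leading-zero count plus one (the convention of the HyperLogLog paper rather than the slide's plain zero count), and the paper's bias constant for 128 or more buckets. No small-range correction is applied.

```python
import hashlib


def raw_estimate(registers, c):
    """c * m * (m / sum(2**-r)): bucket count times the harmonic-mean density."""
    m = len(registers)
    return c * m * m / sum(2.0 ** -r for r in registers)


# The slide's four-bucket example with C left at 1: 16 / 1.3125
print(raw_estimate([2, 1, 1, 4], c=1.0))


def hll_estimate(items, p=10):
    """Sketch of a HyperLogLog-style estimate with m = 2**p buckets."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # assumed 64-bit hash
        bucket = h & (m - 1)                       # tail bits pick the bucket
        rest = h >> p                              # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1    # leading zeros + 1
        registers[bucket] = max(registers[bucket], rank)
    alpha = 0.7213 / (1 + 1.079 / m)  # bias constant from the paper, m >= 128
    return raw_estimate(registers, alpha)


print(round(hll_estimate(range(50_000))))
```

With 1024 buckets the expected relative error is a few percent at this scale; for very small inputs the raw formula overestimates, in line with the slide's remark about small scales.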
Space complexity (memory used) • Memory used: just a typed array!
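To get a feel for the difference in memory, here is a small comparison, assuming 1024 one-byte buckets (a number chosen for illustration, not taken from the slides). Note that `sys.getsizeof` reports only the container itself, so the figure for the set is an underestimate.

```python
import sys

n = 100_000
exact = set(range(n))          # exact approach: one entry per unique element
registers = bytearray(1024)    # sketch approach: fixed array of 1-byte buckets

print(sys.getsizeof(exact))      # grows with n
print(sys.getsizeof(registers))  # stays fixed regardless of n
```

The bucket array keeps the same size however many elements are processed, while the set keeps growing.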
References • HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm