Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Distinct Query using HyperLogLog
Search
hama_du
October 05, 2018
Science
86
2
Share
Distinct Query using HyperLogLog
Distinct Queryを例にHyperLogLogのお気持ちを理解する
hama_du
October 05, 2018
More Decks by hama_du
See All by hama_du
Google File System
hamadu
0
85
木の上を歩こう
hamadu
1
1.1k
linear-algebra-in-n-minutes
hamadu
0
250
Other Decks in Science
See All in Science
次代のデータサイエンティストへ~スキルチェックリスト、タスクリスト更新~
datascientistsociety
PRO
3
33k
HajimetenoLT vol.17
hashimoto_kei
1
200
Text-to-SQLの既存の評価指標を問い直す
gotalab555
1
190
HDC tutorial
michielstock
1
600
データベース14: B+木 & ハッシュ索引
trycycle
PRO
0
680
会社でMLモデルを作るとは @電気通信大学 データアントレプレナーフェロープログラム
yuto16
1
610
白金鉱業Meetup_Vol.20 効果検証ことはじめ / Introduction to Impact Evaluation
brainpadpr
2
1.8k
フィードフォワードニューラルネットワークを用いた記号入出力制御系に対する制御器設計 / Controller Design for Augmented Systems with Symbolic Inputs and Outputs Using Feedforward Neural Network
konakalab
0
120
Cross-Media Technologies, Information Science and Human-Information Interaction
signer
PRO
3
32k
SHINOMIYA Nariyoshi
genomethica
0
120
(2025) Balade en cyclotomie
mansuy
0
530
2025-06-11-ai_belgium
sofievl
1
250
Featured
See All Featured
Efficient Content Optimization with Google Search Console & Apps Script
katarinadahlin
PRO
1
450
Mobile First: as difficult as doing things right
swwweet
225
10k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
A Soul's Torment
seathinner
5
2.6k
Deep Space Network (abreviated)
tonyrice
0
99
Navigating the Design Leadership Dip - Product Design Week Design Leaders+ Conference 2024
apolaine
0
260
The Curse of the Amulet
leimatthew05
1
11k
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
390
A Guide to Academic Writing Using Generative AI - A Workshop
ks91
PRO
1
250
Highjacked: Video Game Concept Design
rkendrick25
PRO
1
340
The Hidden Cost of Media on the Web [PixelPalooza 2025]
tammyeverts
2
260
Everyday Curiosity
cassininazir
0
180
Transcript
HyperLogLogͰ লϝϞϦͳDistinct Query SDDษڧձ@r-n-i 2018/10/05
Distinct Query
ΫΤϦͷྫ • Distinct([‘A’, ‘B’]) = 2 • Distinct([‘A’, ‘B’, ‘C’,
‘A’, ‘C’]) = 3
[࣮ํ๏] SetʹಥͬࠐΜͰେ͖͞ΛऔΔ ;ͭ͏ͷ
SetʹಥͬࠐΉࡍͷ • σʔλྻͷαΠζ͕େ͖͍ͱਏ͍ • ͞ͷ͚ͩϝϞϦ৯͏
ΫΤϦͷྫ - ۩ମྫ • ϢʔβIDͷྻʹରͯ͠ɺϢχʔΫϢʔβ
ϢχʔΫϢʔβ… 35915ਓͰͨ͠ʂʂ
ϢχʔΫϢʔβ… 35915ਓͰͨ͠ʂʂ
ϢχʔΫϢʔβ… 35915ਓͰͨ͠ʂʂ ͜Ε͍Δʁ
ਖ਼֬ͳ ͦΜͳʹେࣄ͡Όͳ͍ ͜ͱ͋Δ
HashΛ༻͍ͨਪఆ
ϋογϡͷܭࢉ hash(AB) = 0x36f… = 0011 0110 1111 … hash(CD)
= 0xc90… = 1100 1001 0000 … hash(EF) = 0x01e… = 0000 0001 1110 …
ઌ಄ʹ͍ͭ͘ 0 ͕͍ͭͯΔʁ zero(hash(AB)) = zero(0011 0110 1111…) = 2
zero(hash(CD)) = zero(1100 1001 0000…) = 0 zero(hash(EF)) = zero(0000 0001 1110…) = 7
͜ΕΒͷ࠷େΛऔΔ D = max( zero(hash(AB)), zero(hash(CD)), zero(hash(EF)) ) = max(2,
0, 7) = 7
ٯʹ…
࠷େ͚ͩΘ͔ͬͯΔͱ͢Δ D = 7
ͭ·Γ… D = max(?, ?, …, 7, …, ?, ?)
zero(hash(?)) = zero(0000 0001 …) = 7
ͭ·Γ… D = max(?, ?, …, 7, …, ?, ?)
zero(hash(?)) = zero(0000 0001 …) = 7 ݁ߏϨΞʂ
ͲͷఔϨΞʁ D = max(?, ?, …, 7, …, ?, ?)
zero(hash(?)) = zero(0000 0001 …) = 7 1/2^7 = 1/128
ϢχʔΫͳHashΛ͍ͭ͘ݟͨʁ D = max(?, ?, …, 7, …, ?, ?)
1/2^7 = 1/128 ฏۉ128ݸʁ
Distinct ͳཁૉ(Hash)Λ େࡶʹ༧Ͱ͖Δ
HyperLogLog
HashͷඌͰৼΓ͚ hash(AB) = 0x36f… = 0011 0110 … 1010 D:
0 1 9 10 11 14 15 1 1 0 2 0 0 0 … …
େ͖͍Ͱߋ৽ʂ hash(AB) = 0x36f… = 0011 0110 … 1010 D:
0 1 9 10 11 14 15 2 1 0 2 0 0 0 … …
ཁૉͷਪఆ • Dͷঢ়گ͕ฏۉͲͷఔϨΞ͔ʁ • ௐฏۉʂ
ཁૉͷਪఆ 1 1 2 4 C × 4 × 4
1 22 + 1 21 + 1 21 + 1 24 ശ1ͭ͋ͨΓͷೱ
ن͕খ͍͞ͱޡࠩଟΊ
ۭؒܭࢉྔ(༻ϝϞϦ) • ༻ϝϞϦ: ܕͷྻ͚ͩʂ
ࢀߟจݙ • HyperLogLog in Practice: Algorithmic Engineering of a State
of The Art Cardinality Estimation Algorithm