Distinct Query using HyperLogLog
hama_du
October 05, 2018
Understanding the intuition behind HyperLogLog, using a distinct query as the example
Transcript
Memory-efficient Distinct Query with HyperLogLog. SDD study group @r-n-i, 2018/10/05
Distinct Query
Query examples
• Distinct(['A', 'B']) = 2
• Distinct(['A', 'B', 'C', 'A', 'C']) = 3
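[Implementation] Put everything into a Set and take its size. This is the usual approach.
Problems with putting everything into a Set
• It is painful when the data sequence is large
• It eats memory in proportion to the size of the data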
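The Set-based approach described above can be sketched in a few lines of Python. The function name `distinct_exact` is made up for illustration and does not appear in the slides.

```python
def distinct_exact(items):
    """Exact distinct count: store every element in a set, then take its size."""
    seen = set()
    for item in items:
        seen.add(item)  # duplicates are absorbed by the set
    return len(seen)


print(distinct_exact(["A", "B"]))                 # 2
print(distinct_exact(["A", "B", "C", "A", "C"]))  # 3
```

The set has to hold every distinct element, which is the memory cost the slide complains about.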
Query example: a concrete case
• Given a sequence of user IDs, count the unique users
Unique users: … 35915!! Do we really need this?
Sometimes the exact value is not that important.
Estimation using a hash
Computing hash values
hash(AB) = 0x36f… = 0011 0110 1111 …
hash(CD) = 0xc90… = 1100 1001 0000 …
hash(EF) = 0x01e… = 0000 0001 1110 …
How many 0s are at the front?
zero(hash(AB)) = zero(0011 0110 1111…) = 2
zero(hash(CD)) = zero(1100 1001 0000…) = 0
zero(hash(EF)) = zero(0000 0001 1110…) = 7
Take the maximum of these
D = max(zero(hash(AB)), zero(hash(CD)), zero(hash(EF))) = max(2, 0, 7) = 7
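As a small check of the leading-zero counts above, here is a sketch in Python. The slides truncate the hash values, so it uses only the 12 bits that are actually shown (0x36f, 0xc90, 0x01e). The `width` parameter is an assumption introduced for this example.

```python
def zero(value, width):
    """Count the leading 0 bits of `value` written as a `width`-bit number."""
    # int.bit_length() is the position of the highest 1 bit (0 for value 0).
    return width - value.bit_length()


# Only the first 12 bits of each hash are visible on the slide.
prefixes = {"AB": 0x36F, "CD": 0xC90, "EF": 0x01E}

counts = {key: zero(bits, 12) for key, bits in prefixes.items()}
D = max(counts.values())

print(counts)  # {'AB': 2, 'CD': 0, 'EF': 7}
print(D)       # 7
```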
Conversely…
Suppose only the maximum is known: D = 7
That means… D = max(?, ?, …, 7, …, ?, ?)
zero(hash(?)) = zero(0000 0001 …) = 7. Pretty rare!
How rare? A hash starts with 7 zeros with probability 1/2^7 = 1/128.
How many unique hashes have we seen? About 128 on average?
The number of distinct elements (hashes) can be roughly estimated.
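The single-maximum idea can be sketched as follows. This is a minimal illustration, not the author's implementation. The slides do not say which hash function or hash width they use, so the 32-bit width and the SHA-1 based helper `hash32` are assumptions, and `rough_distinct` is a made-up name.

```python
import hashlib

HASH_BITS = 32  # assumed hash width; the slides do not specify one


def hash32(item):
    """Hypothetical helper: the first 32 bits of SHA-1 over the item's text form."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def zero(value, width=HASH_BITS):
    """Number of leading 0 bits in a `width`-bit value."""
    return width - value.bit_length()


def rough_distinct(items):
    """Keep only D, the largest leading-zero count seen, and guess 2**D."""
    d = 0
    for item in items:
        d = max(d, zero(hash32(item)))
    return 2 ** d


user_ids = [f"user-{i % 1000}" for i in range(5000)]  # 1000 distinct IDs
print(rough_distinct(user_ids))  # always a power of two, so only a coarse guess
```

Only one small integer is stored, and repeated elements cannot change it because they hash to the same value. The price is a very noisy estimate.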
HyperLogLog
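Distribute by the tail of the hash
hash(AB) = 0x36f… = 0011 0110 … 1010
D: [figure: an array of registers indexed 0 to 15; the trailing bits 1010 select register 10]
Update with the larger value!
hash(AB) = 0x36f… = 0011 0110 … 1010
D: [figure: the same array after the selected register has been updated]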
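The distribute-and-update step might look like the sketch below. It assumes a 32-bit hash and 4 index bits (16 registers, matching indices 0 to 15 on the slide). The value `0x3600000A` is a made-up 32-bit number that only shares the leading bits 0011 0110 and the trailing bits 1010 shown on the slide. It is not the real hash(AB), which the slide truncates.

```python
HASH_BITS = 32   # assumed hash width
INDEX_BITS = 4   # the last 4 bits pick one of 16 registers


def zero(value, width=HASH_BITS):
    """Number of leading 0 bits in a `width`-bit value."""
    return width - value.bit_length()


def update(registers, h):
    """Route hash `h` to a register by its tail and keep the larger value."""
    index = h & ((1 << INDEX_BITS) - 1)                # trailing bits -> register index
    registers[index] = max(registers[index], zero(h))  # update only if larger
    return index


D = [0] * (1 << INDEX_BITS)
h = 0x3600000A  # illustrative value: starts 0011 0110, ends 1010
index = update(D, h)
print(index, D[index])  # 10 2
```

Following the slide, the leading zeros are counted over the whole hash, while the tail bits are used only to choose the register.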
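Estimating the number of elements
• How rare is the state of D on average?
• Harmonic mean!
Estimating the number of elements, for four boxes holding 1, 1, 2, 4:
C × 4 × 4 / (1/2^2 + 1/2^1 + 1/2^1 + 1/2^4)
(annotation: cardinality per box)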
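The harmonic-mean formula above can be evaluated with a short sketch. The slide does not give a value for the constant C, so it is left as a parameter `c` with a placeholder default of 1.0. Reading 4 / (sum) as the per-box part is an interpretation of the slide's annotation, not something it states outright.

```python
def estimate(registers, c=1.0):
    """Harmonic-mean estimate: c * m * m / sum(1 / 2**d for each register d)."""
    m = len(registers)
    inverse_sum = sum(2.0 ** -d for d in registers)
    per_box = m / inverse_sum  # harmonic mean of 2**d over the boxes
    return c * m * per_box     # scale by the number of boxes and the constant


# Register values in the order they appear in the slide's denominator.
print(estimate([2, 1, 1, 4]))  # about 12.19 when c = 1.0
```

The harmonic mean damps the effect of a single unusually large register, which would dominate an arithmetic mean of 2^d.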
The error tends to be larger when the scale is small.
Space complexity (memory used)
• Memory used: only a typed array!
References
• HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm