Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Distinct Query using HyperLogLog
Search
hama_du
October 05, 2018
Science
2
81
Distinct Query using HyperLogLog
Distinct Queryを例にHyperLogLogのお気持ちを理解する
hama_du
October 05, 2018
Tweet
Share
More Decks by hama_du
See All by hama_du
Google File System
hamadu
0
80
木の上を歩こう
hamadu
1
980
linear-algebra-in-n-minutes
hamadu
0
240
Other Decks in Science
See All in Science
baseballrによるMLBデータの抽出と階層ベイズモデルによる打率の推定 / TokyoR118
dropout009
1
520
CV_5_3dVision
hachama
0
140
Machine Learning for Materials (Challenge)
aronwalsh
0
310
ド文系だった私が、 KaggleのNCAAコンペでソロ金取れるまで
wakamatsu_takumu
2
870
システム数理と応用分野の未来を切り拓くロードマップ・エンターテインメント(スポーツ)への応用 / Applied mathematics for sports entertainment
konakalab
1
350
データベース11: 正規化(1/2) - 望ましくない関係スキーマ
trycycle
PRO
0
820
03_草原和博_広島大学大学院人間社会科学研究科教授_デジタル_シティズンシップシティで_新たな_学び__をつくる.pdf
sip3ristex
0
520
機械学習 - ニューラルネットワーク入門
trycycle
PRO
0
830
統計的因果探索: 背景知識とデータにより因果仮説を探索する
sshimizu2006
4
940
02_西村訓弘_プログラムディレクター_人口減少を機にひらく未来社会.pdf
sip3ristex
0
530
機械学習 - K近傍法 & 機械学習のお作法
trycycle
PRO
0
1.2k
SciPyDataJapan 2025
schwalbe10
0
240
Featured
See All Featured
Bash Introduction
62gerente
613
210k
Building an army of robots
kneath
306
45k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
331
22k
A Modern Web Designer's Workflow
chriscoyier
695
190k
Statistics for Hackers
jakevdp
799
220k
Testing 201, or: Great Expectations
jmmastey
43
7.6k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
251
21k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
53
2.9k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.4k
A designer walks into a library…
pauljervisheath
207
24k
Site-Speed That Sticks
csswizardry
10
720
The Cult of Friendly URLs
andyhume
79
6.5k
Transcript
HyperLogLogͰ লϝϞϦͳDistinct Query SDDษڧձ@r-n-i 2018/10/05
Distinct Query
ΫΤϦͷྫ • Distinct([‘A’, ‘B’]) = 2 • Distinct([‘A’, ‘B’, ‘C’,
‘A’, ‘C’]) = 3
[࣮ํ๏] SetʹಥͬࠐΜͰେ͖͞ΛऔΔ ;ͭ͏ͷ
SetʹಥͬࠐΉࡍͷ • σʔλྻͷαΠζ͕େ͖͍ͱਏ͍ • ͞ͷ͚ͩϝϞϦ৯͏
ΫΤϦͷྫ - ۩ମྫ • ϢʔβIDͷྻʹରͯ͠ɺϢχʔΫϢʔβ
ϢχʔΫϢʔβ… 35915ਓͰͨ͠ʂʂ
ϢχʔΫϢʔβ… 35915ਓͰͨ͠ʂʂ
ϢχʔΫϢʔβ… 35915ਓͰͨ͠ʂʂ ͜Ε͍Δʁ
ਖ਼֬ͳ ͦΜͳʹେࣄ͡Όͳ͍ ͜ͱ͋Δ
HashΛ༻͍ͨਪఆ
ϋογϡͷܭࢉ hash(AB) = 0x36f… = 0011 0110 1111 … hash(CD)
= 0xc90… = 1100 1001 0000 … hash(EF) = 0x01e… = 0000 0001 1110 …
ઌ಄ʹ͍ͭ͘ 0 ͕͍ͭͯΔʁ zero(hash(AB)) = zero(0011 0110 1111…) = 2
zero(hash(CD)) = zero(1100 1001 0000…) = 0 zero(hash(EF)) = zero(0000 0001 1110…) = 7
͜ΕΒͷ࠷େΛऔΔ D = max( zero(hash(AB)), zero(hash(CD)), zero(hash(EF)) ) = max(2,
0, 7) = 7
ٯʹ…
࠷େ͚ͩΘ͔ͬͯΔͱ͢Δ D = 7
ͭ·Γ… D = max(?, ?, …, 7, …, ?, ?)
zero(hash(?)) = zero(0000 0001 …) = 7
ͭ·Γ… D = max(?, ?, …, 7, …, ?, ?)
zero(hash(?)) = zero(0000 0001 …) = 7 ݁ߏϨΞʂ
ͲͷఔϨΞʁ D = max(?, ?, …, 7, …, ?, ?)
zero(hash(?)) = zero(0000 0001 …) = 7 1/2^7 = 1/128
ϢχʔΫͳHashΛ͍ͭ͘ݟͨʁ D = max(?, ?, …, 7, …, ?, ?)
1/2^7 = 1/128 ฏۉ128ݸʁ
Distinct ͳཁૉ(Hash)Λ େࡶʹ༧Ͱ͖Δ
HyperLogLog
HashͷඌͰৼΓ͚ hash(AB) = 0x36f… = 0011 0110 … 1010 D:
0 1 9 10 11 14 15 1 1 0 2 0 0 0 … …
େ͖͍Ͱߋ৽ʂ hash(AB) = 0x36f… = 0011 0110 … 1010 D:
0 1 9 10 11 14 15 2 1 0 2 0 0 0 … …
ཁૉͷਪఆ • Dͷঢ়گ͕ฏۉͲͷఔϨΞ͔ʁ • ௐฏۉʂ
ཁૉͷਪఆ 1 1 2 4 C × 4 × 4
1 22 + 1 21 + 1 21 + 1 24 ശ1ͭ͋ͨΓͷೱ
ن͕খ͍͞ͱޡࠩଟΊ
ۭؒܭࢉྔ(༻ϝϞϦ) • ༻ϝϞϦ: ܕͷྻ͚ͩʂ
ࢀߟจݙ • HyperLogLog in Practice: Algorithmic Engineering of a State
of The Art Cardinality Estimation Algorithm