The probability of "leftmost '1' is at bit N" is 1/(2^N).
In other words, we have to choose an integer about 2^N times, on average, to get an integer such that "leftmost '1' is at bit N".
to 64bit integer space by applying hash function
Equivalent to choosing a 64bit integer at random N (the number of unique elements in the dataset) times
Conversely, if we get an integer such that "leftmost '1' is at bit N" as a result of applying the hash function, it indicates that "the dataset contains roughly 2^N unique elements"
Now we have approximated count-distinct using only the "leftmost position of the '1' bit"
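The idea above can be sketched in a few lines of Python. This is only a minimal illustration, not a production estimator: the helper names (`hash64`, `leftmost_one_position`, `rough_distinct_estimate`) are hypothetical, and using the first 8 bytes of SHA-256 as the 64bit hash is an assumption made for the example. Bit positions are counted from the most significant bit, starting at 1, so that position N has probability 1/(2^N).

```python
import hashlib


def hash64(value: str) -> int:
    """Map a value to a 64bit integer (assumption: first 8 bytes of SHA-256)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def leftmost_one_position(x: int, width: int = 64) -> int:
    """1-based position of the leftmost '1' bit, counted from the most
    significant bit. Returns width + 1 when x has no '1' bit at all."""
    return width - x.bit_length() + 1


def rough_distinct_estimate(values) -> int:
    """Estimate count-distinct as 2^N, where N is the largest
    leftmost-'1' position seen among the hashed values."""
    max_pos = 0
    for v in values:
        max_pos = max(max_pos, leftmost_one_position(hash64(v)))
    # No values seen means no distinct elements.
    return 0 if max_pos == 0 else 2 ** max_pos
```

Because only the maximum position is kept, duplicates never change the result. A single maximum is very noisy, though, which is why HyperLogLog keeps many such maxima (registers) and combines them.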
(i.e. HLL sketch)

:) SELECT uniqCombinedMerge(state)
   FROM (
       SELECT uniqCombinedState(cookie_id) AS state FROM page_views WHERE domain = 'example.com'
       UNION ALL
       SELECT uniqCombinedState(cookie_id) AS state FROM page_views WHERE domain = 'example2.com'
   );

┌─uniqCombinedMerge(state)─┐
│                   195763 │
└──────────────────────────┘
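The query above combines two aggregate states into one. For a plain HLL sketch, merging is a register-wise maximum, which the following Python sketch illustrates. It is not ClickHouse's uniqCombined implementation: the precision `P`, the SHA-256 based `hash64`, and the helper names `build` and `merge` are all assumptions chosen for the example.

```python
import hashlib

P = 12          # hypothetical precision: the top P hash bits select a register
M = 1 << P      # number of registers


def hash64(value: str) -> int:
    """Map a value to a 64bit integer (assumption: first 8 bytes of SHA-256)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build(values) -> list:
    """Build an HLL-style sketch: each register keeps the largest
    leftmost-'1' position seen in the remaining 64 - P bits."""
    registers = [0] * M
    rest_bits = 64 - P
    for v in values:
        h = hash64(v)
        idx = h >> rest_bits                  # register index from the top P bits
        rest = h & ((1 << rest_bits) - 1)     # remaining low bits
        rank = rest_bits - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    return registers


def merge(a: list, b: list) -> list:
    """Merge two sketches by taking the maximum of each register pair."""
    return [max(x, y) for x, y in zip(a, b)]
```

Merging the sketches of two datasets gives exactly the sketch of their union, so elements that appear in both datasets are not counted twice:

```python
a = build(["u1", "u2", "u3"])
b = build(["u3", "u4"])
assert merge(a, b) == build(["u1", "u2", "u3", "u4"])
```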
currently adopts
Single estimation formula for all cardinality ranges
No bias correction table needed
"New cardinality estimation algorithms for HyperLogLog sketches" (Otmar Ertl, 2017)