What is Liquid Clustering?

Liquid cluster by customer ID and date

CREATE TABLE my_liquid_table … CLUSTER BY (customer_id, date) AS SELECT …
[Figure: the table's data partitioned by a tree over two clustering columns, Col 1: date and Col 2: customer_id. Internal nodes split on date (Col 1 > 2023-02-06 / Col 1 <= 2023-02-06, then Col 1 > 2023-02-05 / Col 1 <= 2023-02-05) and on customer_id (Col 2 > B / Col 2 <= B, Col 2 > C / Col 2 <= C, Col 2 > D / Col 2 <= D). The tree ends in leaves Leaf1 to Leaf7.]

Liquid Clustering optimizes the layout according to the target file size.
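To make the tree concrete, here is a minimal Python sketch of how a row could be routed from the root to a leaf by comparing its date (Col 1) and customer_id (Col 2) against each node's split value. The Leaf and Split classes, the route function, and the exact shape of ROOT are hypothetical illustrations that borrow split values from the figure; this is not how Databricks stores the tree.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union


@dataclass
class Leaf:
    name: str


@dataclass
class Split:
    column: str     # "date" (Col 1) or "customer_id" (Col 2)
    value: object   # split value: rows with column <= value go left
    left: "Node"    # branch for column <= value
    right: "Node"   # branch for column > value


Node = Union[Leaf, Split]


def route(node: Node, row: dict) -> str:
    """Walk from the root to the leaf whose key range contains the row."""
    while isinstance(node, Split):
        node = node.left if row[node.column] <= node.value else node.right
    return node.name


# Hypothetical tree: split values borrowed from the figure, shape is illustrative.
ROOT = Split("date", date(2023, 2, 6),
    left=Split("customer_id", "C",
        left=Leaf("Leaf1"),
        right=Split("date", date(2023, 2, 5),
            left=Split("customer_id", "D", Leaf("Leaf2"), Leaf("Leaf3")),
            right=Split("customer_id", "E", Leaf("Leaf4"), Leaf("Leaf5")))),
    right=Split("customer_id", "B", Leaf("Leaf6"), Leaf("Leaf7")))

print(route(ROOT, {"date": date(2023, 2, 7), "customer_id": "A"}))  # Leaf6
```

Because each node tests only one column, rows that share a date range and a customer range land in the same leaf, which in this sketch is what allows a query filtering on date or customer_id to skip leaves whose ranges cannot match.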
Write new data

[Figure: the same tree (Leaf1 to Leaf7) when new data is written to the table.]
optimize my_table

[Figure: data laid out on a date axis (2023-02-05, 2023-02-06, 2023-02-07) against a customer axis (Customer A to Customer F), with leaf regions Leaf2 to Leaf5 and leaf7.]
Optimize the files in a node when:
• the number of small files is greater than the files_number threshold
• the node size is smaller than the node_size threshold
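The two conditions above can be sketched as a simple check in Python. The slide names the files_number and node_size thresholds but gives no values, so FILES_NUMBER, NODE_SIZE, and TARGET_FILE_SIZE below are made-up numbers. Treating a file as "small" when it is below the target file size, and requiring both conditions at once, are also assumptions.

```python
from dataclasses import dataclass, field
from typing import List

MIB = 1024 * 1024

# Hypothetical values: the slide names files_number and node_size but gives no numbers.
FILES_NUMBER = 4              # files_number: max small files tolerated in a node
NODE_SIZE = 1024 * MIB        # node_size: node size threshold, in bytes
TARGET_FILE_SIZE = 128 * MIB  # assumption: files below this count as "small"


@dataclass
class LeafNode:
    name: str
    file_sizes: List[int] = field(default_factory=list)  # size of each data file, in bytes


def should_optimize(node: LeafNode) -> bool:
    """Both conditions from the slide; treating them as a conjunction is an assumption."""
    small_files = sum(1 for s in node.file_sizes if s < TARGET_FILE_SIZE)
    node_size = sum(node.file_sizes)
    return small_files > FILES_NUMBER and node_size < NODE_SIZE


def compact(node: LeafNode) -> LeafNode:
    """Rewrite the node's data into files of at most TARGET_FILE_SIZE bytes."""
    full, rest = divmod(sum(node.file_sizes), TARGET_FILE_SIZE)
    sizes = [TARGET_FILE_SIZE] * full + ([rest] if rest else [])
    return LeafNode(node.name, sizes)


leaf = LeafNode("leaf3", [10 * MIB] * 10)  # ten 10 MiB files left by several writes
if should_optimize(leaf):
    leaf = compact(leaf)
print(leaf.file_sizes == [100 * MIB])  # True
```

A node that is already larger than node_size fails the second check here, since such a node is a candidate for expansion rather than compaction.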
Expand a leaf node when:
• the node size is greater than the node_size threshold

[Figure: leaf7 is replaced by a new Col 1 node with two Col 2 children, giving leaves leaf 7, leaf 8, leaf 9, and leaf 10; leaf1 to leaf6 are unchanged.]
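A leaf that grows past node_size is expanded rather than compacted. The Python sketch below performs one expansion step in the shape of the figure: the oversized leaf becomes a Col 1 (date) split whose two halves are each split on Col 2 (customer_id), giving four leaves. Splitting at the lower median, the toy NODE_SIZE value, and all class and function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple, Union

Row = Tuple[date, str, int]  # (date, customer_id, size in bytes)

NODE_SIZE = 100  # hypothetical node_size threshold (toy value, in bytes)


@dataclass
class Leaf:
    rows: List[Row]


@dataclass
class Split:
    column: int     # 0 = Col 1 (date), 1 = Col 2 (customer_id)
    value: object   # rows with row[column] <= value go left
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def size(rows: List[Row]) -> int:
    return sum(r[2] for r in rows)


def split_on(rows: List[Row], column: int) -> Split:
    """Split rows at the lower median of one column (the median choice is an assumption)."""
    values = sorted(r[column] for r in rows)
    pivot = values[(len(values) - 1) // 2]
    left = [r for r in rows if r[column] <= pivot]
    right = [r for r in rows if r[column] > pivot]
    return Split(column, pivot, Leaf(left), Leaf(right))


def expand(leaf: Leaf) -> Node:
    """One expansion step: a Col 1 split whose halves are each split on Col 2."""
    if size(leaf.rows) <= NODE_SIZE:
        return leaf  # within node_size: leave the leaf as it is
    top = split_on(leaf.rows, 0)
    if top.left.rows:
        top.left = split_on(top.left.rows, 1)
    if top.right.rows:
        top.right = split_on(top.right.rows, 1)
    return top


# An oversized leaf: two dates x four customers, 20 bytes per row (160 bytes total).
rows = [(date(2023, 2, d), c, 20) for d in (5, 6) for c in "ABCD"]
tree = expand(Leaf(rows))
```

Only the oversized leaf is restructured; its siblings keep their files, matching the figure where leaf1 to leaf6 stay in place.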
Lazy Clustering

➔ Newly ingested data is clustered as needed, and previously clustered data is ignored.

[Figure: the resulting tree with leaves Leaf1 to Leaf10.]
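The lazy behavior can be sketched as follows: only the newly ingested rows are routed to leaves, and rows that were already clustered are never read or rewritten. To keep it short, this Python sketch routes by date alone using hypothetical leaf boundaries (LEAF_UPPER_BOUNDS, LEAF_NAMES); it is a simplification of the two-column tree, not the actual implementation.

```python
from bisect import bisect_left
from datetime import date
from typing import Dict, List, Tuple

Row = Tuple[date, str]  # (date, customer_id)

# Hypothetical one-column simplification: leaf i holds dates <= LEAF_UPPER_BOUNDS[i]
# and greater than the previous bound.
LEAF_UPPER_BOUNDS = [date(2023, 2, 5), date(2023, 2, 6), date.max]
LEAF_NAMES = ["leaf_a", "leaf_b", "leaf_c"]


def lazy_cluster(leaves: Dict[str, List[Row]], new_rows: List[Row]) -> List[str]:
    """Place only newly ingested rows into leaves and return the leaves touched.

    Rows already stored in `leaves` are neither re-read nor rewritten.
    """
    touched: List[str] = []
    for row in new_rows:
        leaf = LEAF_NAMES[bisect_left(LEAF_UPPER_BOUNDS, row[0])]
        leaves.setdefault(leaf, []).append(row)
        if leaf not in touched:
            touched.append(leaf)
    return touched


table = {"leaf_a": [(date(2023, 2, 4), "A")], "leaf_b": [(date(2023, 2, 6), "B")]}
print(lazy_cluster(table, [(date(2023, 2, 7), "C"), (date(2023, 2, 7), "D")]))  # ['leaf_c']
```

Only leaf_c is touched in this run; leaf_a and leaf_b keep their existing rows, which mirrors the slide's point that previously clustered data is ignored.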