Qd-tree: Learning Data Layouts for Big Data Analytics

Qd-tree: Learning Data Layouts for Big Data Analytics Zongheng Yang
(UC Berkeley), Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li , Umar Minhas, Per-Åke Larson, Donald Kossmann (Microsoft Research), Rajeev Acharya (Microsoft) SIGMOD 2020

Layouts are critical for performance Block Block 0 root 5%
1 guest 9% 2 root 95% 3 guest 99% Range partition on timestamp timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%

1 guest 9% 2 root 95% 3 guest 99% Query usr = root Range partition on timestamp timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%

1 guest 9% 2 root 95% 3 guest 99% Query usr = root Range partition on timestamp Read Read timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%

Layouts are critical for performance Block Block Query usr =
root timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr

root 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr

root Read Skipped! 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr

root Query cpu < 10% 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr

root Query cpu < 10% Read Read 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr

root Query cpu < 10% Read Read 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% As workloads grow, heuristic layouts become increasingly suboptimal. timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr

Problem statement Raw Data Query Log Laid-out Data Given a
table & a set of queries, partition data to maximize the total # of skipped records

Qd-tree: query-data routing tree cpu<90? usr=‘root’? mem>1G? B0 B1 B2
B3 Y N

B3 Y N Cuts

B3 Y N Cuts Data blocks

B3 Y N Cuts Data blocks Learn qd-trees with excellent skipping → use them to produce data layouts Key Idea

Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y
N

N Record cpu = 10 mem = 1G usr = ‘guest’

N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root

N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to

N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to 3. Append to block

N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to 3. Append to block Q R Q R Q R Q R Q R Q R Q R Q R Q R Record cpu = 10 mem = 1G usr = ‘guest’ Record

N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to 3. Append to block Q R Q R Q R Q R Q R Q R Q R Q R Q R ✓ Format-agnostic Serialize in any format Record cpu = 10 mem = 1G usr = ‘guest’ Record

Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0
B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) Query & nodes: hyper-rectangles

B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) 3. Output block IDs to read (skip other blocks) Query & nodes: hyper-rectangles

B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) 3. Output block IDs to read (skip other blocks) Query & nodes: hyper-rectangles ✓ Run on any engine

qd-tree Laid-out Data (Parquet/CSV) Layout Phase

Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read
Querying Phase Query + block_id IN (1,3,9,..)

Querying Phase Finding the optimal qd-tree is NP-hard Query + block_id IN (1,3,9,..)

Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? Query + block_id IN (1,3,9,..)

Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? • Can’t scale to large search spaces Query + block_id IN (1,3,9,..)

Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? • Can’t scale to large search spaces Greedy — see paper Query + block_id IN (1,3,9,..)

Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? • Can’t scale to large search spaces Deep reinforcement learning (RL) • Approximate DP Greedy — see paper Query + block_id IN (1,3,9,..)

Learning qd-trees RL agent Builds trees

Learning qd-trees RL agent Builds trees Repeat: Build a tree,
evaluate quality, learn good/bad cuts.

Learning qd-trees extract cuts sample RL agent Builds trees Raw
Data Query Log Repeat: Build a tree, evaluate quality, learn good/bad cuts.

Learning qd-trees extract cuts sample RL agent Builds trees Raw
Data Query Log Best qd-tree Repeat: Build a tree, evaluate quality, learn good/bad cuts.

RL agent Builds trees

cpu<90? mem>1G? Block Block Y N States nodes — subspace
description RL agent Builds trees

description RL agent Builds trees [cpu=[0,90), mem=(1G,maxMem), usr=*]

description RL agent Builds trees

description Actions cuts — simply all ﬁlters from queries RL agent Builds trees

description Actions cuts — simply all ﬁlters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log RL agent Builds trees

description Actions cuts — simply all ﬁlters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log 1. usr = ‘root’ RL agent Builds trees

description Actions cuts — simply all ﬁlters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log 1. usr = ‘root’ 2. cpu < 90 RL agent Builds trees

description Actions cuts — simply all ﬁlters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log 1. usr = ‘root’ 2. cpu < 90 3. mem > 1G RL agent Builds trees

Step RL agent Builds trees

? Step RL agent Builds trees 1

? Prob. Cut0 Cut1 Cut2 Cut3 Step RL agent Builds
trees Policy Net 1

? Prob. Cut0 Cut1 Cut2 Cut3 Step RL agent Builds
trees Sample a cut Policy Net 1

? Prob. Cut0 Cut1 Cut2 Cut3 Initially, ~uniform; improved over
time Step RL agent Builds trees Sample a cut Policy Net 1

? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Step
RL agent Builds trees Policy Net 1 2

? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob.
Cut0 Cut1 Cut2 Cut3 Step RL agent Builds trees Policy Net Policy Net 1 2

Cut0 Cut1 Cut2 Cut3 Cut2 Cut1 B2 B0 B1 Step RL agent Builds trees Policy Net Policy Net 1 2 T

Cut0 Cut1 Cut2 Cut3 Cut2 Cut1 B2 B0 B1 Step RL agent Builds trees Policy Net Policy Net Compute reward: # skipped records ✓ No query execution; only intersection checks 1 2 T

Cut0 Cut1 Cut2 Cut3 Cut2 Cut1 B2 B0 B1 Step RL agent Builds trees Policy Net Policy Net Compute reward: # skipped records ✓ No query execution; only intersection checks Bump probabilities of good cuts → gradually learn to build better trees 1 2 T

Denormalized TPC-H, 1 month (85GB) 10 queries/template

Denormalized TPC-H, 1 month (85GB) 10 queries/template 1 3 4
5 6 7 8 9 10 12 14 17 18 19 21 Template 0 10 20 30 40 Runtime (s)

Denormalized TPC-H, 1 month (85GB) 10 queries/template 1 3 4
5 6 7 8 9 10 12 14 17 18 19 21 Template 0 10 20 30 40 Runtime (s) bottom-up qd-tree [SIGMOD’15] Sun et al., Fine-grained Partitioning for Aggressive Data Skipping Based on frequent filter mining

Denormalized TPC-H, 1 month (85GB) 10 queries/template Qd-tree skips 45%
more records, enabling 1.6x overall speedup 1 3 4 5 6 7 8 9 10 12 14 17 18 19 21 Template 0 10 20 30 40 Runtime (s) bottom-up qd-tree

Workload #1 Workload #2 Real workloads 1000 queries each bottom-up
qd-tree 0 100 200 300 Runtime (mins) 322 64 bottom-up qd-tree 0 50 100 150 Runtime (mins) 148 10

Workload #1 Workload #2 Real workloads 1000 queries each bottom-up
qd-tree 0 100 200 300 Runtime (mins) 322 64 Higher speedups on workloads   with very low selectivities bottom-up qd-tree 0 50 100 150 Runtime (mins) 148 10 14x 5x

0 1000 2000 25% 30% 35% 40% Scan Ratio TPC-H
0 1000 2000 0.15% 0.20% 0.25% 0.30% Elasped Time (sec) ErrorLog-Ext

Most improvements learned in ~10 minutes 0 1000 2000 25%
30% 35% 40% Scan Ratio TPC-H 0 1000 2000 0.15% 0.20% 0.25% 0.30% Elasped Time (sec) ErrorLog-Ext

Most improvements learned in ~10 minutes 0 1000 2000 25%
30% 35% 40% Scan Ratio TPC-H 0 1000 2000 0.15% 0.20% 0.25% 0.30% Elasped Time (sec) ErrorLog-Ext First trees already better than baselines! (bottom-up; range)

Qd-trees are interpretable

Qd-trees are interpretable Tree Depth

Qd-trees are interpretable Number of cuts Tree Depth

Qd-trees are interpretable 0 10 20 sn_name (127) l_shipmode (95)
l_quantity (75) l_returnflag (61) l_discount (57) AC 1 (42) p_container (34) sr_name (29) cn_name (20) p_brand (14) l_receiptdate (8) l_shipdate (5) AC 0 (2) cr_name (1) p_size (1) p_type (1) AC 2 (1) Number of cuts Tree Depth

Qd-trees are interpretable p_container IN (LG CASE, LG ...)

Qd-trees are interpretable p_container IN (LG CASE, LG ...) c_nationkey
= s_nationkey c_nationkey = s_nationkey Y N

= s_nationkey c_nationkey = s_nationkey Y N p_brand = ‘Brand#43’ l_shipdate < 1995-01-01

= s_nationkey c_nationkey = s_nationkey Y N p_brand = ‘Brand#43’ l_shipdate < 1995-01-01 Use RL to search for useful data structures: inspect & deploy as a white box

Learned layouts for big data analytics deep RL minimizes I/O
Qd-tree

Qd-tree Qd-tree Format Parquet CSV Custom ✓ Format-agnostic

Qd-tree Qd-tree Format Parquet CSV Custom Engine Spark SQL Server RDBMS ✓ Format-agnostic ✓ Engine-agnostic 0 lines added to engines

Qd-tree Qd-tree Format Parquet CSV Custom Engine Spark SQL Server RDBMS ✓ Format-agnostic ✓ Engine-agnostic ✓ Interpretable 0 lines added to engines

Qd-tree zongheng.me/qdtree aka.ms/SimpleStore Qd-tree Format Parquet CSV Custom Engine Spark SQL Server RDBMS ✓ Format-agnostic ✓ Engine-agnostic ✓ Interpretable 0 lines added to engines Thank you!

Qd-tree: Learning Data Layouts for Big Data Ana...

Qd-tree: Learning Data Layouts for Big Data Analytics

More Decks by Zongheng Yang

Other Decks in Research

Featured

Transcript