Slide 1

Slide 1 text

Qd-tree: Learning Data Layouts for Big Data Analytics Zongheng Yang (UC Berkeley), Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li , Umar Minhas, Per-Åke Larson, Donald Kossmann (Microsoft Research), Rajeev Acharya (Microsoft) SIGMOD 2020

Slide 2

Slide 2 text

Layouts are critical for performance Block Block 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Range partition on timestamp timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%

Slide 3

Slide 3 text

Layouts are critical for performance Block Block 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Query usr = root Range partition on timestamp timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%

Slide 4

Slide 4 text

Layouts are critical for performance Block Block 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Query usr = root Range partition on timestamp timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%

Slide 5

Slide 5 text

Layouts are critical for performance Block Block 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Query usr = root Range partition on timestamp Read Read timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%

Slide 6

Slide 6 text

Layouts are critical for performance Block Block Query usr = root timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr

Slide 7

Slide 7 text

Layouts are critical for performance Block Block Query usr = root 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr

Slide 8

Slide 8 text

Layouts are critical for performance Block Block Query usr = root Read Skipped! 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr

Slide 9

Slide 9 text

Layouts are critical for performance Block Block Query usr = root Query cpu < 10% 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr

Slide 10

Slide 10 text

Layouts are critical for performance Block Block Query usr = root Query cpu < 10% Read Read 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr

Slide 11

Slide 11 text

Layouts are critical for performance Block Block Query usr = root Query cpu < 10% Read Read 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% As workloads grow, heuristic layouts become increasingly suboptimal. timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr

Slide 12

Slide 12 text

Problem statement Raw Data Query Log Laid-out Data Given a table & a set of queries, partition data to maximize the total # of skipped records

Slide 13

Slide 13 text

Qd-tree: query-data routing tree cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N

Slide 14

Slide 14 text

Qd-tree: query-data routing tree cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Cuts

Slide 15

Slide 15 text

Qd-tree: query-data routing tree cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Cuts Data blocks

Slide 16

Slide 16 text

Qd-tree: query-data routing tree cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Cuts Data blocks Learn qd-trees with excellent skipping → use them to produce data layouts Key Idea

Slide 17

Slide 17 text

Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N

Slide 18

Slide 18 text

Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Record cpu = 10 mem = 1G usr = ‘guest’

Slide 19

Slide 19 text

Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root

Slide 20

Slide 20 text

Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to

Slide 21

Slide 21 text

Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to

Slide 22

Slide 22 text

Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to

Slide 23

Slide 23 text

Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to 3. Append to block

Slide 24

Slide 24 text

Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to 3. Append to block Q R Q R Q R Q R Q R Q R Q R Q R Q R Record cpu = 10 mem = 1G usr = ‘guest’ Record

Slide 25

Slide 25 text

Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to 3. Append to block Q R Q R Q R Q R Q R Q R Q R Q R Q R ✓ Format-agnostic Serialize in any format Record cpu = 10 mem = 1G usr = ‘guest’ Record

Slide 26

Slide 26 text

Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) Query & nodes: hyper-rectangles

Slide 27

Slide 27 text

Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) Query & nodes: hyper-rectangles

Slide 28

Slide 28 text

Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) 3. Output block IDs to read (skip other blocks) Query & nodes: hyper-rectangles

Slide 29

Slide 29 text

Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) 3. Output block IDs to read (skip other blocks) Query & nodes: hyper-rectangles ✓ Run on any engine

Slide 30

Slide 30 text

qd-tree Laid-out Data (Parquet/CSV) Layout Phase

Slide 31

Slide 31 text

Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read Querying Phase Query + block_id IN (1,3,9,..)

Slide 32

Slide 32 text

Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read Querying Phase Finding the optimal qd-tree is NP-hard Query + block_id IN (1,3,9,..)

Slide 33

Slide 33 text

Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? Query + block_id IN (1,3,9,..)

Slide 34

Slide 34 text

Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? • Can’t scale to large search spaces Query + block_id IN (1,3,9,..)

Slide 35

Slide 35 text

Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? • Can’t scale to large search spaces Greedy — see paper Query + block_id IN (1,3,9,..)

Slide 36

Slide 36 text

Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? • Can’t scale to large search spaces Deep reinforcement learning (RL) • Approximate DP Greedy — see paper Query + block_id IN (1,3,9,..)

Slide 37

Slide 37 text

Learning qd-trees RL agent Builds trees

Slide 38

Slide 38 text

Learning qd-trees RL agent Builds trees Repeat: Build a tree, evaluate quality, learn good/bad cuts.

Slide 39

Slide 39 text

Learning qd-trees extract cuts sample RL agent Builds trees Raw Data Query Log Repeat: Build a tree, evaluate quality, learn good/bad cuts.

Slide 40

Slide 40 text

Learning qd-trees extract cuts sample RL agent Builds trees Raw Data Query Log Best qd-tree Repeat: Build a tree, evaluate quality, learn good/bad cuts.

Slide 41

Slide 41 text

RL agent Builds trees

Slide 42

Slide 42 text

cpu<90? mem>1G? Block Block Y N States nodes — subspace description RL agent Builds trees

Slide 43

Slide 43 text

cpu<90? mem>1G? Block Block Y N States nodes — subspace description RL agent Builds trees [cpu=[0,90), mem=(1G,maxMem), usr=*]

Slide 44

Slide 44 text

cpu<90? mem>1G? Block Block Y N States nodes — subspace description RL agent Builds trees

Slide 45

Slide 45 text

cpu<90? mem>1G? Block Block Y N States nodes — subspace description Actions cuts — simply all filters from queries RL agent Builds trees

Slide 46

Slide 46 text

cpu<90? mem>1G? Block Block Y N States nodes — subspace description Actions cuts — simply all filters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log RL agent Builds trees

Slide 47

Slide 47 text

cpu<90? mem>1G? Block Block Y N States nodes — subspace description Actions cuts — simply all filters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log 1. usr = ‘root’ RL agent Builds trees

Slide 48

Slide 48 text

cpu<90? mem>1G? Block Block Y N States nodes — subspace description Actions cuts — simply all filters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log 1. usr = ‘root’ 2. cpu < 90 RL agent Builds trees

Slide 49

Slide 49 text

cpu<90? mem>1G? Block Block Y N States nodes — subspace description Actions cuts — simply all filters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log 1. usr = ‘root’ 2. cpu < 90 3. mem > 1G RL agent Builds trees

Slide 50

Slide 50 text

Step RL agent Builds trees

Slide 51

Slide 51 text

? Step RL agent Builds trees 1

Slide 52

Slide 52 text

? Prob. Cut0 Cut1 Cut2 Cut3 Step RL agent Builds trees Policy Net 1

Slide 53

Slide 53 text

? Prob. Cut0 Cut1 Cut2 Cut3 Step RL agent Builds trees Sample a cut Policy Net 1

Slide 54

Slide 54 text

? Prob. Cut0 Cut1 Cut2 Cut3 Initially, ~uniform; improved over time Step RL agent Builds trees Sample a cut Policy Net 1

Slide 55

Slide 55 text

? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Step RL agent Builds trees Policy Net 1 2

Slide 56

Slide 56 text

? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob. Cut0 Cut1 Cut2 Cut3 Step RL agent Builds trees Policy Net Policy Net 1 2

Slide 57

Slide 57 text

? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 Cut1 B2 B0 B1 Step RL agent Builds trees Policy Net Policy Net 1 2 T

Slide 58

Slide 58 text

? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 Cut1 B2 B0 B1 Step RL agent Builds trees Policy Net Policy Net Compute reward: # skipped records ✓ No query execution; only intersection checks 1 2 T

Slide 59

Slide 59 text

? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 Cut1 B2 B0 B1 Step RL agent Builds trees Policy Net Policy Net Compute reward: # skipped records ✓ No query execution; only intersection checks Bump probabilities of good cuts → gradually learn to build better trees 1 2 T

Slide 60

Slide 60 text

Denormalized TPC-H, 1 month (85GB) 10 queries/template

Slide 61

Slide 61 text

Denormalized TPC-H, 1 month (85GB) 10 queries/template 1 3 4 5 6 7 8 9 10 12 14 17 18 19 21 Template 0 10 20 30 40 Runtime (s)

Slide 62

Slide 62 text

Denormalized TPC-H, 1 month (85GB) 10 queries/template 1 3 4 5 6 7 8 9 10 12 14 17 18 19 21 Template 0 10 20 30 40 Runtime (s) bottom-up qd-tree [SIGMOD’15] Sun et al., Fine-grained Partitioning for Aggressive Data Skipping Based on frequent filter mining

Slide 63

Slide 63 text

Denormalized TPC-H, 1 month (85GB) 10 queries/template Qd-tree skips 45% more records, enabling 1.6x overall speedup 1 3 4 5 6 7 8 9 10 12 14 17 18 19 21 Template 0 10 20 30 40 Runtime (s) bottom-up qd-tree

Slide 64

Slide 64 text

Workload #1 Workload #2 Real workloads 1000 queries each bottom-up qd-tree 0 100 200 300 Runtime (mins) 322 64 bottom-up qd-tree 0 50 100 150 Runtime (mins) 148 10

Slide 65

Slide 65 text

Workload #1 Workload #2 Real workloads 1000 queries each bottom-up qd-tree 0 100 200 300 Runtime (mins) 322 64 Higher speedups on workloads 
 with very low selectivities bottom-up qd-tree 0 50 100 150 Runtime (mins) 148 10 14x 5x

Slide 66

Slide 66 text

0 1000 2000 25% 30% 35% 40% Scan Ratio TPC-H 0 1000 2000 0.15% 0.20% 0.25% 0.30% Elasped Time (sec) ErrorLog-Ext

Slide 67

Slide 67 text

Most improvements learned in ~10 minutes 0 1000 2000 25% 30% 35% 40% Scan Ratio TPC-H 0 1000 2000 0.15% 0.20% 0.25% 0.30% Elasped Time (sec) ErrorLog-Ext

Slide 68

Slide 68 text

Most improvements learned in ~10 minutes 0 1000 2000 25% 30% 35% 40% Scan Ratio TPC-H 0 1000 2000 0.15% 0.20% 0.25% 0.30% Elasped Time (sec) ErrorLog-Ext First trees already better than baselines! (bottom-up; range)

Slide 69

Slide 69 text

Qd-trees are interpretable

Slide 70

Slide 70 text

Qd-trees are interpretable Tree Depth

Slide 71

Slide 71 text

Qd-trees are interpretable Number of cuts Tree Depth

Slide 72

Slide 72 text

Qd-trees are interpretable 0 10 20 sn_name (127) l_shipmode (95) l_quantity (75) l_returnflag (61) l_discount (57) AC 1 (42) p_container (34) sr_name (29) cn_name (20) p_brand (14) l_receiptdate (8) l_shipdate (5) AC 0 (2) cr_name (1) p_size (1) p_type (1) AC 2 (1) Number of cuts Tree Depth

Slide 73

Slide 73 text

Qd-trees are interpretable 0 10 20 sn_name (127) l_shipmode (95) l_quantity (75) l_returnflag (61) l_discount (57) AC 1 (42) p_container (34) sr_name (29) cn_name (20) p_brand (14) l_receiptdate (8) l_shipdate (5) AC 0 (2) cr_name (1) p_size (1) p_type (1) AC 2 (1) Number of cuts Tree Depth

Slide 74

Slide 74 text

Qd-trees are interpretable p_container IN (LG CASE, LG ...)

Slide 75

Slide 75 text

Qd-trees are interpretable p_container IN (LG CASE, LG ...) c_nationkey = s_nationkey c_nationkey = s_nationkey Y N

Slide 76

Slide 76 text

Qd-trees are interpretable p_container IN (LG CASE, LG ...) c_nationkey = s_nationkey c_nationkey = s_nationkey Y N p_brand = ‘Brand#43’ l_shipdate < 1995-01-01

Slide 77

Slide 77 text

Qd-trees are interpretable p_container IN (LG CASE, LG ...) c_nationkey = s_nationkey c_nationkey = s_nationkey Y N p_brand = ‘Brand#43’ l_shipdate < 1995-01-01 Use RL to search for useful data structures: inspect & deploy as a white box

Slide 78

Slide 78 text

Learned layouts for big data analytics deep RL minimizes I/O Qd-tree

Slide 79

Slide 79 text

Learned layouts for big data analytics deep RL minimizes I/O Qd-tree Qd-tree Format Parquet CSV Custom ✓ Format-agnostic

Slide 80

Slide 80 text

Learned layouts for big data analytics deep RL minimizes I/O Qd-tree Qd-tree Format Parquet CSV Custom Engine Spark SQL Server RDBMS ✓ Format-agnostic ✓ Engine-agnostic 0 lines added to engines

Slide 81

Slide 81 text

Learned layouts for big data analytics deep RL minimizes I/O Qd-tree Qd-tree Format Parquet CSV Custom Engine Spark SQL Server RDBMS ✓ Format-agnostic ✓ Engine-agnostic ✓ Interpretable 0 lines added to engines

Slide 82

Slide 82 text

Learned layouts for big data analytics deep RL minimizes I/O Qd-tree zongheng.me/qdtree aka.ms/SimpleStore Qd-tree Format Parquet CSV Custom Engine Spark SQL Server RDBMS ✓ Format-agnostic ✓ Engine-agnostic ✓ Interpretable 0 lines added to engines Thank you!