Pro Yearly is on sale from $80 to $50! »

Qd-tree: Learning Data Layouts for Big Data Analytics

Qd-tree: Learning Data Layouts for Big Data Analytics

SIGMOD 2020
Project page: https://zongheng.me/qdtree
Talk video: https://youtu.be/KQ5vFyvinQ4
Paper: https://zongheng.me/pubs/qdtree-sigmod20.pdf

We use neural networks to generate data layouts that can speed up big data analytics by more than 10x.

Presenter: Zongheng Yang (https://zongheng.me), @zongheng_yang.

2198891db5e2442b031b82d519a4e132?s=128

Zongheng Yang

June 17, 2020
Tweet

Transcript

  1. Qd-tree: Learning Data Layouts for Big Data Analytics Zongheng Yang

    (UC Berkeley), Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li , Umar Minhas, Per-Åke Larson, Donald Kossmann (Microsoft Research), Rajeev Acharya (Microsoft) SIGMOD 2020
  2. Layouts are critical for performance Block Block 0 root 5%

    1 guest 9% 2 root 95% 3 guest 99% Range partition on timestamp timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%
  3. Layouts are critical for performance Block Block 0 root 5%

    1 guest 9% 2 root 95% 3 guest 99% Query usr = root Range partition on timestamp timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%
  4. Layouts are critical for performance Block Block 0 root 5%

    1 guest 9% 2 root 95% 3 guest 99% Query usr = root Range partition on timestamp timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%
  5. Layouts are critical for performance Block Block 0 root 5%

    1 guest 9% 2 root 95% 3 guest 99% Query usr = root Range partition on timestamp Read Read timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%
  6. Layouts are critical for performance Block Block Query usr =

    root timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr
  7. Layouts are critical for performance Block Block Query usr =

    root 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr
  8. Layouts are critical for performance Block Block Query usr =

    root Read Skipped! 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr
  9. Layouts are critical for performance Block Block Query usr =

    root Query cpu < 10% 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr
  10. Layouts are critical for performance Block Block Query usr =

    root Query cpu < 10% Read Read 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr
  11. Layouts are critical for performance Block Block Query usr =

    root Query cpu < 10% Read Read 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% As workloads grow, heuristic layouts become increasingly suboptimal. timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr
  12. Problem statement Raw Data Query Log Laid-out Data Given a

    table & a set of queries, partition data to maximize the total # of skipped records
  13. Qd-tree: query-data routing tree cpu<90? usr=‘root’? mem>1G? B0 B1 B2

    B3 Y N
  14. Qd-tree: query-data routing tree cpu<90? usr=‘root’? mem>1G? B0 B1 B2

    B3 Y N Cuts
  15. Qd-tree: query-data routing tree cpu<90? usr=‘root’? mem>1G? B0 B1 B2

    B3 Y N Cuts Data blocks
  16. Qd-tree: query-data routing tree cpu<90? usr=‘root’? mem>1G? B0 B1 B2

    B3 Y N Cuts Data blocks Learn qd-trees with excellent skipping → use them to produce data layouts Key Idea
  17. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N
  18. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’
  19. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root
  20. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to
  21. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to
  22. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to
  23. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to 3. Append to block
  24. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to 3. Append to block Q R Q R Q R Q R Q R Q R Q R Q R Q R Record cpu = 10 mem = 1G usr = ‘guest’ Record
  25. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to 3. Append to block Q R Q R Q R Q R Q R Q R Q R Q R Q R ✓ Format-agnostic Serialize in any format Record cpu = 10 mem = 1G usr = ‘guest’ Record
  26. Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0

    B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) Query & nodes: hyper-rectangles
  27. Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0

    B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) Query & nodes: hyper-rectangles
  28. Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0

    B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) 3. Output block IDs to read (skip other blocks) Query & nodes: hyper-rectangles
  29. Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0

    B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) 3. Output block IDs to read (skip other blocks) Query & nodes: hyper-rectangles ✓ Run on any engine
  30. qd-tree Laid-out Data (Parquet/CSV) Layout Phase

  31. Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read

    Querying Phase Query + block_id IN (1,3,9,..)
  32. Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read

    Querying Phase Finding the optimal qd-tree is NP-hard Query + block_id IN (1,3,9,..)
  33. Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read

    Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? Query + block_id IN (1,3,9,..)
  34. Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read

    Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? • Can’t scale to large search spaces Query + block_id IN (1,3,9,..)
  35. Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read

    Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? • Can’t scale to large search spaces Greedy — see paper Query + block_id IN (1,3,9,..)
  36. Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read

    Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? • Can’t scale to large search spaces Deep reinforcement learning (RL) • Approximate DP Greedy — see paper Query + block_id IN (1,3,9,..)
  37. Learning qd-trees RL agent Builds trees

  38. Learning qd-trees RL agent Builds trees Repeat: Build a tree,

    evaluate quality, learn good/bad cuts.
  39. Learning qd-trees extract cuts sample RL agent Builds trees Raw

    Data Query Log Repeat: Build a tree, evaluate quality, learn good/bad cuts.
  40. Learning qd-trees extract cuts sample RL agent Builds trees Raw

    Data Query Log Best qd-tree Repeat: Build a tree, evaluate quality, learn good/bad cuts.
  41. RL agent Builds trees

  42. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description RL agent Builds trees
  43. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description RL agent Builds trees [cpu=[0,90), mem=(1G,maxMem), usr=*]
  44. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description RL agent Builds trees
  45. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description Actions cuts — simply all filters from queries RL agent Builds trees
  46. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description Actions cuts — simply all filters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log RL agent Builds trees
  47. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description Actions cuts — simply all filters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log 1. usr = ‘root’ RL agent Builds trees
  48. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description Actions cuts — simply all filters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log 1. usr = ‘root’ 2. cpu < 90 RL agent Builds trees
  49. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description Actions cuts — simply all filters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log 1. usr = ‘root’ 2. cpu < 90 3. mem > 1G RL agent Builds trees
  50. Step RL agent Builds trees

  51. ? Step RL agent Builds trees 1

  52. ? Prob. Cut0 Cut1 Cut2 Cut3 Step RL agent Builds

    trees Policy Net 1
  53. ? Prob. Cut0 Cut1 Cut2 Cut3 Step RL agent Builds

    trees Sample a cut Policy Net 1
  54. ? Prob. Cut0 Cut1 Cut2 Cut3 Initially, ~uniform; improved over

    time Step RL agent Builds trees Sample a cut Policy Net 1
  55. ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Step

    RL agent Builds trees Policy Net 1 2
  56. ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob.

    Cut0 Cut1 Cut2 Cut3 Step RL agent Builds trees Policy Net Policy Net 1 2
  57. ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob.

    Cut0 Cut1 Cut2 Cut3 Cut2 Cut1 B2 B0 B1 Step RL agent Builds trees Policy Net Policy Net 1 2 T
  58. ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob.

    Cut0 Cut1 Cut2 Cut3 Cut2 Cut1 B2 B0 B1 Step RL agent Builds trees Policy Net Policy Net Compute reward: # skipped records ✓ No query execution; only intersection checks 1 2 T
  59. ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob.

    Cut0 Cut1 Cut2 Cut3 Cut2 Cut1 B2 B0 B1 Step RL agent Builds trees Policy Net Policy Net Compute reward: # skipped records ✓ No query execution; only intersection checks Bump probabilities of good cuts → gradually learn to build better trees 1 2 T
  60. Denormalized TPC-H, 1 month (85GB) 10 queries/template

  61. Denormalized TPC-H, 1 month (85GB) 10 queries/template 1 3 4

    5 6 7 8 9 10 12 14 17 18 19 21 Template 0 10 20 30 40 Runtime (s)
  62. Denormalized TPC-H, 1 month (85GB) 10 queries/template 1 3 4

    5 6 7 8 9 10 12 14 17 18 19 21 Template 0 10 20 30 40 Runtime (s) bottom-up qd-tree [SIGMOD’15] Sun et al., Fine-grained Partitioning for Aggressive Data Skipping Based on frequent filter mining
  63. Denormalized TPC-H, 1 month (85GB) 10 queries/template Qd-tree skips 45%

    more records, enabling 1.6x overall speedup 1 3 4 5 6 7 8 9 10 12 14 17 18 19 21 Template 0 10 20 30 40 Runtime (s) bottom-up qd-tree
  64. Workload #1 Workload #2 Real workloads 1000 queries each bottom-up

    qd-tree 0 100 200 300 Runtime (mins) 322 64 bottom-up qd-tree 0 50 100 150 Runtime (mins) 148 10
  65. Workload #1 Workload #2 Real workloads 1000 queries each bottom-up

    qd-tree 0 100 200 300 Runtime (mins) 322 64 Higher speedups on workloads 
 with very low selectivities bottom-up qd-tree 0 50 100 150 Runtime (mins) 148 10 14x 5x
  66. 0 1000 2000 25% 30% 35% 40% Scan Ratio TPC-H

    0 1000 2000 0.15% 0.20% 0.25% 0.30% Elasped Time (sec) ErrorLog-Ext
  67. Most improvements learned in ~10 minutes 0 1000 2000 25%

    30% 35% 40% Scan Ratio TPC-H 0 1000 2000 0.15% 0.20% 0.25% 0.30% Elasped Time (sec) ErrorLog-Ext
  68. Most improvements learned in ~10 minutes 0 1000 2000 25%

    30% 35% 40% Scan Ratio TPC-H 0 1000 2000 0.15% 0.20% 0.25% 0.30% Elasped Time (sec) ErrorLog-Ext First trees already better than baselines! (bottom-up; range)
  69. Qd-trees are interpretable

  70. Qd-trees are interpretable Tree Depth

  71. Qd-trees are interpretable Number of cuts Tree Depth

  72. Qd-trees are interpretable 0 10 20 sn_name (127) l_shipmode (95)

    l_quantity (75) l_returnflag (61) l_discount (57) AC 1 (42) p_container (34) sr_name (29) cn_name (20) p_brand (14) l_receiptdate (8) l_shipdate (5) AC 0 (2) cr_name (1) p_size (1) p_type (1) AC 2 (1) Number of cuts Tree Depth
  73. Qd-trees are interpretable 0 10 20 sn_name (127) l_shipmode (95)

    l_quantity (75) l_returnflag (61) l_discount (57) AC 1 (42) p_container (34) sr_name (29) cn_name (20) p_brand (14) l_receiptdate (8) l_shipdate (5) AC 0 (2) cr_name (1) p_size (1) p_type (1) AC 2 (1) Number of cuts Tree Depth
  74. Qd-trees are interpretable p_container IN (LG CASE, LG ...)

  75. Qd-trees are interpretable p_container IN (LG CASE, LG ...) c_nationkey

    = s_nationkey c_nationkey = s_nationkey Y N
  76. Qd-trees are interpretable p_container IN (LG CASE, LG ...) c_nationkey

    = s_nationkey c_nationkey = s_nationkey Y N p_brand = ‘Brand#43’ l_shipdate < 1995-01-01
  77. Qd-trees are interpretable p_container IN (LG CASE, LG ...) c_nationkey

    = s_nationkey c_nationkey = s_nationkey Y N p_brand = ‘Brand#43’ l_shipdate < 1995-01-01 Use RL to search for useful data structures: inspect & deploy as a white box
  78. Learned layouts for big data analytics deep RL minimizes I/O

    Qd-tree
  79. Learned layouts for big data analytics deep RL minimizes I/O

    Qd-tree Qd-tree Format Parquet CSV Custom ✓ Format-agnostic
  80. Learned layouts for big data analytics deep RL minimizes I/O

    Qd-tree Qd-tree Format Parquet CSV Custom Engine Spark SQL Server RDBMS ✓ Format-agnostic ✓ Engine-agnostic 0 lines added to engines
  81. Learned layouts for big data analytics deep RL minimizes I/O

    Qd-tree Qd-tree Format Parquet CSV Custom Engine Spark SQL Server RDBMS ✓ Format-agnostic ✓ Engine-agnostic ✓ Interpretable 0 lines added to engines
  82. Learned layouts for big data analytics deep RL minimizes I/O

    Qd-tree zongheng.me/qdtree aka.ms/SimpleStore Qd-tree Format Parquet CSV Custom Engine Spark SQL Server RDBMS ✓ Format-agnostic ✓ Engine-agnostic ✓ Interpretable 0 lines added to engines Thank you!