Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Qd-tree: Learning Data Layouts for Big Data Ana...

Qd-tree: Learning Data Layouts for Big Data Analytics

SIGMOD 2020
Project page: https://zongheng.me/qdtree
Talk video: https://youtu.be/KQ5vFyvinQ4
Paper: https://zongheng.me/pubs/qdtree-sigmod20.pdf

We use neural networks to generate data layouts that can speed up big data analytics by more than 10x.

Presenter: Zongheng Yang (https://zongheng.me), @zongheng_yang.

Zongheng Yang

June 17, 2020
Tweet

More Decks by Zongheng Yang

Other Decks in Research

Transcript

  1. Qd-tree: Learning Data Layouts for Big Data Analytics Zongheng Yang

    (UC Berkeley), Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li , Umar Minhas, Per-Åke Larson, Donald Kossmann (Microsoft Research), Rajeev Acharya (Microsoft) SIGMOD 2020
  2. Layouts are critical for performance Block Block 0 root 5%

    1 guest 9% 2 root 95% 3 guest 99% Range partition on timestamp timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%
  3. Layouts are critical for performance Block Block 0 root 5%

    1 guest 9% 2 root 95% 3 guest 99% Query usr = root Range partition on timestamp timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%
  4. Layouts are critical for performance Block Block 0 root 5%

    1 guest 9% 2 root 95% 3 guest 99% Query usr = root Range partition on timestamp timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%
  5. Layouts are critical for performance Block Block 0 root 5%

    1 guest 9% 2 root 95% 3 guest 99% Query usr = root Range partition on timestamp Read Read timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99%
  6. Layouts are critical for performance Block Block Query usr =

    root timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr
  7. Layouts are critical for performance Block Block Query usr =

    root 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr
  8. Layouts are critical for performance Block Block Query usr =

    root Read Skipped! 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr
  9. Layouts are critical for performance Block Block Query usr =

    root Query cpu < 10% 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr
  10. Layouts are critical for performance Block Block Query usr =

    root Query cpu < 10% Read Read 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr
  11. Layouts are critical for performance Block Block Query usr =

    root Query cpu < 10% Read Read 0 root 5% 2 root 95% 1 guest 9% 3 guest 99% As workloads grow, heuristic layouts become increasingly suboptimal. timestamp usr cpu 0 root 5% 1 guest 9% 2 root 95% 3 guest 99% Hash partition on usr
  12. Problem statement Raw Data Query Log Laid-out Data Given a

    table & a set of queries, partition data to maximize the total # of skipped records
  13. Qd-tree: query-data routing tree cpu<90? usr=‘root’? mem>1G? B0 B1 B2

    B3 Y N Cuts Data blocks Learn qd-trees with excellent skipping → use them to produce data layouts Key Idea
  14. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’
  15. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root
  16. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to
  17. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to
  18. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to
  19. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to 3. Append to block
  20. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to 3. Append to block Q R Q R Q R Q R Q R Q R Q R Q R Q R Record cpu = 10 mem = 1G usr = ‘guest’ Record
  21. Routing data cpu<90? usr=‘root’? mem>1G? B0 B1 B2 B3 Y

    N Record cpu = 10 mem = 1G usr = ‘guest’ Q R 1. Arrive at root 2. Evaluate cut, go down the branch it belongs to 3. Append to block Q R Q R Q R Q R Q R Q R Q R Q R Q R ✓ Format-agnostic Serialize in any format Record cpu = 10 mem = 1G usr = ‘guest’ Record
  22. Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0

    B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) Query & nodes: hyper-rectangles
  23. Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0

    B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) Query & nodes: hyper-rectangles
  24. Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0

    B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) 3. Output block IDs to read (skip other blocks) Query & nodes: hyper-rectangles
  25. Routing queries: which blocks to read cpu<90? usr=‘root’? mem>1G? B0

    B1 B2 B3 Y N Query mem > 1G AND usr = ‘guest’ Q Q 1. Arrive at root 2. Evaluate cut, go down intersecting branch(es) 3. Output block IDs to read (skip other blocks) Query & nodes: hyper-rectangles ✓ Run on any engine
  26. Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read

    Querying Phase Finding the optimal qd-tree is NP-hard Query + block_id IN (1,3,9,..)
  27. Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read

    Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? Query + block_id IN (1,3,9,..)
  28. Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read

    Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? • Can’t scale to large search spaces Query + block_id IN (1,3,9,..)
  29. Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read

    Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? • Can’t scale to large search spaces Greedy — see paper Query + block_id IN (1,3,9,..)
  30. Black-box Query Engine qd-tree Laid-out Data (Parquet/CSV) Layout Phase Read

    Querying Phase Finding the optimal qd-tree is NP-hard Dynamic programming? • Can’t scale to large search spaces Deep reinforcement learning (RL) • Approximate DP Greedy — see paper Query + block_id IN (1,3,9,..)
  31. Learning qd-trees RL agent Builds trees Repeat: Build a tree,

    evaluate quality, learn good/bad cuts.
  32. Learning qd-trees extract cuts sample RL agent Builds trees Raw

    Data Query Log Repeat: Build a tree, evaluate quality, learn good/bad cuts.
  33. Learning qd-trees extract cuts sample RL agent Builds trees Raw

    Data Query Log Best qd-tree Repeat: Build a tree, evaluate quality, learn good/bad cuts.
  34. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description RL agent Builds trees [cpu=[0,90), mem=(1G,maxMem), usr=*]
  35. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description Actions cuts — simply all filters from queries RL agent Builds trees
  36. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description Actions cuts — simply all filters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log RL agent Builds trees
  37. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description Actions cuts — simply all filters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log 1. usr = ‘root’ RL agent Builds trees
  38. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description Actions cuts — simply all filters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log 1. usr = ‘root’ 2. cpu < 90 RL agent Builds trees
  39. cpu<90? mem>1G? Block Block Y N States nodes — subspace

    description Actions cuts — simply all filters from queries Candidate Cuts 1. usr = ‘root’ AND cpu < 90 2. mem > 1G ... Query Log 1. usr = ‘root’ 2. cpu < 90 3. mem > 1G RL agent Builds trees
  40. ? Prob. Cut0 Cut1 Cut2 Cut3 Step RL agent Builds

    trees Sample a cut Policy Net 1
  41. ? Prob. Cut0 Cut1 Cut2 Cut3 Initially, ~uniform; improved over

    time Step RL agent Builds trees Sample a cut Policy Net 1
  42. ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Step

    RL agent Builds trees Policy Net 1 2
  43. ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob.

    Cut0 Cut1 Cut2 Cut3 Step RL agent Builds trees Policy Net Policy Net 1 2
  44. ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob.

    Cut0 Cut1 Cut2 Cut3 Cut2 Cut1 B2 B0 B1 Step RL agent Builds trees Policy Net Policy Net 1 2 T
  45. ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob.

    Cut0 Cut1 Cut2 Cut3 Cut2 Cut1 B2 B0 B1 Step RL agent Builds trees Policy Net Policy Net Compute reward: # skipped records ✓ No query execution; only intersection checks 1 2 T
  46. ? Prob. Cut0 Cut1 Cut2 Cut3 Cut2 ? ? Prob.

    Cut0 Cut1 Cut2 Cut3 Cut2 Cut1 B2 B0 B1 Step RL agent Builds trees Policy Net Policy Net Compute reward: # skipped records ✓ No query execution; only intersection checks Bump probabilities of good cuts → gradually learn to build better trees 1 2 T
  47. Denormalized TPC-H, 1 month (85GB) 10 queries/template 1 3 4

    5 6 7 8 9 10 12 14 17 18 19 21 Template 0 10 20 30 40 Runtime (s)
  48. Denormalized TPC-H, 1 month (85GB) 10 queries/template 1 3 4

    5 6 7 8 9 10 12 14 17 18 19 21 Template 0 10 20 30 40 Runtime (s) bottom-up qd-tree [SIGMOD’15] Sun et al., Fine-grained Partitioning for Aggressive Data Skipping Based on frequent filter mining
  49. Denormalized TPC-H, 1 month (85GB) 10 queries/template Qd-tree skips 45%

    more records, enabling 1.6x overall speedup 1 3 4 5 6 7 8 9 10 12 14 17 18 19 21 Template 0 10 20 30 40 Runtime (s) bottom-up qd-tree
  50. Workload #1 Workload #2 Real workloads 1000 queries each bottom-up

    qd-tree 0 100 200 300 Runtime (mins) 322 64 bottom-up qd-tree 0 50 100 150 Runtime (mins) 148 10
  51. Workload #1 Workload #2 Real workloads 1000 queries each bottom-up

    qd-tree 0 100 200 300 Runtime (mins) 322 64 Higher speedups on workloads 
 with very low selectivities bottom-up qd-tree 0 50 100 150 Runtime (mins) 148 10 14x 5x
  52. 0 1000 2000 25% 30% 35% 40% Scan Ratio TPC-H

    0 1000 2000 0.15% 0.20% 0.25% 0.30% Elasped Time (sec) ErrorLog-Ext
  53. Most improvements learned in ~10 minutes 0 1000 2000 25%

    30% 35% 40% Scan Ratio TPC-H 0 1000 2000 0.15% 0.20% 0.25% 0.30% Elasped Time (sec) ErrorLog-Ext
  54. Most improvements learned in ~10 minutes 0 1000 2000 25%

    30% 35% 40% Scan Ratio TPC-H 0 1000 2000 0.15% 0.20% 0.25% 0.30% Elasped Time (sec) ErrorLog-Ext First trees already better than baselines! (bottom-up; range)
  55. Qd-trees are interpretable 0 10 20 sn_name (127) l_shipmode (95)

    l_quantity (75) l_returnflag (61) l_discount (57) AC 1 (42) p_container (34) sr_name (29) cn_name (20) p_brand (14) l_receiptdate (8) l_shipdate (5) AC 0 (2) cr_name (1) p_size (1) p_type (1) AC 2 (1) Number of cuts Tree Depth
  56. Qd-trees are interpretable 0 10 20 sn_name (127) l_shipmode (95)

    l_quantity (75) l_returnflag (61) l_discount (57) AC 1 (42) p_container (34) sr_name (29) cn_name (20) p_brand (14) l_receiptdate (8) l_shipdate (5) AC 0 (2) cr_name (1) p_size (1) p_type (1) AC 2 (1) Number of cuts Tree Depth
  57. Qd-trees are interpretable p_container IN (LG CASE, LG ...) c_nationkey

    = s_nationkey c_nationkey = s_nationkey Y N p_brand = ‘Brand#43’ l_shipdate < 1995-01-01
  58. Qd-trees are interpretable p_container IN (LG CASE, LG ...) c_nationkey

    = s_nationkey c_nationkey = s_nationkey Y N p_brand = ‘Brand#43’ l_shipdate < 1995-01-01 Use RL to search for useful data structures: inspect & deploy as a white box
  59. Learned layouts for big data analytics deep RL minimizes I/O

    Qd-tree Qd-tree Format Parquet CSV Custom ✓ Format-agnostic
  60. Learned layouts for big data analytics deep RL minimizes I/O

    Qd-tree Qd-tree Format Parquet CSV Custom Engine Spark SQL Server RDBMS ✓ Format-agnostic ✓ Engine-agnostic 0 lines added to engines
  61. Learned layouts for big data analytics deep RL minimizes I/O

    Qd-tree Qd-tree Format Parquet CSV Custom Engine Spark SQL Server RDBMS ✓ Format-agnostic ✓ Engine-agnostic ✓ Interpretable 0 lines added to engines
  62. Learned layouts for big data analytics deep RL minimizes I/O

    Qd-tree zongheng.me/qdtree aka.ms/SimpleStore Qd-tree Format Parquet CSV Custom Engine Spark SQL Server RDBMS ✓ Format-agnostic ✓ Engine-agnostic ✓ Interpretable 0 lines added to engines Thank you!