$30 off During Our Annual Pro Sale. View Details »

Qd-tree: Learning Data Layouts for Big Data Analytics

Qd-tree: Learning Data Layouts for Big Data Analytics

SIGMOD 2020
Project page: https://zongheng.me/qdtree
Talk video: https://youtu.be/KQ5vFyvinQ4
Paper: https://zongheng.me/pubs/qdtree-sigmod20.pdf

We use neural networks to generate data layouts that can speed up big data analytics by more than 10x.

Presenter: Zongheng Yang (https://zongheng.me), @zongheng_yang.

Zongheng Yang

June 17, 2020
Tweet

More Decks by Zongheng Yang

Other Decks in Research

Transcript

  1. Qd-tree: Learning Data Layouts
    for Big Data Analytics
    Zongheng Yang (UC Berkeley), Badrish Chandramouli, Chi Wang,
    Johannes Gehrke, Yinan Li , Umar Minhas, Per-Åke Larson,
    Donald Kossmann (Microsoft Research), Rajeev Acharya (Microsoft)
    SIGMOD 2020

    View Slide

  2. Layouts are critical for performance
    Block
    Block
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%
    Range partition on timestamp
    timestamp usr cpu
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%

    View Slide

  3. Layouts are critical for performance
    Block
    Block
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%
    Query
    usr = root
    Range partition on timestamp
    timestamp usr cpu
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%

    View Slide

  4. Layouts are critical for performance
    Block
    Block
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%
    Query
    usr = root
    Range partition on timestamp
    timestamp usr cpu
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%

    View Slide

  5. Layouts are critical for performance
    Block
    Block
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%
    Query
    usr = root
    Range partition on timestamp
    Read
    Read
    timestamp usr cpu
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%

    View Slide

  6. Layouts are critical for performance
    Block
    Block
    Query
    usr = root
    timestamp usr cpu
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%
    Hash partition on usr

    View Slide

  7. Layouts are critical for performance
    Block
    Block
    Query
    usr = root
    0 root 5%
    2 root 95%
    1 guest 9%
    3 guest 99%
    timestamp usr cpu
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%
    Hash partition on usr

    View Slide

  8. Layouts are critical for performance
    Block
    Block
    Query
    usr = root
    Read
    Skipped!
    0 root 5%
    2 root 95%
    1 guest 9%
    3 guest 99%
    timestamp usr cpu
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%
    Hash partition on usr

    View Slide

  9. Layouts are critical for performance
    Block
    Block
    Query
    usr = root
    Query
    cpu < 10%
    0 root 5%
    2 root 95%
    1 guest 9%
    3 guest 99%
    timestamp usr cpu
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%
    Hash partition on usr

    View Slide

  10. Layouts are critical for performance
    Block
    Block
    Query
    usr = root
    Query
    cpu < 10%
    Read
    Read
    0 root 5%
    2 root 95%
    1 guest 9%
    3 guest 99%
    timestamp usr cpu
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%
    Hash partition on usr

    View Slide

  11. Layouts are critical for performance
    Block
    Block
    Query
    usr = root
    Query
    cpu < 10%
    Read
    Read
    0 root 5%
    2 root 95%
    1 guest 9%
    3 guest 99%
    As workloads grow, heuristic layouts
    become increasingly suboptimal.
    timestamp usr cpu
    0 root 5%
    1 guest 9%
    2 root 95%
    3 guest 99%
    Hash partition on usr

    View Slide

  12. Problem statement
    Raw
    Data
    Query
    Log
    Laid-out Data
    Given a table & a set of queries,
    partition data to maximize
    the total # of skipped records

    View Slide

  13. Qd-tree: query-data routing tree
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N

    View Slide

  14. Qd-tree: query-data routing tree
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Cuts

    View Slide

  15. Qd-tree: query-data routing tree
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Cuts
    Data blocks

    View Slide

  16. Qd-tree: query-data routing tree
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Cuts
    Data blocks
    Learn qd-trees with excellent skipping
    → use them to produce data layouts
    Key Idea

    View Slide

  17. Routing data
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N

    View Slide

  18. Routing data
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Record
    cpu = 10
    mem = 1G
    usr = ‘guest’

    View Slide

  19. Routing data
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Record
    cpu = 10
    mem = 1G
    usr = ‘guest’
    Q
    R
    1. Arrive at root

    View Slide

  20. Routing data
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Record
    cpu = 10
    mem = 1G
    usr = ‘guest’
    Q
    R
    1. Arrive at root
    2. Evaluate cut, go down
    the branch it belongs to

    View Slide

  21. Routing data
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Record
    cpu = 10
    mem = 1G
    usr = ‘guest’
    Q
    R
    1. Arrive at root
    2. Evaluate cut, go down
    the branch it belongs to

    View Slide

  22. Routing data
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Record
    cpu = 10
    mem = 1G
    usr = ‘guest’
    Q
    R
    1. Arrive at root
    2. Evaluate cut, go down
    the branch it belongs to

    View Slide

  23. Routing data
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Record
    cpu = 10
    mem = 1G
    usr = ‘guest’
    Q
    R
    1. Arrive at root
    2. Evaluate cut, go down
    the branch it belongs to
    3. Append to block

    View Slide

  24. Routing data
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Record
    cpu = 10
    mem = 1G
    usr = ‘guest’
    Q
    R
    1. Arrive at root
    2. Evaluate cut, go down
    the branch it belongs to
    3. Append to block
    Q
    R
    Q
    R
    Q
    R
    Q
    R
    Q
    R
    Q
    R
    Q
    R
    Q
    R
    Q
    R
    Record
    cpu = 10
    mem = 1G
    usr = ‘guest’
    Record

    View Slide

  25. Routing data
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Record
    cpu = 10
    mem = 1G
    usr = ‘guest’
    Q
    R
    1. Arrive at root
    2. Evaluate cut, go down
    the branch it belongs to
    3. Append to block
    Q
    R
    Q
    R
    Q
    R
    Q
    R
    Q
    R
    Q
    R
    Q
    R
    Q
    R
    Q
    R ✓ Format-agnostic
    Serialize in any format
    Record
    cpu = 10
    mem = 1G
    usr = ‘guest’
    Record

    View Slide

  26. Routing queries: which blocks to read
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Query
    mem > 1G AND
    usr = ‘guest’
    Q
    Q
    1. Arrive at root
    2. Evaluate cut, go down
    intersecting branch(es)
    Query & nodes: hyper-rectangles

    View Slide

  27. Routing queries: which blocks to read
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Query
    mem > 1G AND
    usr = ‘guest’
    Q Q
    1. Arrive at root
    2. Evaluate cut, go down
    intersecting branch(es)
    Query & nodes: hyper-rectangles

    View Slide

  28. Routing queries: which blocks to read
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Query
    mem > 1G AND
    usr = ‘guest’
    Q
    Q
    1. Arrive at root
    2. Evaluate cut, go down
    intersecting branch(es)
    3. Output block IDs to read
    (skip other blocks)
    Query & nodes: hyper-rectangles

    View Slide

  29. Routing queries: which blocks to read
    cpu<90?
    usr=‘root’?
    mem>1G?
    B0 B1
    B2
    B3
    Y N
    Query
    mem > 1G AND
    usr = ‘guest’
    Q
    Q
    1. Arrive at root
    2. Evaluate cut, go down
    intersecting branch(es)
    3. Output block IDs to read
    (skip other blocks)
    Query & nodes: hyper-rectangles
    ✓ Run on any engine

    View Slide

  30. qd-tree
    Laid-out Data
    (Parquet/CSV)
    Layout Phase

    View Slide

  31. Black-box
    Query Engine
    qd-tree
    Laid-out Data
    (Parquet/CSV)
    Layout Phase
    Read
    Querying Phase
    Query +
    block_id IN
    (1,3,9,..)

    View Slide

  32. Black-box
    Query Engine
    qd-tree
    Laid-out Data
    (Parquet/CSV)
    Layout Phase
    Read
    Querying Phase
    Finding the optimal
    qd-tree is NP-hard
    Query +
    block_id IN
    (1,3,9,..)

    View Slide

  33. Black-box
    Query Engine
    qd-tree
    Laid-out Data
    (Parquet/CSV)
    Layout Phase
    Read
    Querying Phase
    Finding the optimal
    qd-tree is NP-hard
    Dynamic programming?
    Query +
    block_id IN
    (1,3,9,..)

    View Slide

  34. Black-box
    Query Engine
    qd-tree
    Laid-out Data
    (Parquet/CSV)
    Layout Phase
    Read
    Querying Phase
    Finding the optimal
    qd-tree is NP-hard
    Dynamic programming?
    • Can’t scale to large search spaces
    Query +
    block_id IN
    (1,3,9,..)

    View Slide

  35. Black-box
    Query Engine
    qd-tree
    Laid-out Data
    (Parquet/CSV)
    Layout Phase
    Read
    Querying Phase
    Finding the optimal
    qd-tree is NP-hard
    Dynamic programming?
    • Can’t scale to large search spaces
    Greedy — see paper
    Query +
    block_id IN
    (1,3,9,..)

    View Slide

  36. Black-box
    Query Engine
    qd-tree
    Laid-out Data
    (Parquet/CSV)
    Layout Phase
    Read
    Querying Phase
    Finding the optimal
    qd-tree is NP-hard
    Dynamic programming?
    • Can’t scale to large search spaces
    Deep reinforcement learning (RL)
    • Approximate DP
    Greedy — see paper
    Query +
    block_id IN
    (1,3,9,..)

    View Slide

  37. Learning qd-trees
    RL agent
    Builds trees

    View Slide

  38. Learning qd-trees
    RL agent
    Builds trees
    Repeat: Build a tree, evaluate quality,
    learn good/bad cuts.

    View Slide

  39. Learning qd-trees
    extract cuts
    sample
    RL agent
    Builds trees
    Raw
    Data
    Query
    Log
    Repeat: Build a tree, evaluate quality,
    learn good/bad cuts.

    View Slide

  40. Learning qd-trees
    extract cuts
    sample
    RL agent
    Builds trees
    Raw
    Data
    Query
    Log
    Best
    qd-tree
    Repeat: Build a tree, evaluate quality,
    learn good/bad cuts.

    View Slide

  41. RL agent
    Builds trees

    View Slide

  42. cpu<90?
    mem>1G?
    Block
    Block
    Y N
    States nodes — subspace description
    RL agent
    Builds trees

    View Slide

  43. cpu<90?
    mem>1G?
    Block
    Block
    Y N
    States nodes — subspace description
    RL agent
    Builds trees
    [cpu=[0,90), mem=(1G,maxMem), usr=*]

    View Slide

  44. cpu<90?
    mem>1G?
    Block
    Block
    Y N
    States nodes — subspace description
    RL agent
    Builds trees

    View Slide

  45. cpu<90?
    mem>1G?
    Block
    Block
    Y N
    States nodes — subspace description
    Actions cuts — simply all filters from queries
    RL agent
    Builds trees

    View Slide

  46. cpu<90?
    mem>1G?
    Block
    Block
    Y N
    States nodes — subspace description
    Actions cuts — simply all filters from queries
    Candidate Cuts
    1. usr = ‘root’ AND cpu < 90
    2. mem > 1G
    ...
    Query Log
    RL agent
    Builds trees

    View Slide

  47. cpu<90?
    mem>1G?
    Block
    Block
    Y N
    States nodes — subspace description
    Actions cuts — simply all filters from queries
    Candidate Cuts
    1. usr = ‘root’ AND cpu < 90
    2. mem > 1G
    ...
    Query Log
    1. usr = ‘root’
    RL agent
    Builds trees

    View Slide

  48. cpu<90?
    mem>1G?
    Block
    Block
    Y N
    States nodes — subspace description
    Actions cuts — simply all filters from queries
    Candidate Cuts
    1. usr = ‘root’ AND cpu < 90
    2. mem > 1G
    ...
    Query Log
    1. usr = ‘root’
    2. cpu < 90
    RL agent
    Builds trees

    View Slide

  49. cpu<90?
    mem>1G?
    Block
    Block
    Y N
    States nodes — subspace description
    Actions cuts — simply all filters from queries
    Candidate Cuts
    1. usr = ‘root’ AND cpu < 90
    2. mem > 1G
    ...
    Query Log
    1. usr = ‘root’
    2. cpu < 90
    3. mem > 1G
    RL agent
    Builds trees

    View Slide

  50. Step
    RL agent
    Builds trees

    View Slide

  51. ?
    Step
    RL agent
    Builds trees
    1

    View Slide

  52. ?
    Prob.
    Cut0 Cut1 Cut2 Cut3
    Step
    RL agent
    Builds trees
    Policy
    Net
    1

    View Slide

  53. ?
    Prob.
    Cut0 Cut1 Cut2 Cut3
    Step
    RL agent
    Builds trees Sample a cut
    Policy
    Net
    1

    View Slide

  54. ?
    Prob.
    Cut0 Cut1 Cut2 Cut3
    Initially, ~uniform;
    improved over time
    Step
    RL agent
    Builds trees Sample a cut
    Policy
    Net
    1

    View Slide

  55. ?
    Prob.
    Cut0 Cut1 Cut2 Cut3
    Cut2
    ? ?
    Step
    RL agent
    Builds trees
    Policy
    Net
    1
    2

    View Slide

  56. ?
    Prob.
    Cut0 Cut1 Cut2 Cut3
    Cut2
    ? ?
    Prob.
    Cut0 Cut1 Cut2 Cut3
    Step
    RL agent
    Builds trees
    Policy
    Net
    Policy
    Net
    1
    2

    View Slide

  57. ?
    Prob.
    Cut0 Cut1 Cut2 Cut3
    Cut2
    ? ?
    Prob.
    Cut0 Cut1 Cut2 Cut3
    Cut2
    Cut1 B2
    B0 B1
    Step
    RL agent
    Builds trees
    Policy
    Net
    Policy
    Net
    1
    2
    T

    View Slide

  58. ?
    Prob.
    Cut0 Cut1 Cut2 Cut3
    Cut2
    ? ?
    Prob.
    Cut0 Cut1 Cut2 Cut3
    Cut2
    Cut1 B2
    B0 B1
    Step
    RL agent
    Builds trees
    Policy
    Net
    Policy
    Net
    Compute reward: # skipped records
    ✓ No query execution; only intersection checks
    1
    2
    T

    View Slide

  59. ?
    Prob.
    Cut0 Cut1 Cut2 Cut3
    Cut2
    ? ?
    Prob.
    Cut0 Cut1 Cut2 Cut3
    Cut2
    Cut1 B2
    B0 B1
    Step
    RL agent
    Builds trees
    Policy
    Net
    Policy
    Net
    Compute reward: # skipped records
    ✓ No query execution; only intersection checks
    Bump probabilities of good cuts
    → gradually learn to build better trees
    1
    2
    T

    View Slide

  60. Denormalized TPC-H, 1 month (85GB)
    10 queries/template

    View Slide

  61. Denormalized TPC-H, 1 month (85GB)
    10 queries/template
    1 3 4 5 6 7 8 9 10 12 14 17 18 19 21
    Template
    0
    10
    20
    30
    40
    Runtime (s)

    View Slide

  62. Denormalized TPC-H, 1 month (85GB)
    10 queries/template
    1 3 4 5 6 7 8 9 10 12 14 17 18 19 21
    Template
    0
    10
    20
    30
    40
    Runtime (s)
    bottom-up
    qd-tree
    [SIGMOD’15] Sun et al., Fine-grained Partitioning for Aggressive Data Skipping
    Based on frequent filter mining

    View Slide

  63. Denormalized TPC-H, 1 month (85GB)
    10 queries/template
    Qd-tree skips 45% more records,
    enabling 1.6x overall speedup
    1 3 4 5 6 7 8 9 10 12 14 17 18 19 21
    Template
    0
    10
    20
    30
    40
    Runtime (s)
    bottom-up
    qd-tree

    View Slide

  64. Workload #1 Workload #2
    Real workloads
    1000 queries each
    bottom-up qd-tree
    0
    100
    200
    300
    Runtime (mins)
    322
    64
    bottom-up qd-tree
    0
    50
    100
    150
    Runtime (mins)
    148
    10

    View Slide

  65. Workload #1 Workload #2
    Real workloads
    1000 queries each
    bottom-up qd-tree
    0
    100
    200
    300
    Runtime (mins)
    322
    64
    Higher speedups on workloads 

    with very low selectivities
    bottom-up qd-tree
    0
    50
    100
    150
    Runtime (mins)
    148
    10
    14x 5x

    View Slide

  66. 0 1000 2000
    25%
    30%
    35%
    40%
    Scan Ratio
    TPC-H
    0 1000 2000
    0.15%
    0.20%
    0.25%
    0.30%
    Elasped Time (sec)
    ErrorLog-Ext

    View Slide

  67. Most improvements learned in ~10 minutes
    0 1000 2000
    25%
    30%
    35%
    40%
    Scan Ratio
    TPC-H
    0 1000 2000
    0.15%
    0.20%
    0.25%
    0.30%
    Elasped Time (sec)
    ErrorLog-Ext

    View Slide

  68. Most improvements learned in ~10 minutes
    0 1000 2000
    25%
    30%
    35%
    40%
    Scan Ratio
    TPC-H
    0 1000 2000
    0.15%
    0.20%
    0.25%
    0.30%
    Elasped Time (sec)
    ErrorLog-Ext
    First trees already better than baselines!
    (bottom-up; range)

    View Slide

  69. Qd-trees are interpretable

    View Slide

  70. Qd-trees are interpretable
    Tree Depth

    View Slide

  71. Qd-trees are interpretable
    Number of
    cuts
    Tree Depth

    View Slide

  72. Qd-trees are interpretable
    0
    10
    20
    sn_name (127)
    l_shipmode (95)
    l_quantity (75)
    l_returnflag (61)
    l_discount (57)
    AC 1 (42)
    p_container (34)
    sr_name (29)
    cn_name (20)
    p_brand (14)
    l_receiptdate (8)
    l_shipdate (5)
    AC 0 (2)
    cr_name (1)
    p_size (1)
    p_type (1)
    AC 2 (1)
    Number of
    cuts
    Tree Depth

    View Slide

  73. Qd-trees are interpretable
    0
    10
    20
    sn_name (127)
    l_shipmode (95)
    l_quantity (75)
    l_returnflag (61)
    l_discount (57)
    AC 1 (42)
    p_container (34)
    sr_name (29)
    cn_name (20)
    p_brand (14)
    l_receiptdate (8)
    l_shipdate (5)
    AC 0 (2)
    cr_name (1)
    p_size (1)
    p_type (1)
    AC 2 (1)
    Number of
    cuts
    Tree Depth

    View Slide

  74. Qd-trees are interpretable
    p_container
    IN
    (LG CASE,
    LG ...)

    View Slide

  75. Qd-trees are interpretable
    p_container
    IN
    (LG CASE,
    LG ...)
    c_nationkey =
    s_nationkey
    c_nationkey =
    s_nationkey
    Y
    N

    View Slide

  76. Qd-trees are interpretable
    p_container
    IN
    (LG CASE,
    LG ...)
    c_nationkey =
    s_nationkey
    c_nationkey =
    s_nationkey
    Y
    N
    p_brand =
    ‘Brand#43’
    l_shipdate <
    1995-01-01

    View Slide

  77. Qd-trees are interpretable
    p_container
    IN
    (LG CASE,
    LG ...)
    c_nationkey =
    s_nationkey
    c_nationkey =
    s_nationkey
    Y
    N
    p_brand =
    ‘Brand#43’
    l_shipdate <
    1995-01-01
    Use RL to search for useful data structures:
    inspect & deploy as a white box

    View Slide

  78. Learned layouts for big data analytics
    deep RL minimizes I/O
    Qd-tree

    View Slide

  79. Learned layouts for big data analytics
    deep RL minimizes I/O
    Qd-tree
    Qd-tree
    Format
    Parquet CSV Custom
    ✓ Format-agnostic

    View Slide

  80. Learned layouts for big data analytics
    deep RL minimizes I/O
    Qd-tree
    Qd-tree
    Format
    Parquet CSV Custom
    Engine
    Spark SQL Server RDBMS
    ✓ Format-agnostic
    ✓ Engine-agnostic
    0 lines added to engines

    View Slide

  81. Learned layouts for big data analytics
    deep RL minimizes I/O
    Qd-tree
    Qd-tree
    Format
    Parquet CSV Custom
    Engine
    Spark SQL Server RDBMS
    ✓ Format-agnostic
    ✓ Engine-agnostic
    ✓ Interpretable
    0 lines added to engines

    View Slide

  82. Learned layouts for big data analytics
    deep RL minimizes I/O
    Qd-tree
    zongheng.me/qdtree
    aka.ms/SimpleStore
    Qd-tree
    Format
    Parquet CSV Custom
    Engine
    Spark SQL Server RDBMS
    ✓ Format-agnostic
    ✓ Engine-agnostic
    ✓ Interpretable
    0 lines added to engines
    Thank you!

    View Slide