Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Balsa: Learning a Query Optimizer Without Exper...

Balsa: Learning a Query Optimizer Without Expert Demonstrations

SIGMOD 2022
Talk video: https://www.youtube.com/watch?v=2Y88teiluGo
Paper: https://zongheng.me/pubs/balsa-sigmod2022.pdf
Code: https://github.com/balsa-project/balsa

Balsa is a reinforcement learning agent that learns to optimize SQL queries.

Abstract:

Query optimizers are a performance-critical component in every database system. Due to their complexity, optimizers take experts months to write and years to refine. In this work, we demonstrate for the first time that learning to optimize queries without learn- ing from an expert optimizer is both possible and efficient. We present Balsa, a query optimizer built by deep reinforcement learn- ing. Balsa first learns basic knowledge from a simple, environment- agnostic simulator, followed by safe learning in real execution. On the Join Order Benchmark, Balsa matches the performance of two expert query optimizers, both open-source and commercial, with two hours of learning, and outperforms them by up to 2.8× in workload runtime after a few more hours. Balsa thus opens the possibility of automatically learning to optimize in future compute environments where expert-designed optimizers do not exist.

Presenter: Zongheng Yang (https://zongheng.me), @zongheng_yang.

Zongheng Yang

June 19, 2022
Tweet

More Decks by Zongheng Yang

Other Decks in Research

Transcript

  1. Balsa: Learning a Query Optimizer Without Expert Demonstrations Zongheng Yang,

    Wei-Lin Chiang*, Frank Luan*, Gautam Mittal, Michael Luo, Ion Stoica SIGMOD 2022 bit.ly/balsa-pdf github.com/balsa-project
  2. SELECT ... FROM A, B, C WHERE A.age > 21

    AND C.city IN (SFO,HNL) Optimizers are essential for data systems Query Optimizer Execution Plan C Hash Join Loop Join A B 2
  3. SELECT ... FROM A, B, C WHERE A.age > 21

    AND C.city IN (SFO,HNL) Optimizers are essential for data systems Query Optimizer Execution Plan C Hash Join Loop Join A B Critical for performance: pick join orders, physical operators Widely applicable: used by many kinds of data systems 2
  4. Optimizers take years to develop First optimizer before 2000s; still

    seeing changes to date Built cost-based optimizer 3 years after first release Shipped first optimizer by a team after “9 months of intense effort” [1] [1] https://www.cockroachlabs.com/blog/building-cost-based-sql-optimizer
  5. New systems: even higher dev cost New computation models, optimization

    objectives, or engines: no expert optimizer to build off from.
  6. Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine

    Learned Optimizer Agent Balsa Query Plan Latency Update policy
  7. Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine

    Learned Optimizer Agent Balsa Query Plan Latency Update policy Without learning from an expert system, Balsa can outperform open-source & commercial optimizers
  8. Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine

    Learned Optimizer Agent Balsa Query Plan Latency Update policy 7
  9. Planning: value network Loop Join A B Hash Join C

    B Value: (query, partial plan) → final latency Query (tables to join) } A B C { pick join order, physical op 8
  10. Planning: value network “Leads to a final plan that runs

    in time T” Loop Join A B Value ( ) Hash Join C B Value ( ) Analogy: final win prob. Value: (query, partial plan) → final latency Query (tables to join) } A B C { pick join order, physical op 10 secs 1 min. 8
  11. Value network & tree search Loop Join A B Value

    ( ) Hash Join C B Value ( ) Value: (query, partial plan) → final latency Query (tables to join) } A B C { Run tree search to get a complete plan ML advance (AlphaGo) Used in optimizing Halide, theorem proving, etc. 10 secs 1 min. 9 pick join order, physical op
  12. Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine

    Balsa Query Plan Latency Learned Value Network Tree Search train loop: plan() execute() update() 10
  13. Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine

    Balsa Query Plan Latency Learned Value Network Tree Search train loop: plan() execute() update() Must execute plans to get rewards No expert data → can easily run into bad plans, which are slow 11
  14. Key challenge: Slow plans are everywhere! Expensive plans Good plans

    1-hour plan 1-sec plan 5-sec plan Plan space for a given query (schematic*) 12 * Based on our experiments on Join Order Benchmark, and "How Good Are Query Optimizers, Really?”, Leis et al., VLDB’15. 1-day plan
  15. Key challenge: Slow plans are everywhere! Expensive plans Good plans

    1-hour plan 1-sec plan 5-sec plan Agent actions ? ? ? ? ? Plan space for a given query (schematic*) 12 * Based on our experiments on Join Order Benchmark, and "How Good Are Query Optimizers, Really?”, Leis et al., VLDB’15. 1-day plan
  16. Sim-to-real learning Environment Training Workload Execution Engine Balsa Learned Value

    Network Tree Search Query Plan Latency 13 Pretrain in a minimal, logical-only “simulator” Instead of executing plans, quickly estimate their costs
  17. Sim-to-real learning Environment Training Workload Execution Engine Balsa Learned Value

    Network Tree Search 14 Pretrain in a minimal, logical-only “simulator” Instead of executing plans, quickly estimate their costs
  18. Sim-to-real learning Environment Training Workload Execution Engine Balsa Learned Value

    Network Tree Search Simulation Environment Training Workload Two-liner Cost Model Query Plan Cost 14 Pretrain in a minimal, logical-only “simulator” Instead of executing plans, quickly estimate their costs
  19. Sim-to-real learning Environment Training Workload Execution Engine Balsa Learned Value

    Network Tree Search Simulation Environment Training Workload Two-liner Cost Model Query Plan Cost Simulation enables the agent to efficiently learn to avoid the most disastrous (high-cost) plans. 14 Pretrain in a minimal, logical-only “simulator” Instead of executing plans, quickly estimate their costs
  20. Simulator: a minimal cost model 15 Cout(T) = ( |T|

    B7 T Bb  i#H2fb2H2+iBQM |T| + Cout(T1) + Cout(T2) B7 T = T1 ./ T2 <latexit sha1_base64="SbjoUXtihbXDVxA/92KWZEWMFmc=">AAACfnicbVFNb9NAEF2br2K+Ahy5DCRUDRWtHSHRS6WKXjgWKWkrxZG13ozTVde71u4YiFz/DP4YN34LFzaphUjLSCu9fTPzZvdNXinpKI5/BeGdu/fuP9h6GD16/OTps97zF6fO1FbgRBhl7HnOHSqpcUKSFJ5XFnmZKzzLL49X+bOvaJ00ekzLCmclX2hZSMHJU1nvR3ScNaamdmc8hMMozXEhdSO8omujq/EVbKeE36mRBQzGA5AOOBD36vt+JoqVSAtpui7dhb9aWTLcuI6GG0JwCL4E0tx8I4kejwZtlKKed5OzXj/ei9cBt0HSgT7r4iTr/UznRtQlahKKOzdN4opmDbckhUKvXTusuLjkC5x6qHmJbtas7WvhrWfmUBjrjyZYs/92NLx0blnmvrLkdOFu5lbk/3LTmoqDWSN1VRNqcT2oqBWQgdUuYC6tt1AtPeDCSv9WEBfcckF+Y5E3Ibn55dvgdLSXePzlQ//oU2fHFnvF3rAdlrCP7Ih9ZidswgT7HbwO3gW7IQu3w/fh/nVpGHQ9L9lGhAd/ANtVvBw=</latexit> <latexit sha1_base64="SbjoUXtihbXDVxA/92KWZEWMFmc=">AAACfnicbVFNb9NAEF2br2K+Ahy5DCRUDRWtHSHRS6WKXjgWKWkrxZG13ozTVde71u4YiFz/DP4YN34LFzaphUjLSCu9fTPzZvdNXinpKI5/BeGdu/fuP9h6GD16/OTps97zF6fO1FbgRBhl7HnOHSqpcUKSFJ5XFnmZKzzLL49X+bOvaJ00ekzLCmclX2hZSMHJU1nvR3ScNaamdmc8hMMozXEhdSO8omujq/EVbKeE36mRBQzGA5AOOBD36vt+JoqVSAtpui7dhb9aWTLcuI6GG0JwCL4E0tx8I4kejwZtlKKed5OzXj/ei9cBt0HSgT7r4iTr/UznRtQlahKKOzdN4opmDbckhUKvXTusuLjkC5x6qHmJbtas7WvhrWfmUBjrjyZYs/92NLx0blnmvrLkdOFu5lbk/3LTmoqDWSN1VRNqcT2oqBWQgdUuYC6tt1AtPeDCSv9WEBfcckF+Y5E3Ibn55dvgdLSXePzlQ//oU2fHFnvF3rAdlrCP7Ih9ZidswgT7HbwO3gW7IQu3w/fh/nVpGHQ9L9lGhAd/ANtVvBw=</latexit> <latexit sha1_base64="SbjoUXtihbXDVxA/92KWZEWMFmc=">AAACfnicbVFNb9NAEF2br2K+Ahy5DCRUDRWtHSHRS6WKXjgWKWkrxZG13ozTVde71u4YiFz/DP4YN34LFzaphUjLSCu9fTPzZvdNXinpKI5/BeGdu/fuP9h6GD16/OTps97zF6fO1FbgRBhl7HnOHSqpcUKSFJ5XFnmZKzzLL49X+bOvaJ00ekzLCmclX2hZSMHJU1nvR3ScNaamdmc8hMMozXEhdSO8omujq/EVbKeE36mRBQzGA5AOOBD36vt+JoqVSAtpui7dhb9aWTLcuI6GG0JwCL4E0tx8I4kejwZtlKKed5OzXj/ei9cBt0HSgT7r4iTr/UznRtQlahKKOzdN4opmDbckhUKvXTusuLjkC5x6qHmJbtas7WvhrWfmUBjrjyZYs/92NLx0blnmvrLkdOFu5lbk/3LTmoqDWSN1VRNqcT2oqBWQgdUuYC6tt1AtPeDCSv9WEBfcckF+Y5E3Ibn55dvgdLSXePzlQ//oU2fHFnvF3rAdlrCP7Ih9ZidswgT7HbwO3gW7IQu3w/fh/nVpGHQ9L9lGhAd/ANtVvBw=</latexit> <latexit sha1_base64="SbjoUXtihbXDVxA/92KWZEWMFmc=">AAACfnicbVFNb9NAEF2br2K+Ahy5DCRUDRWtHSHRS6WKXjgWKWkrxZG13ozTVde71u4YiFz/DP4YN34LFzaphUjLSCu9fTPzZvdNXinpKI5/BeGdu/fuP9h6GD16/OTps97zF6fO1FbgRBhl7HnOHSqpcUKSFJ5XFnmZKzzLL49X+bOvaJ00ekzLCmclX2hZSMHJU1nvR3ScNaamdmc8hMMozXEhdSO8omujq/EVbKeE36mRBQzGA5AOOBD36vt+JoqVSAtpui7dhb9aWTLcuI6GG0JwCL4E0tx8I4kejwZtlKKed5OzXj/ei9cBt0HSgT7r4iTr/UznRtQlahKKOzdN4opmDbckhUKvXTusuLjkC5x6qHmJbtas7WvhrWfmUBjrjyZYs/92NLx0blnmvrLkdOFu5lbk/3LTmoqDWSN1VRNqcT2oqBWQgdUuYC6tt1AtPeDCSv9WEBfcckF+Y5E3Ibn55dvgdLSXePzlQ//oU2fHFnvF3rAdlrCP7Ih9ZidswgT7HbwO3gW7IQu3w/fh/nVpGHQ9L9lGhAd/ANtVvBw=</latexit> Cout [1]: cost = sum(estimated cardinalities of all operators) ✓ generic (logical-only): no assumption on physical environment ✓ correlates with execution speed [1] “On the complexity of generating optimal left-deep processing trees with cross products”, Cluet and Moerkotte, ICDT’95.
  21. Safety mechanisms Executing a suboptimal plan of a training query

    In real execution, two ways to hit slow plans 16 Exploration: occasionally execute unfamiliar plans
  22. Safety mechanisms Executing a suboptimal plan of a training query

    In real execution, two ways to hit slow plans 16 Exploration: occasionally execute unfamiliar plans Safe execution: time out & set its unknown latency to LARGE_NUM
  23. Safety mechanisms Executing a suboptimal plan of a training query

    In real execution, two ways to hit slow plans 16 Exploration: occasionally execute unfamiliar plans Safe execution: time out & set its unknown latency to LARGE_NUM Safe exploration: only explore “probably good” plans Use k-best plans from tree search
  24. Bonus: Diversifying experiences 18 Exp. Buffer Trained Agent 1 Exp.

    Buffer Trained Agent N Observation: agent experiences differ a lot Ex: Some prefers loop join, vs. merge join
  25. Bonus: Diversifying experiences 18 Exp. Buffer Trained Agent 1 Exp.

    Buffer Trained Agent N Merged experiences Diversified agent Offline training Observation: agent experiences differ a lot Ex: Some prefers loop join, vs. merge join Idea: diversified experiences Offline retrain an agent on it Significantly enhances both training & unseen test query performance (see paper)
  26. Prior art, ML for QO Learned optimizers DQ [Krishnan et

    al.] Learns from expert cost models Neo [VLDB ’19] Learns from expert plans (demonstrations) 19 Optimizer assistant Bao [SIGMOD ’21] Doesn’t produce plans; sets flags for expert optimizer Intra-query learning: SkinnerDB [SIGMOD ’19] During a query’s execution, explores many join orders Limited to picking left-deep join orders
  27. Prior art, ML for QO Learned optimizers DQ [Krishnan et

    al.] Learns from expert cost models Neo [VLDB ’19] Learns from expert plans (demonstrations) 19 Optimizer assistant Bao [SIGMOD ’21] Doesn’t produce plans; sets flags for expert optimizer Intra-query learning: SkinnerDB [SIGMOD ’19] During a query’s execution, explores many join orders Limited to picking left-deep join orders For the first time, Balsa removes the need to learn from expert optimizers.
  28. Prior art, ML for QO Learned optimizers DQ [Krishnan et

    al.] Learns from expert cost models Neo [VLDB ’19] Learns from expert plans (demonstrations) 19 Optimizer assistant Bao [SIGMOD ’21] Doesn’t produce plans; sets flags for expert optimizer Intra-query learning: SkinnerDB [SIGMOD ’19] During a query’s execution, explores many join orders Limited to picking left-deep join orders For the first time, Balsa removes the need to learn from expert optimizers. Balsa learns to produce plans with physical operators & bushy join orders.
  29. Workload speedup PostgreSQL v12 Commercial DBMS Workload Speedup JOB JOB-Slow

    TPC-H train test 1.2 1.3 1.7 1.1 1.3 2.1 Workload Speedup JOB JOB-Slow TPC-H 1.0 1.5 1.9 1.1 2.4 2.8 train test 1x 1x Without learning from expert optimizers, Balsa outperforms them on both train (up to 2.8x) & unseen (up to 1.9x) queries 20
  30. Training Balsa is efficient 0 5 10 15 Elasped Time

    (hours) 20 10 5 1 0.5 Normalized Runtime [log] JOB JOB, Slow TPC-H Faster than expert plans Slower than expert plans
  31. Training Balsa is efficient Training Balsa is efficient: a few

    hours to nightly maintenance range 0 5 10 15 Elasped Time (hours) 20 10 5 1 0.5 Normalized Runtime [log] JOB JOB, Slow TPC-H Faster than expert plans Slower than expert plans
  32. Balsa vs. Expert Demonstrations (Neo) Test set performance 22 0

    2 4 6 8 10 Elasped Time (hours) 50 10 5 1 0.5 Normalized Runtime [log] Neo-impl Balsa
  33. Balsa vs. Expert Demonstrations (Neo) Test set performance 22 0

    2 4 6 8 10 Elasped Time (hours) 50 10 5 1 0.5 Normalized Runtime [log] Neo-impl Balsa Balsa significantly improves generalization to unseen queries (more robust; can surpass expert) Failed to generalize to unseen queries without simulation learning
  34. Balsa Learn to optimize, without learning from expert optimizers Key

    challenge: mitigate slow plans during learning Paper: more ideas & evaluation Open sourced on GitHub For researchers: reproduce & build on For users: find best plans for your queries/hardware {zongheng, weichiang, lsf, gbm, michael.luo, istoica} @berkeley.edu bit.ly/balsa-pdf github.com/balsa-project