Slide 1

Slide 1 text

Balsa: Learning a Query Optimizer Without Expert Demonstrations Zongheng Yang, Wei-Lin Chiang*, Frank Luan*, Gautam Mittal, Michael Luo, Ion Stoica SIGMOD 2022 bit.ly/balsa-pdf github.com/balsa-project

Slide 2

Slide 2 text

SELECT ... FROM A, B, C WHERE A.age > 21 AND C.city IN (SFO,HNL) Optimizers are essential for data systems Query Optimizer Execution Plan C Hash Join Loop Join A B 2

Slide 3

Slide 3 text

SELECT ... FROM A, B, C WHERE A.age > 21 AND C.city IN (SFO,HNL) Optimizers are essential for data systems Query Optimizer Execution Plan C Hash Join Loop Join A B Critical for performance: pick join orders, physical operators Widely applicable: used by many kinds of data systems 2

Slide 4

Slide 4 text

Optimizers take years to develop First optimizer before 2000s; still seeing changes to date Built cost-based optimizer 3 years after first release Shipped first optimizer by a team after “9 months of intense effort” [1] [1] https://www.cockroachlabs.com/blog/building-cost-based-sql-optimizer

Slide 5

Slide 5 text

New systems: even higher dev cost New computation models, optimization objectives, or engines: no expert optimizer to build off from.

Slide 6

Slide 6 text

Can we learn to optimize queries without learning from an expert optimizer?

Slide 7

Slide 7 text

Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine Learned Optimizer Agent Balsa

Slide 8

Slide 8 text

Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine Learned Optimizer Agent Balsa Query

Slide 9

Slide 9 text

Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine Learned Optimizer Agent Balsa Query Plan

Slide 10

Slide 10 text

Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine Learned Optimizer Agent Balsa Query Plan Latency

Slide 11

Slide 11 text

Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine Learned Optimizer Agent Balsa Query Plan Latency Update policy

Slide 12

Slide 12 text

Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine Learned Optimizer Agent Balsa Query Plan Latency Update policy Without learning from an expert system, Balsa can outperform open-source & commercial optimizers

Slide 13

Slide 13 text

Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine Learned Optimizer Agent Balsa Query Plan Latency Update policy 7

Slide 14

Slide 14 text

Planning: value network Value: (query, partial plan) → final latency Query (tables to join) } A B C { 8

Slide 15

Slide 15 text

Planning: value network Loop Join A B Hash Join C B Value: (query, partial plan) → final latency Query (tables to join) } A B C { pick join order, physical op 8

Slide 16

Slide 16 text

Planning: value network “Leads to a final plan that runs in time T” Loop Join A B Value ( ) Hash Join C B Value ( ) Analogy: final win prob. Value: (query, partial plan) → final latency Query (tables to join) } A B C { pick join order, physical op 10 secs 1 min. 8

Slide 17

Slide 17 text

Value network & tree search Loop Join A B Value ( ) Hash Join C B Value ( ) Value: (query, partial plan) → final latency Query (tables to join) } A B C { Run tree search to get a complete plan ML advance (AlphaGo) Used in optimizing Halide, theorem proving, etc. 10 secs 1 min. 9 pick join order, physical op

Slide 18

Slide 18 text

Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine Balsa Query Plan Latency Learned Value Network Tree Search train loop: plan() execute() update() 10

Slide 19

Slide 19 text

Balsa: a Learned Query Optimizer Environment Training Workload Execution Engine Balsa Query Plan Latency Learned Value Network Tree Search train loop: plan() execute() update() Must execute plans to get rewards No expert data → can easily run into bad plans, which are slow 11

Slide 20

Slide 20 text

Key challenge: Slow plans are everywhere! Expensive plans Good plans 1-hour plan 1-sec plan 5-sec plan Plan space for a given query (schematic*) 12 * Based on our experiments on Join Order Benchmark, and "How Good Are Query Optimizers, Really?”, Leis et al., VLDB’15. 1-day plan

Slide 21

Slide 21 text

Key challenge: Slow plans are everywhere! Expensive plans Good plans 1-hour plan 1-sec plan 5-sec plan Agent actions ? ? ? ? ? Plan space for a given query (schematic*) 12 * Based on our experiments on Join Order Benchmark, and "How Good Are Query Optimizers, Really?”, Leis et al., VLDB’15. 1-day plan

Slide 22

Slide 22 text

Sim-to-real learning Environment Training Workload Execution Engine Balsa Learned Value Network Tree Search Query Plan Latency 13 Pretrain in a minimal, logical-only “simulator” Instead of executing plans, quickly estimate their costs

Slide 23

Slide 23 text

Sim-to-real learning Environment Training Workload Execution Engine Balsa Learned Value Network Tree Search 14 Pretrain in a minimal, logical-only “simulator” Instead of executing plans, quickly estimate their costs

Slide 24

Slide 24 text

Sim-to-real learning Environment Training Workload Execution Engine Balsa Learned Value Network Tree Search Simulation Environment Training Workload Two-liner Cost Model Query Plan Cost 14 Pretrain in a minimal, logical-only “simulator” Instead of executing plans, quickly estimate their costs

Slide 25

Slide 25 text

Sim-to-real learning Environment Training Workload Execution Engine Balsa Learned Value Network Tree Search Simulation Environment Training Workload Two-liner Cost Model Query Plan Cost Simulation enables the agent to efficiently learn to avoid the most disastrous (high-cost) plans. 14 Pretrain in a minimal, logical-only “simulator” Instead of executing plans, quickly estimate their costs

Slide 26

Slide 26 text

Simulator: a minimal cost model 15 Cout(T) = ( |T| B7 T Bb i#H2fb2H2+iBQM |T| + Cout(T1) + Cout(T2) B7 T = T1 ./ T2 AAACfnicbVFNb9NAEF2br2K+Ahy5DCRUDRWtHSHRS6WKXjgWKWkrxZG13ozTVde71u4YiFz/DP4YN34LFzaphUjLSCu9fTPzZvdNXinpKI5/BeGdu/fuP9h6GD16/OTps97zF6fO1FbgRBhl7HnOHSqpcUKSFJ5XFnmZKzzLL49X+bOvaJ00ekzLCmclX2hZSMHJU1nvR3ScNaamdmc8hMMozXEhdSO8omujq/EVbKeE36mRBQzGA5AOOBD36vt+JoqVSAtpui7dhb9aWTLcuI6GG0JwCL4E0tx8I4kejwZtlKKed5OzXj/ei9cBt0HSgT7r4iTr/UznRtQlahKKOzdN4opmDbckhUKvXTusuLjkC5x6qHmJbtas7WvhrWfmUBjrjyZYs/92NLx0blnmvrLkdOFu5lbk/3LTmoqDWSN1VRNqcT2oqBWQgdUuYC6tt1AtPeDCSv9WEBfcckF+Y5E3Ibn55dvgdLSXePzlQ//oU2fHFnvF3rAdlrCP7Ih9ZidswgT7HbwO3gW7IQu3w/fh/nVpGHQ9L9lGhAd/ANtVvBw= AAACfnicbVFNb9NAEF2br2K+Ahy5DCRUDRWtHSHRS6WKXjgWKWkrxZG13ozTVde71u4YiFz/DP4YN34LFzaphUjLSCu9fTPzZvdNXinpKI5/BeGdu/fuP9h6GD16/OTps97zF6fO1FbgRBhl7HnOHSqpcUKSFJ5XFnmZKzzLL49X+bOvaJ00ekzLCmclX2hZSMHJU1nvR3ScNaamdmc8hMMozXEhdSO8omujq/EVbKeE36mRBQzGA5AOOBD36vt+JoqVSAtpui7dhb9aWTLcuI6GG0JwCL4E0tx8I4kejwZtlKKed5OzXj/ei9cBt0HSgT7r4iTr/UznRtQlahKKOzdN4opmDbckhUKvXTusuLjkC5x6qHmJbtas7WvhrWfmUBjrjyZYs/92NLx0blnmvrLkdOFu5lbk/3LTmoqDWSN1VRNqcT2oqBWQgdUuYC6tt1AtPeDCSv9WEBfcckF+Y5E3Ibn55dvgdLSXePzlQ//oU2fHFnvF3rAdlrCP7Ih9ZidswgT7HbwO3gW7IQu3w/fh/nVpGHQ9L9lGhAd/ANtVvBw= AAACfnicbVFNb9NAEF2br2K+Ahy5DCRUDRWtHSHRS6WKXjgWKWkrxZG13ozTVde71u4YiFz/DP4YN34LFzaphUjLSCu9fTPzZvdNXinpKI5/BeGdu/fuP9h6GD16/OTps97zF6fO1FbgRBhl7HnOHSqpcUKSFJ5XFnmZKzzLL49X+bOvaJ00ekzLCmclX2hZSMHJU1nvR3ScNaamdmc8hMMozXEhdSO8omujq/EVbKeE36mRBQzGA5AOOBD36vt+JoqVSAtpui7dhb9aWTLcuI6GG0JwCL4E0tx8I4kejwZtlKKed5OzXj/ei9cBt0HSgT7r4iTr/UznRtQlahKKOzdN4opmDbckhUKvXTusuLjkC5x6qHmJbtas7WvhrWfmUBjrjyZYs/92NLx0blnmvrLkdOFu5lbk/3LTmoqDWSN1VRNqcT2oqBWQgdUuYC6tt1AtPeDCSv9WEBfcckF+Y5E3Ibn55dvgdLSXePzlQ//oU2fHFnvF3rAdlrCP7Ih9ZidswgT7HbwO3gW7IQu3w/fh/nVpGHQ9L9lGhAd/ANtVvBw= AAACfnicbVFNb9NAEF2br2K+Ahy5DCRUDRWtHSHRS6WKXjgWKWkrxZG13ozTVde71u4YiFz/DP4YN34LFzaphUjLSCu9fTPzZvdNXinpKI5/BeGdu/fuP9h6GD16/OTps97zF6fO1FbgRBhl7HnOHSqpcUKSFJ5XFnmZKzzLL49X+bOvaJ00ekzLCmclX2hZSMHJU1nvR3ScNaamdmc8hMMozXEhdSO8omujq/EVbKeE36mRBQzGA5AOOBD36vt+JoqVSAtpui7dhb9aWTLcuI6GG0JwCL4E0tx8I4kejwZtlKKed5OzXj/ei9cBt0HSgT7r4iTr/UznRtQlahKKOzdN4opmDbckhUKvXTusuLjkC5x6qHmJbtas7WvhrWfmUBjrjyZYs/92NLx0blnmvrLkdOFu5lbk/3LTmoqDWSN1VRNqcT2oqBWQgdUuYC6tt1AtPeDCSv9WEBfcckF+Y5E3Ibn55dvgdLSXePzlQ//oU2fHFnvF3rAdlrCP7Ih9ZidswgT7HbwO3gW7IQu3w/fh/nVpGHQ9L9lGhAd/ANtVvBw= Cout [1]: cost = sum(estimated cardinalities of all operators) ✓ generic (logical-only): no assumption on physical environment ✓ correlates with execution speed [1] “On the complexity of generating optimal left-deep processing trees with cross products”, Cluet and Moerkotte, ICDT’95.

Slide 27

Slide 27 text

Safety mechanisms Executing a suboptimal plan of a training query In real execution, two ways to hit slow plans 16 Exploration: occasionally execute unfamiliar plans

Slide 28

Slide 28 text

Safety mechanisms Executing a suboptimal plan of a training query In real execution, two ways to hit slow plans 16 Exploration: occasionally execute unfamiliar plans Safe execution: time out & set its unknown latency to LARGE_NUM

Slide 29

Slide 29 text

Safety mechanisms Executing a suboptimal plan of a training query In real execution, two ways to hit slow plans 16 Exploration: occasionally execute unfamiliar plans Safe execution: time out & set its unknown latency to LARGE_NUM Safe exploration: only explore “probably good” plans Use k-best plans from tree search

Slide 30

Slide 30 text

Balsa: the complete architecture Balsa Learned Value Network Tree Search Minimal Simulator Safely Execute, Explore 17

Slide 31

Slide 31 text

Bonus: Diversifying experiences 18 Exp. Buffer Trained Agent 1 Exp. Buffer Trained Agent N Observation: agent experiences differ a lot Ex: Some prefers loop join, vs. merge join

Slide 32

Slide 32 text

Bonus: Diversifying experiences 18 Exp. Buffer Trained Agent 1 Exp. Buffer Trained Agent N Merged experiences Diversified agent Offline training Observation: agent experiences differ a lot Ex: Some prefers loop join, vs. merge join Idea: diversified experiences Offline retrain an agent on it Significantly enhances both training & unseen test query performance (see paper)

Slide 33

Slide 33 text

Prior art, ML for QO Learned optimizers DQ [Krishnan et al.] Learns from expert cost models Neo [VLDB ’19] Learns from expert plans (demonstrations) 19 Optimizer assistant Bao [SIGMOD ’21] Doesn’t produce plans; sets flags for expert optimizer Intra-query learning: SkinnerDB [SIGMOD ’19] During a query’s execution, explores many join orders Limited to picking left-deep join orders

Slide 34

Slide 34 text

Prior art, ML for QO Learned optimizers DQ [Krishnan et al.] Learns from expert cost models Neo [VLDB ’19] Learns from expert plans (demonstrations) 19 Optimizer assistant Bao [SIGMOD ’21] Doesn’t produce plans; sets flags for expert optimizer Intra-query learning: SkinnerDB [SIGMOD ’19] During a query’s execution, explores many join orders Limited to picking left-deep join orders For the first time, Balsa removes the need to learn from expert optimizers.

Slide 35

Slide 35 text

Prior art, ML for QO Learned optimizers DQ [Krishnan et al.] Learns from expert cost models Neo [VLDB ’19] Learns from expert plans (demonstrations) 19 Optimizer assistant Bao [SIGMOD ’21] Doesn’t produce plans; sets flags for expert optimizer Intra-query learning: SkinnerDB [SIGMOD ’19] During a query’s execution, explores many join orders Limited to picking left-deep join orders For the first time, Balsa removes the need to learn from expert optimizers. Balsa learns to produce plans with physical operators & bushy join orders.

Slide 36

Slide 36 text

Workload speedup PostgreSQL v12 20

Slide 37

Slide 37 text

Workload speedup PostgreSQL v12 Workload Speedup JOB JOB-Slow TPC-H train test 1x 20

Slide 38

Slide 38 text

Workload speedup PostgreSQL v12 Workload Speedup JOB JOB-Slow TPC-H train test 1.2 1.3 1.7 1.1 1.3 2.1 1x 20

Slide 39

Slide 39 text

Workload speedup PostgreSQL v12 Commercial DBMS Workload Speedup JOB JOB-Slow TPC-H train test 1.2 1.3 1.7 1.1 1.3 2.1 Workload Speedup JOB JOB-Slow TPC-H 1.0 1.5 1.9 1.1 2.4 2.8 train test 1x 1x Without learning from expert optimizers, Balsa outperforms them on both train (up to 2.8x) & unseen (up to 1.9x) queries 20

Slide 40

Slide 40 text

Training Balsa is efficient 0 5 10 15 Elasped Time (hours) 20 10 5 1 0.5 Normalized Runtime [log] JOB JOB, Slow TPC-H Faster than expert plans Slower than expert plans

Slide 41

Slide 41 text

Training Balsa is efficient Training Balsa is efficient: a few hours to nightly maintenance range 0 5 10 15 Elasped Time (hours) 20 10 5 1 0.5 Normalized Runtime [log] JOB JOB, Slow TPC-H Faster than expert plans Slower than expert plans

Slide 42

Slide 42 text

Balsa vs. Expert Demonstrations (Neo) Test set performance 22 0 2 4 6 8 10 Elasped Time (hours) 50 10 5 1 0.5 Normalized Runtime [log] Neo-impl Balsa

Slide 43

Slide 43 text

Balsa vs. Expert Demonstrations (Neo) Test set performance 22 0 2 4 6 8 10 Elasped Time (hours) 50 10 5 1 0.5 Normalized Runtime [log] Neo-impl Balsa Balsa significantly improves generalization to unseen queries (more robust; can surpass expert) Failed to generalize to unseen queries without simulation learning

Slide 44

Slide 44 text

Balsa Learn to optimize, without learning from expert optimizers Key challenge: mitigate slow plans during learning Paper: more ideas & evaluation Open sourced on GitHub For researchers: reproduce & build on For users: find best plans for your queries/hardware {zongheng, weichiang, lsf, gbm, michael.luo, istoica} @berkeley.edu bit.ly/balsa-pdf github.com/balsa-project