Upgrade to Pro — share decks privately, control downloads, hide ads and more …

query-evaluation-pwl

 query-evaluation-pwl

Presentation on Goetz Graefe's "Query Evaluation Techniques for Large Databases" paper for Papers We Love.

https://www.meetup.com/papers-we-love-too/events/237686185/

F9a2dba12b94d0c204d846a29da56bf5?s=128

Peter Geoghegan

June 15, 2017
Tweet

Transcript

  1. Query Evaluation Techniques for Large Databases PAPERS WE LOVE SF

    — JUNE 15, 2017 by Goetz Graefe
  2. https://speakerdeck.com/peterg/query-evaluation-pwl Peter 
 Geoghegan PMS COOL GRAY 11 porate color

    logo be used. use the gray logo, y be used. In produced as a blind @petervgeoghegan
  3. https://speakerdeck.com/peterg/query-evaluation-pwl INTRODUCTION 1. ARCHITECTURE OF QUERY EXECUTION 2. SORTING AND

    HASHING 2.1 Sorting 2.2 Hashing 3. DISK ACCESS 3.1 File Scans 3.2 Associative Access Using Indices 3.3 Buffer Management 4. AGGREGATION AND DUPLICATE REMOVAL 4.1 Aggregation Algorithm Based on Nested Loops 4.2 Aggregation Algorithms Based on Sorting 4.3 Aggregation Algorithms Based on Hashing 4.4 A Rough Performance Comparison 4.5 Additional Remarks on Aggregation 5. BINARY MATCHING OPERATIONS 5.1 Nested-Loops Join Algorithms 5.2 Merge-Join Algorithms 5.3 Hash Join Algorithms 5.4 Pointer-Based Joins 5.5 A Rough Performance Comparison 6. UNIVERSAL QUANTIFICATION 7. DUALITY OF SORT AND HASH-BASED QUERY PROCESSING ALGORITHMS 8. EXECUTION OF COMPLEX QUERY PLANS 9. MECHANISMS FOR PARALLEL QUERY 9.1 Parallel versus Distributed Database Systems 9.2 Forms of Parallelism 9.3 Implementation Strategies 9.4 Load Balancing and Skew 9.5 Architectures and Architecture Independence 10. PARALLEL ALGORITHMS 10.1 Parallel Selections and Updates 10.2 Parallel Sorting 10.3 Parallel Aggregation and Duplicate Removal 10.4 Parallel Joins and Other Binary Matching Operations 10.5 Parallel Universal Quantification 11. NONSTANDARD QUERY PROCESSING ALGORITHMS 11.1 Nested Relations 11.2 Temporal and Scientific Database Management 11.3 Object-oriented Database Systems 11.4 More Control Operators 12. ADDITIONAL TECHNIQUES FOR PERFORMANCE IMPROVEMENT 12.1 Precomputation and Derived Data 12.2 Data Compression 12.3 Surrogate Processing 12.4 Bit Vector Filtering 12.5 Specialized Hardware Survey paper. 86 pages! Sections can be read more or less independently, though. Similar observations made in sections we’ll skip.
  4. https://speakerdeck.com/peterg/query-evaluation-pwl INTRODUCTION 1. ARCHITECTURE OF QUERY EXECUTION 2. SORTING AND

    HASHING 2.1 Sorting 2.2 Hashing 3. DISK ACCESS 3.1 File Scans 3.2 Associative Access Using Indices 3.3 Buffer Management 4. AGGREGATION AND DUPLICATE REMOVAL 4.1 Aggregation Algorithm Based on Nested Loops 4.2 Aggregation Algorithms Based on Sorting 4.3 Aggregation Algorithms Based on Hashing 4.4 A Rough Performance Comparison 4.5 Additional Remarks on Aggregation 5. BINARY MATCHING OPERATIONS 5.1 Nested-Loops Join Algorithms 5.2 Merge-Join Algorithms 5.3 Hash Join Algorithms 5.4 Pointer-Based Joins 5.5 A Rough Performance Comparison 6. UNIVERSAL QUANTIFICATION 7. DUALITY OF SORT AND HASH-BASED QUERY PROCESSING ALGORITHMS 8. EXECUTION OF COMPLEX QUERY PLANS 9. MECHANISMS FOR PARALLEL QUERY 9.1 Parallel versus Distributed Database Systems 9.2 Forms of Parallelism 9.3 Implementation Strategies 9.4 Load Balancing and Skew 9.5 Architectures and Architecture Independence 10. PARALLEL ALGORITHMS 10.1 Parallel Selections and Updates 10.2 Parallel Sorting 10.3 Parallel Aggregation and Duplicate Removal 10.4 Parallel Joins and Other Binary Matching Operations 10.5 Parallel Universal Quantification 11. NONSTANDARD QUERY PROCESSING ALGORITHMS 11.1 Nested Relations 11.2 Temporal and Scientific Database Management 11.3 Object-oriented Database Systems 11.4 More Control Operators 12. ADDITIONAL TECHNIQUES FOR PERFORMANCE IMPROVEMENT 12.1 Precomputation and Derived Data 12.2 Data Compression 12.3 Surrogate Processing 12.4 Bit Vector Filtering 12.5 Specialized Hardware Focus of this talk {
  5. https://speakerdeck.com/peterg/query-evaluation-pwl 5 1. Preambles on Join Order, Nested Loop Join

    How nestloop is fundamentally different to hash & sort based algorithms. 2. Duality of Sort-Merge Join and Hash Join Introduction to both algorithms. 3. Trends in hardware driving “Sort vs. Hash” Why hash joins became popular in late 1980s/early 1990s. 4. Parallelism, bus contention, and algorithm choice Key design decisions for parallelism. 5. Duality between Sorting and Hashing revisited Central idea of the paper, in summary. Overview
  6. https://speakerdeck.com/peterg/query-evaluation-pwl Preamble 1 — join ordering and query execution

  7. https://speakerdeck.com/peterg/query-evaluation-pwl Query Optimization Final result of query (tuples/rows) fed to

    top, answering query. User requests next tuple from root node. Tuple-at-a-time processing. Requests for next tuple cascade down query plan tree. - Not all node types can return tuple quickly (e.g., sort node must do a lot of work up-front), but most can. - Nodes don’t necessarily “give all the tuples they have to give” in the end — perhaps only what they’re asked for. 7
  8. None
  9. https://speakerdeck.com/peterg/query-evaluation-pwl Generally, biggest determinant of which plan is “cheapest” is

    the one that processes fewest tuples in order to give a correct answer. Using an index (with selective qualification) good because most data never read in the first place. Optimizer’s choice driven by statistics, from which selectivity is inferred. Join ordering key to getting good plan. Most joins can be reordered without affecting final answer (they’re commutative and associative). 9 Plan shape
  10. https://speakerdeck.com/peterg/query-evaluation-pwl Preamble 2 — what makes Nested Loop Join special

    among join algorithms 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 61 37 39 31 23 L 11 61 61 61 38 32 34 24 L 11 61 61 61 38 35 37 25 L 11 61 61 61 38 39 30 26 L 11 61 61 61 39 32 33 27 L 11 61 61 61 39 35 36 28 L 11 61 61 61 39 38 39 29 L 3 1 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L Image generated using experimental Graphiz Postgres B-Tree visualization tool — Heikki Linnakangas
  11. https://speakerdeck.com/peterg/query-evaluation-pwl 1 L 11 61 61 61 31 33 32

    2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 61 37 39 31 23 L 11 61 61 61 38 32 34 24 L 11 61 61 61 38 35 37 25 L 11 61 61 61 38 39 30 26 L 11 61 61 61 39 32 33 27 L 11 61 61 61 39 35 36 28 L 11 61 61 61 39 38 39 29 L 3 1 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L Preamble 2 — what makes Nested Loop Join special among join algorithms Image generated using experimental Graphiz Postgres B-Tree visualization tool — Heikki Linnakangas
  12. https://speakerdeck.com/peterg/query-evaluation-pwl 1 L 11 61 61 61 31 33 32

    2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 61 37 39 31 23 L 11 61 61 61 38 32 34 24 L 11 61 61 61 38 35 37 25 L 11 61 61 61 38 39 30 26 L 11 61 61 61 39 32 33 27 L 11 61 61 61 39 35 36 28 L 11 61 61 61 39 38 39 29 L 3 1 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L Preamble 2 — what makes Nested Loop Join special among join algorithms Image generated using experimental Graphiz Postgres B-Tree visualization tool — Heikki Linnakangas
  13. https://speakerdeck.com/peterg/query-evaluation-pwl Nested Loop Join Only join algorithm usable in principle

    to execute any join that you can express with SQL. Often index scans on both sides of join. Or, another nestloop join on outer side of top-level nestloop join. Output from outer side “plugged into” inner side. Often very fast — “lookup based”. First joined tuple returned almost immediately. 13
  14. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 coupons id valid_until ??? ???
  15. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 coupons id valid_until ??? ??? Join output id customer coupon_id valid_until 2958 Condor ??? ???
  16. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 coupons id valid_until ??? ??? 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor ??? ???
  17. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 coupons id valid_until ??? ??? 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor ??? ???
  18. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 coupons id valid_until ??? ??? 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor ??? ???
  19. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 coupons id valid_until 85495282 2017-05-20 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor ??? ???
  20. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 coupons id valid_until 85495282 2017-05-20 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20
  21. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 coupons id valid_until ??? ???
  22. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 183 Condor ??? ??? coupons id valid_until ??? ???
  23. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 coupons id valid_until ??? ??? Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 183 Condor ??? ???
  24. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 coupons id valid_until ??? ??? Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 183 Condor ??? ???
  25. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 coupons id valid_until ??? ??? Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 183 Condor ??? ???
  26. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 coupons id valid_until 43957257 2018-01-01 Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 183 Condor ??? ???
  27. https://speakerdeck.com/peterg/query-evaluation-pwl sales id customer coupon_id 2958 Condor 85495282 183 Condor

    43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 183 Condor 43957257 2018-01-01 coupons id valid_until 43957257 2018-01-01
  28. https://speakerdeck.com/peterg/query-evaluation-pwl Nested Loop Join plan -- Column values are “plugged

    in” here: EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN -------------------------------------------------------------------------------------- Nested Loop (cost=4.65..118.62 rows=10 width=488) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) Recheck Cond: (unique1 < 10) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) Index Cond: (unique2 = t1.unique2) [1] PostgreSQL documentation: EXPLAIN Basics (basic discussion of different join algorithms)
  29. https://speakerdeck.com/peterg/query-evaluation-pwl -- Column values are “plugged in” here: EXPLAIN SELECT

    * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN -------------------------------------------------------------------------------------- Nested Loop (cost=4.65..118.62 rows=10 width=488) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) Recheck Cond: (unique1 < 10) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) Index Cond: (unique2 = t1.unique2) [1] PostgreSQL documentation: EXPLAIN Basics (basic discussion of different join algorithms) 39.47 + (7.91 * 10 rows) = 118.57 (~118.62) Nested Loop Join plan
  30. https://speakerdeck.com/peterg/query-evaluation-pwl Great OLTP performance is possible because nested loop joins

    fed by index scans (or another nested loop join) can scale wonderfully well. Nothing scales as well as selective index scans. Decent optimizer important too, to get the join ordering right consistently. Sort-merge join and Hash Join are not really “in competition” with nested loop Join — only each other. 30 Low latency queries
  31. https://speakerdeck.com/peterg/query-evaluation-pwl “The cost of this type of query plan [nested

    loop join heavy plans] grows linearly with the number of records involved, which might very well mean that the cost is effectively independent of the database size…the cost of index look-ups in traditional B-tree indexes grows logarithmically with the database size…” [1] Modern B-Tree techniques, 5.9 Nested Queries and Nested Iteration — Goetz Graefe, Modern B-Tree techniques [1]
  32. https://speakerdeck.com/peterg/query-evaluation-pwl “…The key to interactive response times, both in online

    transaction processing (OLTP) and in online analytical processing (OLAP), is to ensure that query results are fetched, not searched and computed…In relational database systems, fetching query results directly means index searches. If a result requires records from multiple indexes, index nested loops join or, more generally, nested iteration are the algorithms of choice.” — Goetz Graefe, Modern B-Tree techniques [1] [1] Modern B-Tree techniques, 5.9 Nested Queries and Nested Iteration
  33. https://speakerdeck.com/peterg/query-evaluation-pwl 33 Overview 1. Preambles on Join Order, Nested Loop

    Join How nestloop is fundamentally different to hash & sort based algorithms. 2. Duality of Sort-Merge Join and Hash Join Introduction to both algorithms. 3. Trends in hardware driving “Sort vs. Hash” Why hash joins became popular in late 1980s/early 1990s. 4. Parallelism, bus contention, and algorithm choice Key design decisions for parallelism. 5. Duality between Sorting and Hashing revisited Central idea of the paper, in summary.
  34. https://speakerdeck.com/peterg/query-evaluation-pwl Duality — The symmetric relationship between Hash Join and

    Sort-Merge Join Cube frame with Octahedron dual inside Octahedron frame with Cube dual inside
  35. https://speakerdeck.com/peterg/query-evaluation-pwl Hash Joins First, build a hash table of smaller

    relation. Hash table consists of join attribute as hash key for each row, alongside row itself. Build hash table by scanning smaller table/input. Scan larger table, joining rows from smaller relation by performing lookups in hash table. Only supports equi-joins — joining on simple equality condition. 35
  36. https://speakerdeck.com/peterg/query-evaluation-pwl Hash Joins (cont.) Works particularly well when one input

    is smaller, such as a lookup table. Or, moderately large subset of table (e.g., hash table built by reading from bitmap index scan). High join selectivity works well (e.g., uncorrelated join attributes). Hash table can “overflow”, and spill “batches” to disk when insufficient main memory available. 36
  37. https://speakerdeck.com/peterg/query-evaluation-pwl cities id city_name Country 1 San Francisco USA 2

    New York USA 3 London UK Hash Table id : city_name
  38. https://speakerdeck.com/peterg/query-evaluation-pwl cities id city_name Country 1 San Francisco USA 2

    New York USA 3 London UK Hash Table id : city_name
  39. https://speakerdeck.com/peterg/query-evaluation-pwl cities id city_name Country 1 San Francisco USA 2

    New York USA 3 London UK Hash Table id : city_name
  40. https://speakerdeck.com/peterg/query-evaluation-pwl cities id city_name Country 1 San Francisco USA 2

    New York USA 3 London UK Hash Table id : city_name
  41. https://speakerdeck.com/peterg/query-evaluation-pwl cities id city_name Country 1 San Francisco USA 2

    New York USA 3 London UK Hash Table id : city_name
  42. https://speakerdeck.com/peterg/query-evaluation-pwl cities id city_name Country 1 San Francisco USA 2

    New York USA 3 London UK Hash Table id : city_name
  43. https://speakerdeck.com/peterg/query-evaluation-pwl cities id city_name Country 1 San Francisco USA 2

    New York USA 3 London UK Hash Table id : city_name
  44. https://speakerdeck.com/peterg/query-evaluation-pwl venues ven_id ven_name city_id 171 Dublin Hilton 6 2

    CNCC 20 1 RDS 6 10 Cube Center 6 44 SFO Hotel 1 3 uOttawa 16 9 NYC Venue 2 282 ILEC 17 4 Westfield 1 212 Lisbon Hilton 3 478 CRNA 6 54 FRN Center 4 55 Abby 3 33 Percible 4 37 UCLA 3 34 Mosconi 1 99 NYC East E 2 812 NYC North 2
  45. https://speakerdeck.com/peterg/query-evaluation-pwl venues ven_id ven_name city_id 171 Dublin Hilton 6 2

    CNCC 20 1 RDS 6 10 Cube Center 6 44 SFO Hotel 1 3 uOttawa 16 9 NYC Venue 2 282 ILEC 17 4 Westfield 1 212 Lisbon Hilton 3 478 CRNA 6 54 FRN Center 4 55 Abby 3 33 Percible 4 37 UCLA 3 34 Mosconi 1 99 NYC East E 2 812 NYC North 2 x x x x x x x x x
  46. https://speakerdeck.com/peterg/query-evaluation-pwl Hash Join plan -- Hash table built first, by

    scanning ~100 rows in tenk1 table where “unique1 < 100”: EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------------------------------ Hash Join (cost=230.47..713.98 rows=101 width=488) Hash Cond: (t2.unique2 = t1.unique2) -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) -> Hash (cost=229.20..229.20 rows=101 width=244) -> Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244) Recheck Cond: (unique1 < 100) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 < 100) [1] PostgreSQL documentation: EXPLAIN Basics (basic discussion of different join algorithms)
  47. https://speakerdeck.com/peterg/query-evaluation-pwl -- Hash table built first, by scanning ~100 rows

    in tenk1 table where “unique1 < 100”: EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------------------------------ Hash Join (cost=230.47..713.98 rows=101 width=488) Hash Cond: (t2.unique2 = t1.unique2) -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) -> Hash (cost=229.20..229.20 rows=101 width=244) -> Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244) Recheck Cond: (unique1 < 100) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 < 100) [1] PostgreSQL documentation: EXPLAIN Basics (basic discussion of different join algorithms) Naturally, would-be nested loop join plan’s cost exceeds 713.98 (extrapolate from similar “WHERE t1.unique1 < 10” plan): ~39.47 + (7.91 * 100 rows) = 830.47 (approx.) Hash Join plan
  48. https://speakerdeck.com/peterg/query-evaluation-pwl Merge Joins Sorted input to Merge Join node on

    both sides of join. Actual sort node typically appears for at least one side of join. Often both sides. Input might instead come from an index scan. B- Trees store items in the natural sort order of columns that are indexed. Sorts can, in general, spill to disk if there is much data to sort. 48
  49. https://speakerdeck.com/peterg/query-evaluation-pwl conferences conf_id conf_name ven_id 23 Systems We Love 4

    12 Surge 10 13 PGConf.US 9 24 pgCon 3 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 44 All Your Base 282 15 VLDB2016 18 94 IEEE ICDE 48 25 Velocity 32 59 Percona Live 17 venues ven_id ven_name 171 Dublin Hilton 2 CNCC 1 RDS 10 Cube Center 44 SFO Hotel 3 uOttawa 9 NYC Venue 282 ILEC 4 Westfield
  50. https://speakerdeck.com/peterg/query-evaluation-pwl conferences conf_id conf_name ven_id 24 pgCon 3 23 Systems

    We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 venues ven_id ven_name 171 Dublin Hilton 2 CNCC 1 RDS 10 Cube Center 44 SFO Hotel 3 uOttawa 9 NYC Venue 282 ILEC 4 Westfield
  51. https://speakerdeck.com/peterg/query-evaluation-pwl conferences conf_id conf_name ven_id 24 pgCon 3 23 Systems

    We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 venues ven_id ven_name 1 RDS 2 CNCC 3 uOttawa 4 Westfield 9 NYC Venue 10 Cube Center 44 SFO Hotel 171 Dublin Hilton 282 ILEC
  52. https://speakerdeck.com/peterg/query-evaluation-pwl Join output conf_id conf_name ven_id ven_name conferences conf_id conf_name

    ven_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 venues ven_id ven_name 1 RDS 2 CNCC 3 uOttawa 4 Westfield 9 NYC Venue 10 Cube Center 44 SFO Hotel 171 Dublin Hilton 282 ILEC
  53. https://speakerdeck.com/peterg/query-evaluation-pwl Join output conf_id conf_name ven_id ven_name conferences conf_id conf_name

    ven_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 venues ven_id ven_name 1 RDS 2 CNCC 3 uOttawa 4 Westfield 9 NYC Venue 10 Cube Center 44 SFO Hotel 171 Dublin Hilton 282 ILEC
  54. https://speakerdeck.com/peterg/query-evaluation-pwl Join output conf_id conf_name ven_id ven_name conferences conf_id conf_name

    ven_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 venues ven_id ven_name 1 RDS 2 CNCC 3 uOttawa 4 Westfield 9 NYC Venue 10 Cube Center 44 SFO Hotel 171 Dublin Hilton 282 ILEC
  55. https://speakerdeck.com/peterg/query-evaluation-pwl Join output conf_id conf_name ven_id ven_name conferences conf_id conf_name

    ven_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 venues ven_id ven_name 1 RDS 2 CNCC 3 uOttawa 4 Westfield 9 NYC Venue 10 Cube Center 44 SFO Hotel 171 Dublin Hilton 282 ILEC
  56. https://speakerdeck.com/peterg/query-evaluation-pwl Join output conf_id conf_name ven_id ven_name 24 pgCon 3

    uOttawa conferences conf_id conf_name ven_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 venues ven_id ven_name 1 RDS 2 CNCC 3 uOttawa 4 Westfield 9 NYC Venue 10 Cube Center 44 SFO Hotel 171 Dublin Hilton 282 ILEC
  57. https://speakerdeck.com/peterg/query-evaluation-pwl Join output conf_id conf_name ven_id ven_name 24 pgCon 3

    uOttawa conferences conf_id conf_name ven_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 venues ven_id ven_name 1 RDS 2 CNCC 3 uOttawa 4 Westfield 9 NYC Venue 10 Cube Center 44 SFO Hotel 171 Dublin Hilton 282 ILEC
  58. https://speakerdeck.com/peterg/query-evaluation-pwl Join output conf_id conf_name ven_id ven_name 24 pgCon 3

    uOttawa conferences conf_id conf_name ven_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 venues ven_id ven_name 1 RDS 2 CNCC 3 uOttawa 4 Westfield 9 NYC Venue 10 Cube Center 44 SFO Hotel 171 Dublin Hilton 282 ILEC
  59. https://speakerdeck.com/peterg/query-evaluation-pwl Join output conf_id conf_name ven_id ven_name 24 pgCon 3

    uOttawa 23 Systems We Love 4 Westfield conferences conf_id conf_name ven_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 venues ven_id ven_name 1 RDS 2 CNCC 3 uOttawa 4 Westfield 9 NYC Venue 10 Cube Center 44 SFO Hotel 171 Dublin Hilton 282 ILEC
  60. https://speakerdeck.com/peterg/query-evaluation-pwl Merge Joins (cont.) More flexible than hash joins in

    some ways — not limited to equi-joins. Can also be interval-based, for example. Sorts can, in general, spill to disk if there is much data to sort. External sorts usually still fast enough. 60
  61. https://speakerdeck.com/peterg/query-evaluation-pwl Merge Join plan -- N.B.: t2 is from one

    table, not tench table. -- -- Tuples “sucked up” from each side, in sorted order: EXPLAIN SELECT * FROM tenk1 t1, onek t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------------------------------ Merge Join (cost=198.11..268.19 rows=10 width=488) Merge Cond: (t1.unique2 = t2.unique2) -> Index Scan using tenk1_unique2 on tenk1 t1 (cost=0.29..656.28 rows=101 width=244) Filter: (unique1 < 100) -> Sort (cost=197.83..200.33 rows=1000 width=244) Sort Key: t2.unique2 -> Seq Scan on onek t2 (cost=0.00..148.00 rows=1000 width=244) [1] PostgreSQL documentation: EXPLAIN Basics (basic discussion of different join algorithms)
  62. https://speakerdeck.com/peterg/query-evaluation-pwl 62 Overview 1. Preambles on Join Order, Nested Loop

    Join How nestloop is fundamentally different to hash & sort based algorithms. 2. Duality of Sort-Merge Join and Hash Join Introduction to both algorithms. 3. Trends in hardware driving “Sort vs. Hash” Why hash joins became popular in late 1980s/early 1990s. 4. Parallelism, bus contention, and algorithm choice Key design decisions for parallelism. 5. Duality between Sorting and Hashing revisited Central idea of the paper, in summary.
  63. https://speakerdeck.com/peterg/query-evaluation-pwl Trends in hardware driving “Sort vs. Hash” Pictured: DEC

    Alpha AXP 21064 CPU
  64. https://speakerdeck.com/peterg/query-evaluation-pwl Replacement Selection: Merge Scheduling ❖ Knuth dedicated a special

    pull-out section of his book to various orders in which to merge runs ❖ Each of these is a different strategy with different advantages and disadvantages ❖ Some depend on being able to read tapes backwards or have an operator change tapes ❖ They all assume you have a small fixed number of tape drives Prehistory: Merge Scheduling, and early importance of Sort-merge join
  65. https://speakerdeck.com/peterg/query-evaluation-pwl ❖ Knuth dedicated a special pull-out section of his

    book to various orders in which to merge runs ❖ Each of these is a different strategy with different advantages and disadvantages ❖ Some depend on being able to read tapes backwards or have an operator change tapes ❖ They all assume you have a small fixed number of tape drives “Readers will benefit most from this material by transplanting themselves temporarily into the mindset of the 1970s. Let us therefore pretend that we still live in that bygone era.” — Donald Knuth, The Art of Computer Programming: Volume 3: Sorting and Searching (2nd Edition,1998)
  66. https://speakerdeck.com/peterg/query-evaluation-pwl ❖ Knuth dedicated a special pull-out section of his

    book to various orders in which to merge runs ❖ Each of these is a different strategy with different advantages and disadvantages ❖ Some depend on being able to read tapes backwards or have an operator change tapes ❖ They all assume you have a small fixed number of tape drives • Limited number of tape drives — use of actual magnetic tape decks assumed! • These days, we still almost always do one merge pass. Polyphase techniques of minimal practical value, since in practice we should be able to fit at least one record from each run in memory, for heap that merges (to produce final sorted output). • Every time main memory size doubles, our capacity to have an external sort complete with only one pass quadruples. • Capacity doubled many times since early 1970s! • External sorting often still necessary today — data sizes have grown, too.
  67. https://speakerdeck.com/peterg/query-evaluation-pwl Trends in hardware make hash join compelling Introduction of

    hash joins in major systems in late 1980s and early 1990s largely response to trends in CPU architecture. In particular, the huge increase in main memory sizes. 64-bit architectures become available in high end commodity servers. “Hybrid” hash joins have plausible spill mechanism. It now matters far less that there is no such thing as “polyphase hashing”. Sort-merge join loses a significant advantage. 67
  68. https://speakerdeck.com/peterg/query-evaluation-pwl Advancements in CPU architecture help sorting, too 64-bit processors

    become fairly mainstream for server workloads in early 1990s. Memory bandwidth and memory latency bottleneck. CPU caches added to hide latency. 68
  69. https://speakerdeck.com/peterg/query-evaluation-pwl Figure 2. Replacement-selection sort vs. QuickSort °.. .... ~.~_~.~

    .~o,.%°~o,. The tournament tree of replacement-selection sort at left has bad cache behavior, unless the entire tournament fits in cache. The diagram at left shows the memory references as a winner is removed, and a new element is added to the tournament. Each traversal of the tree has many cache misses at the leaves of the tree. By contrast, the QuickSort diagrammed on the right fits entirely in the on-board cache, and partially in the on-chip cache. [1] AlphaSort: A Cache-Sensitive Parallel External Sort — Nyberg, Barclay, Cvetanovic, Gray, and Lomet (VLDB Journal Volume 4, 1995)
  70. https://speakerdeck.com/peterg/query-evaluation-pwl Advancements in CPU architecture help sorting, too Using memory

    capacity as efficiently as possible no longer first concern. Rather, CPU cache efficiency becomes first concern. Forget elegance! Focus on cache misses, and to a lesser extent instruction count and branch prediction. 70 [1] AlphaSort: A Cache-Sensitive Parallel External Sort — Nyberg, Barclay, Cvetanovic, Gray, and Lomet (VLDB Journal Volume 4, 1995)
  71. https://speakerdeck.com/peterg/query-evaluation-pwl Figure 1. How far away is the data? l

    109 Tape/Optical ~ ° . m ~ ~ a cn Robot 10 6 Disk O C .E I-- 100 Memory ~, 10 On Board C ..... ~ This Campus Eo 2 On Chip Cache ]his Room O 1 Registers ~ y My Head 2,000 Years i 2 Years .g 1.5hr ~ Z 10 min 1 min A whimsical analogy between computer time and human time, as seen from San Francisco. The scale on the left shows the number of processor cycles needed to get to various levels of the memory hierarchy (measured in 5 ns. processor clock ticks). The scale on the right is a more human scale, showing time based in human units (minutes). Suppose that AlphaSort paid no attention to the cache, and that it randomly [1] AlphaSort: A Cache-Sensitive Parallel External Sort — Nyberg, Barclay, Cvetanovic, Gray, and Lomet (VLDB Journal Volume 4, 1995)
  72. https://speakerdeck.com/peterg/query-evaluation-pwl 72 Overview 1. Preambles on Join Order, Nested Loop

    Join How nestloop is fundamentally different to hash & sort based algorithms. 2. Duality of Sort-Merge Join and Hash Join Introduction to both algorithms. 3. Trends in hardware driving “Sort vs. Hash” Why hash joins became popular in late 1980s/early 1990s. 4. Parallelism, bus contention, and algorithm choice Key design decisions for parallelism. 5. Duality between Sorting and Hashing revisited Central idea of the paper, in summary.
  73. https://speakerdeck.com/peterg/query-evaluation-pwl Parallelism, bus contention, and algorithm choice Image: STREAM memory

    bandwidth benchmarking tool
  74. https://speakerdeck.com/peterg/query-evaluation-pwl Parallelism Multiple CPUs access shared memory in parallel machines.

    Parallelize serial algorithms to increase throughput/performance. “Bus contention” (memory bandwidth saturation) is a major bottleneck, even in 1994. This is a key design concern for parallelism. This general problem has further intensified in the years since Graefe’s paper was published. 74
  75. https://speakerdeck.com/peterg/query-evaluation-pwl Parallel join Inter-core transfer is inherently expensive, and must

    be minimized — locality matters. Push as much work down to each CPU as possible. Eliminate input before transfer occurs. “Sort vs. Hash” issues mostly about locality, too. More cores active, less memory per core (favors sort). Small hash table on inner side can easily be shared (favors hash). Hash table may be very small. 75
  76. https://speakerdeck.com/peterg/query-evaluation-pwl 76 Overview 1. Preambles on Join Order, Nested Loop

    Join How nestloop is fundamentally different to hash & sort based algorithms. 2. Duality of Sort-Merge Join and Hash Join Introduction to both algorithms. 3. Trends in hardware driving “Sort vs. Hash” Why hash joins became popular in late 1980s/early 1990s. 4. Parallelism, bus contention, and algorithm choice Key design decisions for parallelism. 5. Duality between Sorting and Hashing revisited Central idea of the paper, in summary.
  77. https://speakerdeck.com/peterg/query-evaluation-pwl Icosahedron frame with Dodecahedron dual inside Dodecahedron frame with

    Icosahedron dual inside Duality between Sorting and Hashing revisited
  78. https://speakerdeck.com/peterg/query-evaluation-pwl Duality between Sorting and Hashing revisited: Symmetry Sort: Physical

    Division Logical Combination Hash: Logical Division Physical Combination
  79. https://speakerdeck.com/peterg/query-evaluation-pwl Merge Join’s “combination” Sorting is cumulative — input can

    be sorted gradually, increasing locality as we go. This is much less important when dataset already naturally allows locality (e.g., Zipfian distribution). Faster with equal sized inputs, especially where tuples on both sides are typically actually output (matched/joined) in the end. Sort order can be reused (e.g., for ORDER BY). 79
  80. https://speakerdeck.com/peterg/query-evaluation-pwl Hash Join has one big advantage — O(1) lookup

    from hash table. Not all hash probes are truly equally expensive, though, due to memory hierarchy. Actually joining tuples often uncommon. Determining that the hash table has no match at all very fast. Slow when very memory constrained, unless input is so skewed that batches on disk are “unpopular” (i.e. it turns out that scan of bigger table mostly only needs in- memory batch of hash table anyway). 80 Hash Join’s “combination”
  81. https://speakerdeck.com/peterg/query-evaluation-pwl Summary & Outlook 81 Duality repeats at different levels

    of the memory hierarchy, and many equal-but-opposite effects are noticeable. Sense of Déjà vu, at times. Several “Sort vs. Hash” papers have been written in last 30 years. Well researched topic. “Memory is the new disk, CPU cache is the new memory” — may drive future developments.
  82. speakerdeck.com/peterg/query-evaluation-pwl Thanks!