Slide 1

Slide 1 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Sort vs. Hash: A Duality PGConf US — March 29, 2017 Peter Geoghegan (Twitter: @petervgeoghegan) PMS COOL GRAY 11 BLACK

Slide 2

Slide 2 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Presentation Overview 1. Preamble on Nested Loop Join Why it’s very different to hash based or sort based algorithms. 2. Duality of Sort-Merge Join and Hash Join Introduction to both algorithms. 3. The introduction and popularization of Hash Join Why hash joins became popular in late 1980s/early 1990s. 4. Parallelism, memory bandwidth, and the future New era, same old pressures. 5. Duality between Sorting and Hashing revisited Central idea of the presentation, in summary. 2

Slide 3

Slide 3 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Image generated using experimental Graphiz Postgres B-Tree visualization tool — Heikki Linnakangas Preamble — what makes Nested Loop Join special among join algorithms 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 61 37 39 31 23 L 11 61 61 61 38 32 34 24 L 11 61 61 61 38 35 37 25 L 11 61 61 61 38 39 30 26 L 11 61 61 61 39 32 33 27 L 11 61 61 61 39 35 36 28 L 11 61 61 61 39 38 39 29 L 3 1 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L

Slide 4

Slide 4 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Image generated using experimental Graphiz Postgres B-Tree visualization tool — Heikki Linnakangas Preamble — what makes Nested Loop Join special among join algorithms 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 61 37 39 31 23 L 11 61 61 61 38 32 34 24 L 11 61 61 61 38 35 37 25 L 11 61 61 61 38 39 30 26 L 11 61 61 61 39 32 33 27 L 11 61 61 61 39 35 36 28 L 11 61 61 61 39 38 39 29 L 3 1 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L

Slide 5

Slide 5 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Image generated using experimental Graphiz Postgres B-Tree visualization tool — Heikki Linnakangas Preamble — what makes Nested Loop Join special among join algorithms 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 61 37 39 31 23 L 11 61 61 61 38 32 34 24 L 11 61 61 61 38 35 37 25 L 11 61 61 61 38 39 30 26 L 11 61 61 61 39 32 33 27 L 11 61 61 61 39 35 36 28 L 11 61 61 61 39 38 39 29 L 3 1 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L

Slide 6

Slide 6 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Nested Loop Join Only join algorithm usable in principle to execute any join that you can express with SQL. Often index scans on both sides of join. Or, another nestloop join on outer side of top-level nestloop join. Output from inner side “plugged into” outer side. Often very fast — “lookup based”. 6

Slide 7

Slide 7 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 coupons id valid_until ??? ???

Slide 8

Slide 8 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 coupons id valid_until ??? ??? Join output id customer coupon_id valid_until 2958 Condor ??? ???

Slide 9

Slide 9 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 coupons id valid_until ??? ??? 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor ??? ???

Slide 10

Slide 10 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 coupons id valid_until ??? ??? 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor ??? ???

Slide 11

Slide 11 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 coupons id valid_until ??? ??? 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor ??? ???

Slide 12

Slide 12 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 coupons id valid_until 85495282 2017-05-20 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor ??? ???

Slide 13

Slide 13 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 coupons id valid_until 85495282 2017-05-20 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20

Slide 14

Slide 14 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 coupons id valid_until ??? ???

Slide 15

Slide 15 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 183 Condor ??? ??? coupons id valid_until ??? ???

Slide 16

Slide 16 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 coupons id valid_until ??? ??? Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 183 Condor ??? ???

Slide 17

Slide 17 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 coupons id valid_until ??? ??? Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 183 Condor ??? ???

Slide 18

Slide 18 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 coupons id valid_until ??? ??? Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 183 Condor ??? ???

Slide 19

Slide 19 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 coupons id valid_until 43957257 2018-01-01 Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 183 Condor ??? ???

Slide 20

Slide 20 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 sales id customer coupon_id 2958 Condor 85495282 183 Condor 43957257 785257 Condor 23583853 1 L 11 61 61 61 31 33 32 2 L 11 61 61 61 31 36 35 4 L 11 61 61 61 31 39 38 5 L 11 61 61 61 32 33 31 3 6 L 11 61 61 61 32 36 34 7 L 11 61 61 61 32 39 37 8 L 11 61 61 61 33 33 30 9 L 11 61 61 61 33 36 33 10 L 11 61 61 61 33 39 36 11 L 11 61 61 61 34 32 39 12 L 11 61 61 61 34 36 32 13 L 11 61 61 61 34 39 35 14 L 0f 61 61 61 35 32 38 15 L 11 61 61 61 35 36 30 16 L 11 61 61 61 35 39 33 17 L 11 61 61 61 36 32 36 18 L 11 61 61 61 36 35 39 19 L 11 61 61 61 36 39 32 20 L 11 61 61 61 37 32 35 21 L 11 61 61 61 37 35 38 22 L 11 61 61 Join output id customer coupon_id valid_until 2958 Condor 85495282 2017-05-20 183 Condor 43957257 2018-01-01 coupons id valid_until 43957257 2018-01-01

Slide 21

Slide 21 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Nested Loop Join in action -- Column values are “plugged in” here: EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN -------------------------------------------------------------------------------------- Nested Loop (cost=4.65..118.62 rows=10 width=488) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) Recheck Cond: (unique1 < 10) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) Index Cond: (unique2 = t1.unique2) [1] PostgreSQL documentation: EXPLAIN Basics (basic discussion of different join algorithms)

Slide 22

Slide 22 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Nested Loop Join in action -- Column values are “plugged in” here: EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN -------------------------------------------------------------------------------------- Nested Loop (cost=4.65..118.62 rows=10 width=488) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) Recheck Cond: (unique1 < 10) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) Index Cond: (unique2 = t1.unique2) [1] PostgreSQL documentation: EXPLAIN Basics (basic discussion of different join algorithms) 39.47 + (7.91 * 10 rows) = 118.57 (~118.62)

Slide 23

Slide 23 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Low latency OLTP queries Great OLTP performance is possible because nested loop joins fed by index scans (or another nested loop join) can scale wonderfully well. Nothing scales as well as selective index scans. Decent optimizer important too, to get the join ordering right consistently. Sort-merge join and Hash Join are not really “in competition” with nested loop Join — only each other. 23

Slide 24

Slide 24 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 “The cost of this type of query plan [nested loop join heavy plans] grows linearly with the number of records involved, which might very well mean that the cost is effectively independent of the database size…the cost of index look-ups in traditional B-tree indexes grows logarithmically with the database size…” [1] Modern B-Tree techniques, 5.9 Nested Queries and Nested Iteration — Goetz Graefe, Modern B-Tree techniques [1]

Slide 25

Slide 25 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 “…The key to interactive response times, both in online transaction processing (OLTP) and in online analytical processing (OLAP), is to ensure that query results are fetched, not searched and computed…In relational database systems, fetching query results directly means index searches. If a result requires records from multiple indexes, index nested loops join or, more generally, nested iteration are the algorithms of choice.” [1] Modern B-Tree techniques, 5.9 Nested Queries and Nested Iteration — Goetz Graefe, Modern B-Tree techniques [1]

Slide 26

Slide 26 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Presentation Overview 1. Preamble on Nested Loop Join Why it’s very different to hash based or sort based algorithms. 2. Duality of Sort-Merge Join and Hash Join Introduction to both algorithms. 3. The introduction and popularization of Hash Join Why hash joins became popular in late 1980s/early 1990s. 4. Parallelism, memory bandwidth, and the future New era, same old pressures. 5. Duality between Sorting and Hashing revisited Central idea of the presentation, in summary. 26

Slide 27

Slide 27 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Duality — The symmetric relationship between Hash Join and Sort-Merge Join Cube frame with Octahedron dual inside Octahedron frame with Cube dual inside Query Evaluation Techniques for Large Databases — Goetz Graefe (ACM Computing Surveys,Vol. 25, No. 2, June 1993)

Slide 28

Slide 28 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Hash Joins First, build a hash table of smaller relation. Hash table consists of join attribute as hash key for each row, alongside row itself. Build hash table by scanning smaller table/input. Scan larger table, joining rows from smaller relation by performing lookups in hash table. Only supports equi-joins — joins with simple equality condition in predicate. 28

Slide 29

Slide 29 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Hash Joins (cont.) Works particularly well when one input is smaller, such as a lookup table. Or, moderately large subset of table (e.g., hash table built by reading from bitmap index scan). High join selectivity works well (e.g., uncorrelated join attributes). Hash table can spill “batches” to disk when insufficient main memory available. 29

Slide 30

Slide 30 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Hash Join in action -- Hash table built first, by scanning ~100 rows in tenk1 table where “unique1 < 100”: EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------------------------------ Hash Join (cost=230.47..713.98 rows=101 width=488) Hash Cond: (t2.unique2 = t1.unique2) -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) -> Hash (cost=229.20..229.20 rows=101 width=244) -> Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244) Recheck Cond: (unique1 < 100) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 < 100) [1] PostgreSQL documentation: EXPLAIN Basics (basic discussion of different join algorithms)

Slide 31

Slide 31 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Hash Join in action -- Hash table built first, by scanning ~100 rows in tenk1 table where “unique1 < 100”: EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------------------------------ Hash Join (cost=230.47..713.98 rows=101 width=488) Hash Cond: (t2.unique2 = t1.unique2) -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) -> Hash (cost=229.20..229.20 rows=101 width=244) -> Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244) Recheck Cond: (unique1 < 100) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 < 100) [1] PostgreSQL documentation: EXPLAIN Basics (basic discussion of different join algorithms) Naturally, would-be nested loop join plan’s cost exceeds 713.98 (extrapolate from similar “WHERE t1.unique1 < 10” plan): ~39.47 + (7.91 * 100 rows) = 830.47 (approx.)

Slide 32

Slide 32 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Merge Joins Sorted input to Merge Join node on both sides of join. Actual sort node typically appears for at least one side of join. Often both sides. Input might instead come from an index scan. B- Trees “materialize” sort order of columns that are indexed. Sorts can, in general, spill to disk if there is much data to sort. 32

Slide 33

Slide 33 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 conferences conf_id conf_name city_id 23 Systems We Love 4 12 Surge 10 13 PGConf.US 9 24 pgCon 3 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 44 All Your Base 282 15 VLDB2016 18 94 IEEE ICDE 48 25 Velocity 32 59 Percona Live 17 cities city_id city_name 171 Dublin 2 Beijing 1 Johannesburg 10 Baltimore 44 San Mateo 3 Ottawa 9 New York 282 London 4 San Francisco

Slide 34

Slide 34 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 conferences conf_id conf_name city_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 cities city_id city_name 171 Dublin 2 Beijing 1 Johannesburg 10 Baltimore 44 San Mateo 3 Ottawa 9 New York 282 London 4 San Francisco

Slide 35

Slide 35 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 conferences conf_id conf_name city_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 cities city_id city_name 1 Johannesburg 2 Beijing 3 Ottawa 4 San Francisco 9 New York 10 Baltimore 44 San Mateo 171 Dublin 282 London

Slide 36

Slide 36 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Join output conf_id conf_name city_id city_name conferences conf_id conf_name city_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 cities city_id city_name 1 Johannesburg 2 Beijing 3 Ottawa 4 San Francisco 9 New York 10 Baltimore 44 San Mateo 171 Dublin 282 London

Slide 37

Slide 37 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Join output conf_id conf_name city_id city_name conferences conf_id conf_name city_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 cities city_id city_name 1 Johannesburg 2 Beijing 3 Ottawa 4 San Francisco 9 New York 10 Baltimore 44 San Mateo 171 Dublin 282 London

Slide 38

Slide 38 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Join output conf_id conf_name city_id city_name conferences conf_id conf_name city_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 cities city_id city_name 1 Johannesburg 2 Beijing 3 Ottawa 4 San Francisco 9 New York 10 Baltimore 44 San Mateo 171 Dublin 282 London

Slide 39

Slide 39 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Join output conf_id conf_name city_id city_name conferences conf_id conf_name city_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 cities city_id city_name 1 Johannesburg 2 Beijing 3 Ottawa 4 San Francisco 9 New York 10 Baltimore 44 San Mateo 171 Dublin 282 London

Slide 40

Slide 40 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Join output conf_id conf_name city_id city_name 24 pgCon 3 Ottawa conferences conf_id conf_name city_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 cities city_id city_name 1 Johannesburg 2 Beijing 3 Ottawa 4 San Francisco 9 New York 10 Baltimore 44 San Mateo 171 Dublin 282 London

Slide 41

Slide 41 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Join output conf_id conf_name city_id city_name 24 pgCon 3 Ottawa conferences conf_id conf_name city_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 cities city_id city_name 1 Johannesburg 2 Beijing 3 Ottawa 4 San Francisco 9 New York 10 Baltimore 44 San Mateo 171 Dublin 282 London

Slide 42

Slide 42 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Join output conf_id conf_name city_id city_name 24 pgCon 3 Ottawa conferences conf_id conf_name city_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 cities city_id city_name 1 Johannesburg 2 Beijing 3 Ottawa 4 San Francisco 9 New York 10 Baltimore 44 San Mateo 171 Dublin 282 London

Slide 43

Slide 43 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Join output conf_id conf_name city_id city_name 24 pgCon 3 Ottawa 23 Systems We Love 4 San Francisco conferences conf_id conf_name city_id 24 pgCon 3 23 Systems We Love 4 17 Postgres Vision 4 16 pgConf SV 4 431 ACM SIGMOD 4 2 Strangeloop 7 13 PGConf.US 9 12 Surge 10 59 Percona Live 17 15 VLDB2016 18 25 Velocity 32 94 IEEE ICDE 48 44 All Your Base 282 cities city_id city_name 1 Johannesburg 2 Beijing 3 Ottawa 4 San Francisco 9 New York 10 Baltimore 44 San Mateo 171 Dublin 282 London

Slide 44

Slide 44 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Merge Joins (cont.) More flexible than hash joins in some ways — not limited to equi-joins. Can also be interval-based, for example. Sorts can, in general, spill to disk if there is much data to sort. 44

Slide 45

Slide 45 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Merge Join in action -- N.B.: t2 is from one table, not tench table. -- -- Tuples “sucked up” from each side, in sorted order: EXPLAIN SELECT * FROM tenk1 t1, onek t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------------------------------ Merge Join (cost=198.11..268.19 rows=10 width=488) Merge Cond: (t1.unique2 = t2.unique2) -> Index Scan using tenk1_unique2 on tenk1 t1 (cost=0.29..656.28 rows=101 width=244) Filter: (unique1 < 100) -> Sort (cost=197.83..200.33 rows=1000 width=244) Sort Key: t2.unique2 -> Seq Scan on onek t2 (cost=0.00..148.00 rows=1000 width=244) [1] PostgreSQL documentation: EXPLAIN Basics (basic discussion of different join algorithms)

Slide 46

Slide 46 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Presentation Overview 1. Preamble on Nested Loop Join Why it’s very different to hash based or sort based algorithms. 2. Duality of Sort-Merge Join and Hash Join Introduction to both algorithms. 3. The introduction and popularization of Hash Join Why hash joins became popular in late 1980s/early 1990s. 4. Parallelism, memory bandwidth, and the future New era, same old pressures. 5. Duality between Sorting and Hashing revisited Central idea of the presentation, in summary. 46

Slide 47

Slide 47 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 The introduction and popularization of Hash Join: A history lesson Pictured: DEC Alpha AXP 21064 CPU

Slide 48

Slide 48 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Replacement Selection: Merge Scheduling ❖ Knuth dedicated a special pull-out section of his book to various orders in which to merge runs ❖ Each of these is a different strategy with different advantages and disadvantages ❖ Some depend on being able to read tapes backwards or have an operator change tapes ❖ They all assume you have a small fixed number of tape drives Prehistory: Merge Scheduling, and early importance of Sort-merge join

Slide 49

Slide 49 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 ❖ Knuth dedicated a special pull-out section of his book to various orders in which to merge runs ❖ Each of these is a different strategy with different advantages and disadvantages ❖ Some depend on being able to read tapes backwards or have an operator change tapes ❖ They all assume you have a small fixed number of tape drives “Readers will benefit most from this material by transplanting themselves temporarily into the mindset of the 1970s. Let us therefore pretend that we still live in that bygone era.” — Donald Knuth, The Art of Computer Programming: Volume 3: Sorting and Searching (2nd Edition,1998)

Slide 50

Slide 50 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 ❖ Knuth dedicated a special pull-out section of his book to various orders in which to merge runs ❖ Each of these is a different strategy with different advantages and disadvantages ❖ Some depend on being able to read tapes backwards or have an operator change tapes ❖ They all assume you have a small fixed number of tape drives • Limited number of tape drives — use of actual magnetic tape decks assumed! • These days, we still almost always do one merge pass. Polyphase techniques of minimal practical value, since in practice we should be able to fit at least one record from each run in memory, for heap that merges (to produce final sorted output). • Every time main memory size doubles, our capacity to have an external sort complete with only one pass quadruples. • Capacity doubled many times since early 1970s! • External sorting often necessary, even still — data sizes have grown, too.

Slide 51

Slide 51 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Trends in hardware make hash join compelling Introduction of hash joins in major systems in late 1980s and early 1990s can, to a large degree, be explained as response to trends in CPU architecture. In particular, the huge increase in main memory sizes. 64-bit architectures become available in high end commodity servers. “Hybrid” hash joins have plausible spill mechanism. It now matters far less that there is no such thing as “polyphase hashing”. Sort-merge join loses a significant advantage. 51

Slide 52

Slide 52 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Additions to speed up hash joins in PostgreSQL 9.6 + 10 Cleverer management of memory, to avoid cache misses during merging — use of “memory chunks”. In Postgres 10, “Robin Hood hashing” open addressing scheme added to replace hash chaining. Parallel hash join proposed for Postgres 10. 52

Slide 53

Slide 53 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Advancements in CPU architecture help sorting, too 64-bit processors become fairly mainstream for server workloads in early 1990s. Memory bandwidth and memory latency bottleneck. CPU caches added to hide latency. 53

Slide 54

Slide 54 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Figure 2. Replacement-selection sort vs. QuickSort °.. .... ~.~_~.~ .~o,.%°~o,. The tournament tree of replacement-selection sort at left has bad cache behavior, unless the entire tournament fits in cache. The diagram at left shows the memory references as a winner is removed, and a new element is added to the tournament. Each traversal of the tree has many cache misses at the leaves of the tree. By contrast, the QuickSort diagrammed on the right fits entirely in the on-board cache, and partially in the on-chip cache. [1] AlphaSort: A Cache-Sensitive Parallel External Sort — Nyberg, Barclay, Cvetanovic, Gray, and Lomet (VLDB Journal Volume 4, 1995)

Slide 55

Slide 55 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Advancements in CPU architecture help sorting, too Using memory capacity as efficiently as possible no longer first concern. Rather, CPU cache efficiency becomes first concern. Forget elegance! Focus on cache misses, and to a lesser extent instruction count and branch prediction. 55 [1] AlphaSort: A Cache-Sensitive Parallel External Sort — Nyberg, Barclay, Cvetanovic, Gray, and Lomet (VLDB Journal Volume 4, 1995)

Slide 56

Slide 56 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Figure 1. How far away is the data? l 109 Tape/Optical ~ ° . m ~ ~ a cn Robot 10 6 Disk O C .E I-- 100 Memory ~, 10 On Board C ..... ~ This Campus Eo 2 On Chip Cache ]his Room O 1 Registers ~ y My Head 2,000 Years i 2 Years .g 1.5hr ~ Z 10 min 1 min A whimsical analogy between computer time and human time, as seen from San Francisco. The scale on the left shows the number of processor cycles needed to get to various levels of the memory hierarchy (measured in 5 ns. processor clock ticks). The scale on the right is a more human scale, showing time based in human units (minutes). Suppose that AlphaSort paid no attention to the cache, and that it randomly [1] AlphaSort: A Cache-Sensitive Parallel External Sort — Nyberg, Barclay, Cvetanovic, Gray, and Lomet (VLDB Journal Volume 4, 1995)

Slide 57

Slide 57 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Additions to speed up sorting in PostgreSQL 9.5, 9.6 “Abbreviated keys” — avoids “pointer chasing”. Use quicksort for external sort operations. Multiple quicksorted runs are merged to produce final output Careful memory management — maximizes locality Avoids CPU cache misses by making memory access to sorted runs sequential. 57

Slide 58

Slide 58 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Additions to speed up sorting in future PostgreSQL version Parallel CREATE INDEX Approach is not based on partitioning, which may be better for parallel query Significant research needed on partitioning for parallel query Discussion on this patch has stalled “Unique joins” 58

Slide 59

Slide 59 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Presentation Overview 1. Preamble on Nested Loop Join Why it’s very different to hash based or sort based algorithms. 2. Duality of Sort-Merge Join and Hash Join Introduction to both algorithms. 3. The introduction and popularization of Hash Join Why hash joins became popular in late 1980s/early 1990s. 4. Parallelism, memory bandwidth, and the future New era, same old pressures. 5. Duality between Sorting and Hashing revisited Central idea of the presentation, in summary. 59

Slide 60

Slide 60 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Parallelism, memory bandwidth, and the future Image: STREAM memory bandwidth benchmarking tool

Slide 61

Slide 61 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Multicore era problems Main memory scaling (scaling of memory size) has benefitted database system performance immensely over the past few decades. More of a transformative technology than any other. Now seems to have stalled, due to problems related to power management [1]. 61 [1] Memory Scaling: A Systems Architecture Perspective

Slide 62

Slide 62 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Multicore era problems (cont.) Core count now growing faster than main memory sizes, and CPUs are faster than ever, but total memory bandwidth a huge bottleneck. “Like driving a Ferrari in New York City”. Available memory bandwidth per core actually seems to be decreasing in absolute terms! [1] 62 [1] Memory Scaling: A Systems Architecture Perspective

Slide 63

Slide 63 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Parallel query & parallel join Decrease in memory bandwidth per core makes parallel join more important than ever. To avoid regressing performance across CPU microarchitecture generations. To make use of all available memory/cache bandwidth. (Not merely to get more “compute bandwidth”, which is usually not pressing concern.) CPU vendors and research community have some ideas, and plenty of incentive, but future is unclear. 63

Slide 64

Slide 64 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Hash vs. Sort — Merge Join resurgence? Trend towards less memory bandwidth per core favors Sort-merge Join. [1] “Bandwidth-oblivious sort” can gain memory bandwidth at an acceptable cost in compute bandwidth. TLB unfriendly access also a problem for Hash Join. Fewer cacheline accesses inherently necessary (highly tuned implementations of both). 64 [1] Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs

Slide 65

Slide 65 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Hash vs. Sort — Merge Join resurgence? (cont.) SIMD (instruction-level parallelism) No great way to exploit SIMD for hashing. Bitonic sort can exploit SIMD well [1], though. 256- bit SIMD register CPUs now commercially available. Bitonic sort not readily adaptable to complicated requirements for RDBMSs. Also, careful tuning required. Well written Quicksort really hard to beat [2]. 65 [1] Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited [2] Quicksort Is Optimal — Robert Sedgewick, Jon Bentley

Slide 66

Slide 66 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Presentation Overview 1. Preamble on Nested Loop Join Why it’s very different to hash based or sort based algorithms. 2. Duality of Sort-Merge Join and Hash Join Introduction to both algorithms. 3. The introduction and popularization of Hash Join Why hash joins became popular in late 1980s/early 1990s. 4. Parallelism, memory bandwidth, and the future New era, same old pressures. 5. Duality between Sorting and Hashing revisited Central idea of the presentation, in summary. 66

Slide 67

Slide 67 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Icosahedron frame with Dodecahedron dual inside Dodecahedron frame with Icosahedron dual inside Query Evaluation Techniques for Large Databases — Goetz Graefe (ACM Computing Surveys,Vol. 25, No. 2, June 1993) Duality between Sorting and Hashing revisited

Slide 68

Slide 68 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Duality between Sorting and Hashing revisited: Symmetry Query Evaluation Techniques for Large Databases — Goetz Graefe (ACM Computing Surveys,Vol. 25, No. 2, June 1993) Sort: Physical Division Logical Combination Hash: Logical Division Physical Combination

Slide 69

Slide 69 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Merge Join: Characteristics Locality, locality, locality. External sorts degrade slowly and steadily as less memory is available — cost function is nice and smooth. High cardinality sort keys work well. Works best with low selectivity join qualifications. With final on-the-fly merge step, first output record can be returned before the sort fully finishes (external sorts only). 69

Slide 70

Slide 70 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Hash Join: Characteristics Hash Join works best with low cardinality attribute joined from smaller table. Highly selective join qualifications work best. Doesn’t do well when very memory constrained; cost function looks like a step function when hash table not entirely in-memory. Optimizer will use incorrect selectivity estimates at times. When that happens, hash join may be worse off. 70

Slide 71

Slide 71 text

https://speakerdeck.com/peterg/sort-hash-pgconfus-2017 Thanks! Image: Bitonic Sorting Network (Wikipedia)