Slide 1

Slide 1 text

work_mem

Slide 2

Slide 2 text

● ● ●

Slide 3

Slide 3 text

● ● ○ ● ●

Slide 4

Slide 4 text

● ○ ○ ○ ●

Slide 5

Slide 5 text

● ○ ○ ● ○ ●

Slide 6

Slide 6 text

work_mem ● ● ●

Slide 7

Slide 7 text

● ● ○ ● ○ ● ○ ○ ○

Slide 8

Slide 8 text

● ● ● ● ●

Slide 9

Slide 9 text

● ● ● ● ●

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

● ● ● ○ ● ● ●

Slide 14

Slide 14 text

Group 0 Group 1 Group 2 Group 3 Group 4 Group 5 Group 6 Group 7

Slide 15

Slide 15 text

Group 0 Group 1 Group 2 Group 3 Group 4 Group 5 Group 6 Group 7

Slide 16

Slide 16 text

● ● ● ● ● ● ● ●

Slide 17

Slide 17 text

Group 0 Group 1 Group 2 Group 3 Group 4 Group 5 Group 6 Group 7 Partitions

Slide 18

Slide 18 text

Group 0 Group 1 Group 2 Group 3 Group 4 Group 5 Group 6 Group 7

Slide 19

Slide 19 text

Group 0 Group 1 Group 2 Group 3 Group 4 Group 5 Group 6 Group 7

Slide 20

Slide 20 text

Group 0 Group 1 Group 2 Group 3 Group 4 Group 5 Group 6 Group 7

Slide 21

Slide 21 text

● ○ ●

Slide 22

Slide 22 text

Group 0 Group 1 Group 2 Group 3 Group 4 Group 5 Group 6 Group 7 free free free Group 0 Group 1 Group 2 Group 3 Group 4 Group 5 Group 6 Group 7

Slide 23

Slide 23 text

● ○ ●

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

By providing a knob for users to set memory consumption limits and then respecting those limits, resource consumption will be stable. Queries that might be faster were memory to be unlimited might be slower, but the memory consumption will be predictable. Users can decide to change their workload based on this in order to get the performance they are looking for [ ]

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

bigger_t smaller_t bigger_t smaller_t SELECT * FROM bigger_t JOIN smaller_t USING (i); 34

Slide 35

Slide 35 text

bigger_t smaller_t bigger_t smaller_t i INT bucketno = hash(i) & (*nbuckets - 1) hash(i) -> 5 5 & (4 - 1) -> 1 35

Slide 36

Slide 36 text

bigger_t smaller_t bigger_t smaller_t i INT hash(i) -> 10 10 & (4 - 1) -> 2 bucketno = hash(i) & (nbuckets - 1) 36

Slide 37

Slide 37 text

bigger_t smaller_t bigger_t smaller_t i INT hash(i) -> 13 13 & (4 - 1) -> 1 37 bucketno = hash(i) & (nbuckets - 1)

Slide 38

Slide 38 text

bigger_t smaller_t bigger_t smaller_t 38

Slide 39

Slide 39 text

bigger_t smaller_t bigger_t smaller_t batchno = (hash(i) >> (nbuckets - 1)) & (*nbatches - 1) i INT hash(i) -> 2 (2 >> 3) & 2 -> 0 2 & 3 -> 2 39 hash(i) = 00000010 1 2 3 smaller_t batch 0

Slide 40

Slide 40 text

bigger_t smaller_t bigger_t smaller_t batchno = (hash(i) >> (nbuckets - 1)) & (*nbatches - 1) i INT hash(i) -> 10 (10 >> 3) & 3 -> 1 10 & 3 -> 2 40 smaller_t batch 0 2 3 1

Slide 41

Slide 41 text

bigger_t smaller_t batchno = (hash(i) >> (nbuckets - 1)) & (nbatches - 1) i INT 41 hash(i) -> 10 (10 >> 3) & 3 -> 1 10 & 3 -> 2 smaller_t batch 1 bigger_t 2 3 1 2 3 1

Slide 42

Slide 42 text

bigger_t smaller_t batchno = (hash(i) >> (nbuckets - 1)) & (nbatches - 1) i INT 42 hash(i) -> 56 (56 >> 3) & 3 -> 3 56 & 3 -> 0 smaller_t batch 3

Slide 43

Slide 43 text

bigger_t smaller_t bigger_t smaller_t 43

Slide 44

Slide 44 text

bigger_t smaller_t bigger_t smaller_t 44

Slide 45

Slide 45 text

PHJ_BUILD_HASHING_OUTER

Slide 46

Slide 46 text

bigger_t bigger_t smaller_t smaller_t batch 1 smaller_t batch 0 smaller_t 46

Slide 47

Slide 47 text

bigger_t smaller_t smaller_t batch 1 smaller_t smaller_t batch 5 smaller_t batch 1 2 3 2 3 bigger_t batch 1 47

Slide 48

Slide 48 text

batch 0 smaller_t batch 1 1 48 batchno = (hash(i) >> (nbuckets - 1)) & (nbatches - 1)

Slide 49

Slide 49 text

batch 0 smaller_t batch 1 batch 3 1 11 49

Slide 50

Slide 50 text

batch 0 smaller_t batch 1 batch 3 01 11 batch 7 111 50

Slide 51

Slide 51 text

batch 3 01 11 51 batch 7 111 batch 1 batch 0 smaller_t

Slide 52

Slide 52 text

batch 3 01 11 batch 7 111 52 batch 1 batch 0 smaller_t

Slide 53

Slide 53 text

batch 3 11 batch 7 111 53 1011 batch 11 01 batch 1 batch 0 smaller_t

Slide 54

Slide 54 text

bigger_t smaller_t bigger_t smaller_t smaller_t batch 0 smaller_t batch 1 PHJ_BUILD_HASHING_INNER 54

Slide 55

Slide 55 text

bigger_t bigger_t smaller_t smaller_t batch 0 smaller_t batch 1 smaller_t batch 0 smaller_t batch 1 smaller_t batch 2 smaller_t batch 3 PHJ_BUILD_HASHING_INNER 55

Slide 56

Slide 56 text

bigger_t smaller_t smaller_t batch 0 smaller_t batch 1 smaller_t batch 2 smaller_t batch 3 PHJ_BUILD_HASHING_INNER PHJ_BUILD_HASHING_OUTER bigger_t batch 0 bigger_t batch 1 bigger_t batch 2 bigger_t batch 3 56

Slide 57

Slide 57 text

bigger_t bigger_t smaller_t smaller_t batch 0 smaller_t batch 1 smaller_t batch 0 smaller_t batch 1 smaller_t batch 2 smaller_t batch 3 PHJ_BUILD_HASHING_INNER 57

Slide 58

Slide 58 text

● ● work_mem batch 3 11 batch 7 111 1011 batch 3 batch 3 batch 11 011 0011 58

Slide 59

Slide 59 text

● ● ● BufFile batch 3 11 batch 7 111 1011 batch 11 011 0011 batch 3 batch 3 00011 10011 batch 19 batch 3 59

Slide 60

Slide 60 text

● ○ ○ ○ 60

Slide 61

Slide 61 text

● ○ ○ ○ 61

Slide 62

Slide 62 text

● ○ ○ 62

Slide 63

Slide 63 text

● ○ work_mem ○ ○ 63

Slide 64

Slide 64 text

batch 3 11 batch 7 111 1011 batch 11 011 0011 batch 3 00011 10011 batch 19 batch 3 batch 3 11 batch 7 111 1011 batch 11 011 0011 batch 3 batch 3 ● ● ● 64 batch 3

Slide 65

Slide 65 text

bigger_t batch 5 bigger_t smaller_t bigger_t smaller_t stripe 0 stripe 1 smaller_t 2 3 2 3 bigger_t batch 1 smaller_t batch 1 65

Slide 66

Slide 66 text

bigger_t smaller_t bigger_t smaller_t smaller_t 2 3 2 3 bigger_t batch 1 smaller_t batch 1 stripe 0 stripe 0 stripe 1 stripe 0 66 bigger_t batch 5

Slide 67

Slide 67 text

bigger_t smaller_t bigger_t smaller_t smaller_t 2 3 2 3 bigger_t batch 1 smaller_t batch 1 stripe 1 stripe 0 stripe 1 67

Slide 68

Slide 68 text

bigger_t smaller_t PHJ_BUILD_HASHING_INNER bigger_t smaller_t batch 1 smaller_t batch 0 smaller_t batch 1 smaller_t batch 2 smaller_t batch 3 smaller_t batch 0 68

Slide 69

Slide 69 text

bigger_t smaller_t smaller_t batch 1 bigger_t batch 1 stripe 0 stripe 1 69

Slide 70

Slide 70 text

bigger_t smaller_t bigger_t batch 1 smaller_t batch 1 stripe 0 stripe 0 stripe 1 stripe 0 smaller_t batch 2 smaller_t batch 3 bigger_t batch 0 bigger_t batch 2 bigger_t batch 3 smaller_t batch 0 70

Slide 71

Slide 71 text

bigger_t smaller_t bigger_t batch 1 smaller_t batch 1 stripe 1 smaller_t batch 2 smaller_t batch 3 bigger_t batch 0 bigger_t batch 2 bigger_t batch 3 smaller_t batch 0 stripe 0 stripe 1 71

Slide 72

Slide 72 text

● ● bigger_t smaller_t bigger_t batch 1 smaller_t batch 1 stripe 0 72 stripe 1 smaller_t batch 1 stripe 0

Slide 73

Slide 73 text

bigger_t smaller_t bigger_t batch 1 smaller_t batch 1 stripe 0 outer match statuses 0001 bigger_t.batch_1.tuple_1 matches a tuple in smaller_t.batch_1.stripe_0 bigger_t tuples 2 - 4 have no matches in smaller_t.batch_1.stripe_0 73 stripe 1 smaller_t batch 1 stripe 0

Slide 74

Slide 74 text

bigger_t smaller_t bigger_t batch 1 smaller_t batch 1 stripe 1 outer match statuses 1001 bigger_t.batch_1.tuple_4 matches a tuple in smaller_t.batch_1.stripe_1 74 stripe 1 smaller_t batch 1 stripe 0

Slide 75

Slide 75 text

bigger_t smaller_t bigger_t batch 1 outer match statuses 1001 75

Slide 76

Slide 76 text

SharedTupleStore MinimalTuple TupleIds MinimalTuple - SharedBitStore EXPLAIN 76

Slide 77

Slide 77 text

write_file ParallelHashJoinState ParallelHashJoinBatch 0 ParallelHashJoinBatch 1 Outer SharedTuplestore batch 1 Inner SharedTuplestore batch 1 STS Participant 1 STS Participant n STS Participant 2 STS Participant 1 STS Participant n STS Participant 2 P1 Inner STS accessor P2 Inner STS accessor write_file hashtable 77

Slide 78

Slide 78 text

SharedTupleStore Inner STS P2 Write file P1 Write file STS Chunk STS Chunk 78

Slide 79

Slide 79 text

SharedTupleStore P2 Write file P1 Write file STS Chunk STS Chunk Stripe 1 Stripe 1 Stripe 2 Stripe 2 79 Inner STS

Slide 80

Slide 80 text

No content

Slide 81

Slide 81 text

No content

Slide 82

Slide 82 text

No content

Slide 83

Slide 83 text

● ○ ○ ● ● 83 *https://www.postgresql.org/message-id/flat/CA%2BhUKGKWWmf%3DWELLG%3DaUGbcugRaSQbtm0tKYiBut-B2rVKX63g%40mail.gmail.com **https://github.com/melanieplageman/postgres/commits/alternative_AHJ

Slide 84

Slide 84 text

● ○ ■ ■ ● ○ work_mem** ■ ■ 84 *Tomas Vondra’s patch https://www.postgresql.org/message-id/flat/20190504003414.bulcbnge3rhwhcsh%40development ** mentioned in the same thread