Upgrade to Pro — share decks privately, control downloads, hide ads and more …

work mem warriors | PGCon 2020 | Jeff Davis, Melanie Plageman

work mem warriors | PGCon 2020 | Jeff Davis, Melanie Plageman

Recent work on the Postgres executor has made improvements to memory management -- from accounting for the memory used to responding to memory pressure. It is important to bind the memory usage of the database with the appropriate execution mechanisms and to choose those during planning based on cost in order to meet users' expectations and ensure predictable performance.

This talk will cover three such improvements: the addition of memory accounting information to MemoryContexts, the memory-bounding of HashAgg (spilling HashAgg to disk), and the adaptive hashjoin fallback to nested hashloop join.

The talk will also include an interactive session to solicit feedback from users on their expectations and experiences with work_mem and the memory behavior of Postgres.

More Decks by Azure Database for PostgreSQL

Other Decks in Technology

Transcript

  1. work_mem

    View Slide




  2. View Slide






  3. View Slide






  4. View Slide







  5. View Slide

  6. work_mem



    View Slide










  7. View Slide






  8. View Slide






  9. View Slide

  10. View Slide

  11. View Slide

  12. View Slide








  13. View Slide

  14. Group 0
    Group 1
    Group 2
    Group 3
    Group 4
    Group 5
    Group 6
    Group 7

    View Slide

  15. Group 0
    Group 1
    Group 2
    Group 3
    Group 4
    Group 5
    Group 6
    Group 7

    View Slide









  16. View Slide

  17. Group 0
    Group 1
    Group 2
    Group 3
    Group 4
    Group 5
    Group 6
    Group 7
    Partitions

    View Slide

  18. Group 0
    Group 1
    Group 2
    Group 3
    Group 4
    Group 5
    Group 6
    Group 7

    View Slide

  19. Group 0
    Group 1
    Group 2
    Group 3
    Group 4
    Group 5
    Group 6
    Group 7

    View Slide

  20. Group 0
    Group 1
    Group 2
    Group 3
    Group 4
    Group 5
    Group 6
    Group 7

    View Slide




  21. View Slide

  22. Group 0
    Group 1
    Group 2
    Group 3
    Group 4
    Group 5
    Group 6
    Group 7
    free
    free
    free
    Group 0
    Group 1
    Group 2
    Group 3
    Group 4
    Group 5
    Group 6
    Group 7

    View Slide




  23. View Slide

  24. View Slide

  25. By providing a knob for users to set memory consumption limits and then respecting those limits, resource
    consumption will be stable. Queries that might be faster were memory to be unlimited might be slower, but the memory
    consumption will be predictable. Users can decide to change their workload based on this in order to get the performance they
    are looking for [ ]

    View Slide

  26. View Slide

  27. View Slide

  28. View Slide

  29. View Slide

  30. View Slide

  31. View Slide

  32. View Slide

  33. View Slide

  34. bigger_t smaller_t bigger_t smaller_t
    SELECT * FROM bigger_t JOIN smaller_t USING (i);
    34

    View Slide

  35. bigger_t smaller_t bigger_t smaller_t
    i INT
    bucketno = hash(i) & (*nbuckets - 1)
    hash(i) -> 5
    5 & (4 - 1) -> 1
    35

    View Slide

  36. bigger_t smaller_t bigger_t smaller_t
    i INT
    hash(i) -> 10
    10 & (4 - 1) -> 2
    bucketno = hash(i) & (nbuckets - 1)
    36

    View Slide

  37. bigger_t smaller_t bigger_t smaller_t
    i INT
    hash(i) -> 13
    13 & (4 - 1) -> 1
    37
    bucketno = hash(i) & (nbuckets - 1)

    View Slide

  38. bigger_t smaller_t bigger_t smaller_t
    38

    View Slide

  39. bigger_t smaller_t bigger_t smaller_t
    batchno = (hash(i) >> (nbuckets - 1)) & (*nbatches - 1)
    i INT
    hash(i) -> 2
    (2 >> 3) & 2 -> 0
    2 & 3 -> 2
    39
    hash(i) = 00000010
    1 2
    3
    smaller_t
    batch 0

    View Slide

  40. bigger_t smaller_t bigger_t smaller_t
    batchno = (hash(i) >> (nbuckets - 1)) & (*nbatches - 1)
    i INT
    hash(i) -> 10
    (10 >> 3) & 3 -> 1
    10 & 3 -> 2
    40
    smaller_t
    batch 0
    2
    3
    1

    View Slide

  41. bigger_t smaller_t
    batchno = (hash(i) >> (nbuckets - 1)) & (nbatches - 1)
    i INT
    41
    hash(i) -> 10
    (10 >> 3) & 3 -> 1
    10 & 3 -> 2
    smaller_t
    batch 1
    bigger_t
    2
    3
    1
    2
    3
    1

    View Slide

  42. bigger_t smaller_t
    batchno = (hash(i) >> (nbuckets - 1)) & (nbatches - 1)
    i INT
    42
    hash(i) -> 56
    (56 >> 3) & 3 -> 3
    56 & 3 -> 0
    smaller_t
    batch 3

    View Slide

  43. bigger_t smaller_t bigger_t smaller_t
    43

    View Slide

  44. bigger_t smaller_t bigger_t smaller_t
    44

    View Slide

  45. PHJ_BUILD_HASHING_OUTER

    View Slide

  46. bigger_t bigger_t smaller_t
    smaller_t
    batch 1
    smaller_t
    batch 0
    smaller_t
    46

    View Slide

  47. bigger_t smaller_t
    smaller_t
    batch 1
    smaller_t
    smaller_t
    batch 5
    smaller_t
    batch 1
    2
    3
    2
    3
    bigger_t
    batch 1
    47

    View Slide

  48. batch 0
    smaller_t
    batch 1
    1
    48
    batchno = (hash(i) >> (nbuckets - 1)) & (nbatches - 1)

    View Slide

  49. batch 0
    smaller_t
    batch 1
    batch 3
    1
    11 49

    View Slide

  50. batch 0
    smaller_t
    batch 1
    batch 3
    01
    11
    batch 7
    111 50

    View Slide

  51. batch 3
    01
    11 51
    batch 7
    111
    batch 1
    batch 0
    smaller_t

    View Slide

  52. batch 3
    01
    11
    batch 7
    111 52
    batch 1
    batch 0
    smaller_t

    View Slide

  53. batch 3
    11
    batch 7
    111 53
    1011
    batch 11
    01
    batch 1
    batch 0
    smaller_t

    View Slide

  54. bigger_t smaller_t bigger_t smaller_t
    smaller_t
    batch 0
    smaller_t
    batch 1
    PHJ_BUILD_HASHING_INNER
    54

    View Slide

  55. bigger_t bigger_t smaller_t
    smaller_t
    batch 0
    smaller_t
    batch 1
    smaller_t
    batch 0
    smaller_t
    batch 1
    smaller_t
    batch 2
    smaller_t
    batch 3
    PHJ_BUILD_HASHING_INNER
    55

    View Slide

  56. bigger_t smaller_t
    smaller_t
    batch 0
    smaller_t
    batch 1
    smaller_t
    batch 2
    smaller_t
    batch 3
    PHJ_BUILD_HASHING_INNER
    PHJ_BUILD_HASHING_OUTER
    bigger_t
    batch 0
    bigger_t
    batch 1
    bigger_t
    batch 2
    bigger_t
    batch 3 56

    View Slide

  57. bigger_t bigger_t smaller_t
    smaller_t
    batch 0
    smaller_t
    batch 1
    smaller_t
    batch 0
    smaller_t
    batch 1
    smaller_t
    batch 2
    smaller_t
    batch 3
    PHJ_BUILD_HASHING_INNER
    57

    View Slide


  58. ● work_mem
    batch 3
    11 batch 7
    111
    1011
    batch 3
    batch 3
    batch 11
    011
    0011
    58

    View Slide



  59. ● BufFile
    batch 3
    11 batch 7
    111
    1011
    batch
    11
    011
    0011
    batch 3
    batch 3
    00011
    10011
    batch
    19
    batch 3
    59

    View Slide





  60. 60

    View Slide





  61. 61

    View Slide




  62. 62

    View Slide


  63. ○ work_mem


    63

    View Slide

  64. batch 3
    11 batch
    7
    111
    1011
    batch
    11
    011
    0011
    batch
    3
    00011
    10011
    batch
    19
    batch
    3
    batch 3
    11 batch
    7
    111
    1011
    batch
    11
    011
    0011
    batch
    3
    batch
    3



    64
    batch
    3

    View Slide

  65. bigger_t
    batch 5
    bigger_t smaller_t bigger_t smaller_t
    stripe 0
    stripe 1
    smaller_t
    2
    3
    2
    3
    bigger_t
    batch 1 smaller_t
    batch 1
    65

    View Slide

  66. bigger_t smaller_t bigger_t smaller_t
    smaller_t
    2
    3
    2
    3
    bigger_t
    batch 1
    smaller_t
    batch 1
    stripe 0
    stripe 0
    stripe 1
    stripe 0
    66
    bigger_t
    batch 5

    View Slide

  67. bigger_t smaller_t bigger_t smaller_t
    smaller_t
    2
    3
    2
    3
    bigger_t
    batch 1
    smaller_t
    batch 1
    stripe 1
    stripe 0
    stripe 1
    67

    View Slide

  68. bigger_t smaller_t
    PHJ_BUILD_HASHING_INNER
    bigger_t
    smaller_t
    batch 1
    smaller_t
    batch 0
    smaller_t
    batch 1
    smaller_t
    batch 2
    smaller_t
    batch 3
    smaller_t
    batch 0
    68

    View Slide

  69. bigger_t smaller_t
    smaller_t
    batch 1
    bigger_t
    batch 1
    stripe 0
    stripe 1
    69

    View Slide

  70. bigger_t smaller_t
    bigger_t
    batch 1
    smaller_t
    batch 1
    stripe 0
    stripe 0
    stripe 1
    stripe 0
    smaller_t
    batch 2
    smaller_t
    batch 3
    bigger_t
    batch 0
    bigger_t
    batch 2
    bigger_t
    batch 3
    smaller_t
    batch 0
    70

    View Slide

  71. bigger_t smaller_t
    bigger_t
    batch 1
    smaller_t
    batch 1
    stripe 1
    smaller_t
    batch 2
    smaller_t
    batch 3
    bigger_t
    batch 0
    bigger_t
    batch 2
    bigger_t
    batch 3
    smaller_t
    batch 0
    stripe 0
    stripe 1
    71

    View Slide



  72. bigger_t smaller_t
    bigger_t
    batch 1
    smaller_t
    batch 1
    stripe 0
    72
    stripe 1
    smaller_t
    batch 1
    stripe 0

    View Slide

  73. bigger_t smaller_t
    bigger_t
    batch 1
    smaller_t
    batch 1
    stripe 0
    outer match statuses
    0001
    bigger_t.batch_1.tuple_1
    matches a tuple in
    smaller_t.batch_1.stripe_0
    bigger_t tuples 2 - 4
    have no matches in
    smaller_t.batch_1.stripe_0 73
    stripe 1
    smaller_t
    batch 1
    stripe 0

    View Slide

  74. bigger_t smaller_t
    bigger_t
    batch 1
    smaller_t
    batch 1
    stripe 1
    outer match statuses
    1001
    bigger_t.batch_1.tuple_4
    matches a tuple in
    smaller_t.batch_1.stripe_1
    74
    stripe 1
    smaller_t
    batch 1
    stripe 0

    View Slide

  75. bigger_t smaller_t
    bigger_t
    batch 1
    outer match statuses
    1001
    75

    View Slide

  76. SharedTupleStore
    MinimalTuple
    TupleIds MinimalTuple
    - SharedBitStore
    EXPLAIN
    76

    View Slide

  77. write_file
    ParallelHashJoinState
    ParallelHashJoinBatch 0
    ParallelHashJoinBatch 1
    Outer SharedTuplestore
    batch 1
    Inner SharedTuplestore
    batch 1
    STS Participant 1
    STS Participant n
    STS Participant 2
    STS Participant 1
    STS Participant n
    STS Participant 2
    P1 Inner STS
    accessor
    P2 Inner STS
    accessor
    write_file
    hashtable
    77

    View Slide

  78. SharedTupleStore
    Inner
    STS
    P2
    Write
    file
    P1
    Write
    file
    STS Chunk
    STS Chunk
    78

    View Slide

  79. SharedTupleStore
    P2
    Write
    file
    P1
    Write
    file
    STS Chunk
    STS Chunk
    Stripe
    1
    Stripe
    1
    Stripe
    2
    Stripe 2
    79
    Inner
    STS

    View Slide

  80. View Slide

  81. View Slide

  82. View Slide






  83. 83
    *https://www.postgresql.org/message-id/flat/CA%2BhUKGKWWmf%3DWELLG%3DaUGbcugRaSQbtm0tKYiBut-B2rVKX63g%40mail.gmail.com
    **https://github.com/melanieplageman/postgres/commits/alternative_AHJ

    View Slide







  84. work_mem**


    84
    *Tomas Vondra’s patch https://www.postgresql.org/message-id/flat/20190504003414.bulcbnge3rhwhcsh%40development
    ** mentioned in the same thread

    View Slide