
Parallelism in PostgreSQL 15

Thomas Munro
April 20, 2023

A short talk given at Citus Con 2023 about some concepts and problems in parallel query execution.

Video: https://youtu.be/3v-cthowG10


Transcript

  1. Thomas Munro | Citus Con: An Event for Postgres 2023

    fosstodon.org/@tmunro

    @MengTangmu

    Parallelism in PostgreSQL 15


  2. $ whoami
    Thomas Munro
    • PostgreSQL user/hacker/committer

    • Work relevant to this talk: parallel hash joins and various parallel
    infrastructure

    • Other interests include: database/kernel interfaces (memory, I/O, threads,
    IPC), portability, modernisation

    • Based in New Zealand (highest number of PostgreSQL committers in the
    world… per capita)


  3. What do I mean by parallelism?
    For the purposes of this talk…
    • Parallelism within a CPU thread (pipelined execution)

    • Vectorisation (SIMD)

    • I/O parallelism

    • Storage/compute separation with filter pushdown (Aurora? Neon?)

    • Distribution across multiple networked computers (FDW, Citus, Greenplum, …)

    • Multiple CPU cores inside one computer executing one query:
      single-image OS, shared memory and storage


  4. Arrival of parallelism in relational databases
    [Timeline: versions in which parallel query execution arrived:
    Oracle 7.1, DB2 4.1, Informix 6.0, Sybase 11.5, SQL Server 7.0,
    Ingres 2006, PostgreSQL 9.6 (from its predecessor POSTGRES, via
    QUEL → SQL → + Parallelism). Eras: '60s–'70s mainframes; '80s VMS,
    Sequent; '90s SMP on VMS, UN*X, NT; '00s multicore; '10s multicore
    ubiquitous.]


  5. “The free lunch is over*”
     *Herb Sutter, writing in Dr. Dobb’s Journal, 2005

     [Chart: microprocessor trend data, from
     https://github.com/karlrupp/microprocessor-trend-data
     (Creative Commons Attribution 4.0 International Public License)]


  6. Not always as easy as it sounds
    • Some problems are “embarrassingly” parallelisable

    • Painting a fence: just add more people

    • Adding up the sums of a large pile of receipts: subtotals, then total

    • Some problems are inherently trickier

    • Finding matching pairs of numbered socks from two large
    randomised piles: Sort into number order and merge? Partition
    into buckets by number ranges?


  7. How do we make plans anyway?


  8. [Diagram: SELECT … → Planner 🎛 → Executor]


  9. Cost-based optimiser
    • What are all the ways we could access the base relations?

    • Which is the best join order?

    • Which is the best join strategy for each join?

    • … and more

    • Consider many paths, estimate the cost (execution time)
    for each and make the cheapest into the plan
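
    A minimal illustration of costing (hypothetical table t; the numbers
    are invented for the example, in the planner’s arbitrary cost units):

    cituscon=> explain select count(*) from t;
                               QUERY PLAN
    -----------------------------------------------------------------
     Aggregate  (cost=16925.00..16925.01 rows=1 width=8)
       ->  Seq Scan on t  (cost=0.00..14425.00 rows=1000000 width=0)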


  10. Parallelism = more paths
    • Parallel Sequential Scan, Parallel Index Scan -> scatter “partial” relation to
    processes on demand

    • Some executor plan nodes also have “Parallel” variants that communicate
    between processes

    • Gather -> collects tuples into one process

    • Partial/Finalize Aggregate -> perform two-phase aggregation, for example
    summing subtotals computed by worker processes


  11. Counting in parallel
    [Diagram: Gather collects tuples from worker processes, each running
    a Partial Aggregate over a Parallel Seq Scan (scatter); a Finalize
    Aggregate combines the partial results.]

    cituscon=> explain (costs off) select count(*) from t;
         QUERY PLAN
    ---------------------
     Aggregate
       ->  Seq Scan on t

    cituscon=> explain (costs off) select count(*) from t;
                    QUERY PLAN
    ------------------------------------------
     Finalize Aggregate
       ->  Gather
             Workers Planned: 2
             ->  Partial Aggregate
                   ->  Parallel Seq Scan on t

  12. How do we control parallelism?

    View Slide

  13. parallel_setup_cost=1000
    13316 └─ postgres -D /data/clusters/main
    25002    ├─ postgres: parallel worker for PID 13337
    25001    ├─ postgres: parallel worker for PID 13337
    13441    ├─ postgres: fred salesdb [local] idle
    13437    ├─ postgres: fred salesdb [local] idle
    13337    ├─ postgres: fred salesdb [local] SELECT
    13323    ├─ postgres: logical replication launcher
    13321    ├─ postgres: autovacuum launcher
    13320    ├─ postgres: walwriter
    13319    ├─ postgres: background writer
    13318    └─ postgres: checkpointer


    The cost to start up and shut down shared memory and
    worker processes. 5ms-50ms depending on OS. (Yeah,
    ideally we would use threads. One day!)
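
    A quick way to see the effect (hypothetical session): telling the
    planner that worker startup is free makes it much more willing to
    choose a parallel plan, even for a small table.

    cituscon=> set parallel_setup_cost = 0;
    SET
    cituscon=> explain (costs off) select count(*) from t;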


  14. parallel_tuple_cost=0.1
    SELECT * FROM very_large_table WHERE …;
    We don’t want to use parallelism if a huge number of tuples
    will be transferred back into the leader process after a light
    amount of work in the worker processes.
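
    An illustrative session (the plan shown is the likely outcome, not
    captured output): with the default tuple cost, each row shipped
    through Gather is charged 0.1, so this query usually stays serial.

    cituscon=> explain (costs off) select * from very_large_table;
              QUERY PLAN
    -------------------------------
     Seq Scan on very_large_table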


  15. Rule-based parallel degree
    • Number of workers to consider is based on the “driving” table:

    • ALTER TABLE … SET (parallel_workers = N)

    • SET min_parallel_table_scan_size = ‘8MB’

    8MB → 1 worker

    24MB → 2 workers

    72MB → 3 workers

    x → log(x / min_parallel_table_scan_size) / log(3) + 1 workers

    • SET min_parallel_index_scan_size = ‘512kB’

    • SET max_parallel_workers_per_gather = 2
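
    These rules can be overridden per table, and the per-query cap raised
    per session (illustrative commands; t is a hypothetical table):

    cituscon=> alter table t set (parallel_workers = 8);
    ALTER TABLE
    cituscon=> set max_parallel_workers_per_gather = 8;
    SET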


  16. [Diagram: three cooperating processes each run a “parallel-oblivious”
      Nested Loop Join with an Index Scan on the inner side; all are fed by
      one shared Parallel Seq Scan, which is parallel-aware, scans the large
      table, and determines the parallel degree.]


  17. Selected plan types


  18. [Diagram: over time, scatter and Gather bracket each process’s
      Nested Loop Join with an Index Scan over a shared Parallel Seq Scan.
      Indexes are already efficiently shared between backends.]


  19. [Diagram: parallel-oblivious Hash Join: each process first builds its
      own private hash table from a plain Seq Scan (duplicated effort), then
      probes it with tuples from a Parallel Seq Scan; build and probe phases
      shown over time.]


  20. [Diagram: the tempting variant with a Parallel Seq Scan on the build
      side too, so each private hash table holds only some of the rows. We
      cannot join arbitrarily chosen blocks from two relations. The results
      would be nonsense!]


  21. [Diagram: Parallel Hash Join: the processes cooperate to build one
      shared hash table from a Parallel Seq Scan (Parallel Hash), then all
      probe it with a second Parallel Seq Scan; both the build and probe
      phases run in parallel.]
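
    In EXPLAIN output this strategy appears as a Parallel Hash node under
    a Parallel Hash Join (illustrative plan shape; r and s are
    hypothetical tables):

    cituscon=> explain (costs off) select count(*) from r join s using (id);
                       QUERY PLAN
    -------------------------------------------------
     Finalize Aggregate
       ->  Gather
             Workers Planned: 2
             ->  Partial Aggregate
                   ->  Parallel Hash Join
                         Hash Cond: (r.id = s.id)
                         ->  Parallel Seq Scan on r
                         ->  Parallel Hash
                               ->  Parallel Seq Scan on s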


  22. What can go wrong?


  23. Hash join memory struggles
    • work_mem (4MB) × hash_mem_multiplier (2) × processes

    • If the data does not fit, it is first broken up into separate
      partitions written out to disk, to be processed one at a time

    • Problem: when work_mem is small, the set of partitions grows very
      large, and the number of files and per-partition buffers grows very
      large; perhaps by more than can be saved! This is an open problem.
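
    The budget arithmetic with the defaults named above (a worked
    example, not captured output):

    cituscon=> show work_mem;             -- 4MB
    cituscon=> show hash_mem_multiplier;  -- 2
    -- With 2 workers plus the leader cooperating on a Parallel Hash,
    -- the shared hash table may use about 4MB × 2 × 3 = 24MB before
    -- the data must be partitioned out to disk.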


  24. Some things that prevent parallelism
    • FULL OUTER JOINs (expected in PostgreSQL 16)

    • Many FDWs don’t support parallelism

    • Cursors

    • max_rows (set by GUIs like DbVisualizer)

    • Queries that write or lock rows

    • Functions not marked PARALLEL SAFE
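
    For example, user-defined functions default to PARALLEL UNSAFE;
    marking one safe (only when it truly has no parallel-unsafe
    behaviour) removes that blocker. A hypothetical sketch:

    cituscon=> create function add_tax(amount numeric) returns numeric
               language sql immutable parallel safe
               as $$ select amount * 1.15 $$;
    CREATE FUNCTION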


  25. Major future projects
    • Write queries (INSERT, UPDATE, DELETE)

    • Management of total resource usage across whole system (memory, CPU,
    I/O)?

    • Alternative strategies for hash join memory management

    • Threads rather than processes

    • Replacing the recursive volcano executor design

    • More general partition-based parallelism


  26. Thomas Munro

    Citus Con: An Event for Postgres 2023

    fosstodon.org/@tmunro

    @MengTangmu

    fin
