Parallelism in PostgreSQL 15

A short talk given at Citus Con 2023 about some concepts and problems in parallel query execution.

Video: https://youtu.be/3v-cthowG10

Thomas Munro

April 20, 2023

Transcript

  1. Thomas Munro | Citus Con: An Event for Postgres 2023

    fosstodon.org/@tmunro @MengTangmu Parallelism in PostgreSQL 15
  2. $ whoami
     Thomas Munro
     • PostgreSQL user/hacker/committer
     • Work relevant to this talk: parallel hash joins and various parallel infrastructure
     • Other interests include: database/kernel interfaces (memory, I/O, threads, IPC), portability, modernisation
     • Based in New Zealand (highest number of PostgreSQL committers in the world… per capita)
  3. What do I mean by parallelism? For the purposes of this talk…
     • Parallelism within a CPU thread (pipelined execution)
     • Vectorisation (SIMD)
     • I/O parallelism
     • Storage/compute separation with filter pushdown (Aurora? Neon?)
     • Distribution across multiple networked computers (FDW, Citus, Greenplum, …)
     • Multiple CPU cores inside one computer executing one query:
       single-image OS, shared memory and storage
  4. Arrival of parallelism in relational databases
     [Timeline: versions in which systems gained parallelism: Oracle 7.1,
     DB2, Informix 6.0, Sybase 11.5, SQL Server 7.0, Ingres 2006, and
     PostgreSQL 9.6 (POSTGRES 4.1 being its QUEL-era predecessor); hardware
     eras: mainframes in the '60s–'70s, VMS/Sequent in the '80s, SMP on
     VMS/UN*X/NT in the '90s, multicore in the '00s, multicore ubiquitous
     in the '10s]
  5. Not always as easy as it sounds
     • Some problems are “embarrassingly” parallelisable
       • Painting a fence: just add more people
       • Adding up the sums of a large pile of receipts: subtotals, then total
     • Some problems are inherently trickier
       • Finding matching pairs of numbered socks from two large randomised
         piles: sort into number order and merge? Partition into buckets by
         number ranges?
  6. Cost-based optimiser
     • What are all the ways we could access the base relations?
     • Which is the best join order?
     • Which is the best join strategy for each join?
     • … and more
     Consider many paths, estimate the cost (execution time) for each, and
     make the cheapest into the plan.
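     As a minimal sketch of what this means in practice (the schema is
     hypothetical), EXPLAIN shows the path the planner judged cheapest, with
     its estimated cost expressed in abstract planner units, not milliseconds:

         -- Hypothetical tables; EXPLAIN prints the chosen plan with
         -- estimated costs in planner units.
         EXPLAIN SELECT * FROM orders JOIN customers USING (customer_id);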
  7. Parallelism = more paths
     • Parallel Sequential Scan, Parallel Index Scan -> scatter a “partial”
       relation to processes on demand
     • Some executor plan nodes also have “Parallel” variants that
       communicate between processes
     • Gather -> collects tuples into one process
     • Partial/Finalize Aggregate -> perform two-phase aggregation, for
       example summing subtotals computed by worker processes
  8. Counting in parallel
     [Diagram: a serial plan (Seq Scan feeding Aggregate) beside the parallel
     plan (a Parallel Seq Scan and Partial Aggregate in each process, scattered
     at the scan, then Gather and Finalize Aggregate in the leader)]

     cituscon=> explain (costs off) select count(*) from t;
          QUERY PLAN
     ---------------------
      Aggregate
        -> Seq Scan on t

     cituscon=> explain (costs off) select count(*) from t;
                     QUERY PLAN
     ------------------------------------------
      Finalize Aggregate
        -> Gather
              Workers Planned: 2
              -> Partial Aggregate
                    -> Parallel Seq Scan on t
  9. parallel_setup_cost=1000
     13316 └─ postgres -D /data/clusters/main
     25002    ├─ postgres: parallel worker for PID 13337
     25001    ├─ postgres: parallel worker for PID 13337
     13441    ├─ postgres: fred salesdb [local] idle
     13437    ├─ postgres: fred salesdb [local] idle
     13337    ├─ postgres: fred salesdb [local] SELECT
     13323    ├─ postgres: logical replication launcher
     13321    ├─ postgres: autovacuum launcher
     13320    ├─ postgres: walwriter
     13319    ├─ postgres: background writer
     13318    └─ postgres: checkpointer
     The cost to start up and shut down shared memory and worker processes:
     5ms-50ms depending on OS. (Yeah, ideally we would use threads. One day!)
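     A hedged sketch of how this setting steers the planner (the GUC is real;
     the raised value below is purely illustrative):

         SET parallel_setup_cost = 1000;    -- default: charged once per Gather
         SET parallel_setup_cost = 100000;  -- illustrative: makes the one-off
                                            -- startup charge dominate, so short
                                            -- queries fall back to serial plans
         EXPLAIN SELECT count(*) FROM t;    -- may now plan a plain Aggregate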
  10. parallel_tuple_cost=0.1
     SELECT * FROM very_large_table WHERE …;
     We don’t want to use parallelism if a huge number of tuples will be
     transferred back into the leader process after only a light amount of
     work in the worker processes.
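     A sketch of the contrast (default value shown; the actual plan choice
     depends on table sizes and statistics):

         SET parallel_tuple_cost = 0.1;  -- default: cost per tuple passed from
                                         -- a worker to the leader through Gather
         -- Every matching row must cross Gather: parallelism is penalised.
         EXPLAIN SELECT * FROM very_large_table;
         -- Only one subtotal per worker crosses Gather: parallelism looks cheap.
         EXPLAIN SELECT count(*) FROM very_large_table;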
  11. Rule-based parallel degree
     • Number of workers to consider is based on the “driving” table:
       • ALTER TABLE … SET (parallel_workers = N)
       • SET min_parallel_table_scan_size = ‘8MB’
           8MB  → 1 worker
           24MB → 2 workers
           72MB → 3 workers
           x    → log(x / min_parallel_table_scan_size) / log(3) + 1 workers
       • SET min_parallel_index_scan_size = ‘512kB’
       • SET max_parallel_workers_per_gather = 2
     (A worked example follows this list.)
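     A worked example under the settings above (the GUCs are real; the 72MB
     table is hypothetical):

         SET min_parallel_table_scan_size = '8MB';
         SET max_parallel_workers_per_gather = 2;
         -- For a 72MB driving table the rule gives
         --   log(72MB / 8MB) / log(3) + 1 = 3 workers,
         -- but the result is capped at max_parallel_workers_per_gather = 2.
         ALTER TABLE t SET (parallel_workers = 4);  -- explicit per-table override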
  12. [Diagram: three cooperating processes each run an identical Nested Loop
      Join over an Index Scan; those nodes are “parallel-oblivious”, while the
      Parallel Seq Scan beneath them is parallel-aware, scans the large driving
      table, and determines the parallel degree]
  13. [Diagram: timeline of the same plan, with tuples scattered at the
      Parallel Seq Scan and gathered at the top by Gather; the Nested Loop
      Join and Index Scan run in every process] Indexes are already
      efficiently shared between backends.
  14. [Diagram: a parallel-oblivious Hash Join below Gather; each process runs
      its own Seq Scan of the inner relation and builds a private hash table
      before probing, so the build phase is duplicated effort in every process]
  15. [Diagram: the same Hash Join, but with a Parallel Seq Scan feeding each
      process’s private hash table] We cannot join arbitrarily chosen blocks
      from two relations. The results would be nonsense!
  16. [Diagram: Parallel Hash Join with Parallel Seq Scans on both sides; all
      processes cooperate to build one shared hash table under Parallel Hash,
      then all probe it, with scatter and gather phases on both the build and
      probe sides]
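     For reference, a typical plan shape for such a join (a sketch reusing the
     sock-matching example from earlier; table and column names are invented,
     and real output varies):

         cituscon=> explain (costs off)
                    select count(*) from socks_a join socks_b using (sock_number);
                           QUERY PLAN (illustrative)
         ------------------------------------------------------------
          Finalize Aggregate
            -> Gather
                  Workers Planned: 2
                  -> Partial Aggregate
                        -> Parallel Hash Join
                              Hash Cond: (socks_a.sock_number = socks_b.sock_number)
                              -> Parallel Seq Scan on socks_a
                              -> Parallel Hash
                                    -> Parallel Seq Scan on socks_b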
  17. Hash join memory struggles
     • Memory budget: work_mem (4MB) × hash_mem_multiplier (2) × processes
       (worked example below)
     • If the data does not fit, it is first broken up into separate
       partitions written out to disk, to be processed one at a time
     • Problem: when work_mem is small, the set of partitions grows very
       large, and the number of files and per-partition buffers grows very
       large, perhaps costing more memory than partitioning saved! This is
       an open problem.
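     A worked example with the defaults named above (2 workers plus the
     leader assumed; the arithmetic is the point, not the exact numbers):

         SET work_mem = '4MB';           -- per-node sort/hash memory budget
         SET hash_mem_multiplier = 2.0;  -- extra headroom allowed for hash tables
         -- In-memory budget for a Parallel Hash with 2 workers + the leader:
         --   4MB × 2.0 × 3 processes = 24MB for the shared hash table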
  18. Some things that prevent parallelism
     • FULL OUTER JOINs (parallel support expected in PostgreSQL 16)
     • Many FDWs don’t support parallelism
     • Cursors
     • max_rows (set by GUIs like DbVisualizer)
     • Queries that write or lock rows
     • Functions not marked PARALLEL SAFE (see the sketch after this list)
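     A sketch of that last point (the function is hypothetical; the DDL is
     standard PostgreSQL):

         -- Functions default to PARALLEL UNSAFE; declaring them safe lets
         -- the planner use them inside parallel plans.
         CREATE FUNCTION add_one(i int) RETURNS int
             LANGUAGE sql IMMUTABLE PARALLEL SAFE
             AS $$ SELECT i + 1 $$;

         -- Or mark an existing function:
         ALTER FUNCTION add_one(int) PARALLEL SAFE;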
  19. Major future projects
     • Write queries (INSERT, UPDATE, DELETE)
     • Management of total resource usage across the whole system
       (memory, CPU, I/O)?
     • Alternative strategies for hash join memory management
     • Threads rather than processes
     • Replacing the recursive volcano executor design
     • More general partition-based parallelism