Slide 1

Slide 1 text

1 CLAIRE GIORDANO @clairegiordano • @PosetteConf • @AzureDBPostgres • @citusdata Beginner's Guide to Partitioning vs. Sharding in Postgres

Slide 2

Slide 2 text

Inspiration for this talk came from a #PGSQLPhriday community blog-fest back in August 2023 Hosted by Tomasz Gintowt

Slide 3

Slide 3 text

So I wrote a thing for PGSQL Phriday “Understanding Partitioning and Sharding in Postgres and Citus” aka.ms/blog-partitioning-sharding-claire

Slide 4

Slide 4 text

4 So happy to share this PGConfEU talk here in NYC

Slide 5

Slide 5 text

@clairegiordano @[email protected] Engineer Dev Manager PM Marketer Writer Open-source champion Community Lead Podcaster

Slide 6

Slide 6 text

6 Quite a lot of Postgres work @ Microsoft PostgreSQL core Citus Open Source PG Ecosystem Azure Database for PostgreSQL Azure CosmosDB for PostgreSQL MANAGED DATABASE SERVICES OPEN SOURCE WORK fully-managed database service for Postgres service powered by Citus extension to Postgres NEW CAPABILITIES in FLEXIBLE SERVER NEW CAPABILITIES Contribute to PG open source (and review patches on many other people's work!) Citus open-source extension to Postgres gives you Postgres at any scale. (Think: distributed Postgres.) Postgres extension & tooling our PG team at Microsoft maintains or contributed to in last ~8 months aka.ms/blog-pg-at-microsoft “Flexible Server” PG Community Contribute to growth & knowledge of the PostgreSQL open-source developer & user communities.

Slide 7

Slide 7 text

7 What is sharding? Partitioning + sharding together How partitioning & sharding are different Why partitioning helps query performance When partitioning helps query performance Why sharding helps query performance When sharding helps query performance This Beginner Guide has _8_ chapters What is Postgres partitioning?

Slide 8

Slide 8 text

8 What is sharding? Partitioning + sharding together How partitioning & sharding are different Why partitioning helps query performance When partitioning helps query performance Why sharding helps query performance When sharding helps query performance Part 1 What is Postgres partitioning?

Slide 9

Slide 9 text

What is Postgres table partitioning Let’s say this huge Redwood Tree is like a big Postgres table...

Slide 10

Slide 10 text

What is Postgres table partitioning What if we could split this tree trunk into many smaller logs, & only work with the ones we need?

Slide 11

Slide 11 text

• Splits large tables into many smaller tables (partitions) • Built-in, native feature in Postgres • Needs to be declared up-front when you first CREATE TABLE

Slide 12

Slide 12 text

So many improvements to Postgres partitioning in last 5-6 years Shout-out to @Brandur for this blog post... aka.ms/blog-partitioning-2022-brandur

Slide 13

Slide 13 text

13 What are the benefits of Postgres Partitioning? Can improve query performance Improves performance of bulk deletes /if drop partitions Improves autovacuum performance /in so many cases!! Enables you to optimize storage costs (cheap vs. expensive)

Slide 14

Slide 14 text

In Postgres, trying to remove old rows from a large, hot table is flitting with disaster. A long running query must iterate through and mark each one as dead, and even then nothing is reclaimed until an equally expensive vacuum runs through and frees space, and only when it's allowed to after rows are no longer visible to any other query in the system, “ — Brandur whether they're making use of the large table or not. Each row removal land in the WAL, resulting in significant amplification. But with partitions, deletion becomes a simple DROP TABLE. It executes instantly, and with negligible costs (partitioning has other benefits too). The trade-off is maintenance." From Brandur.org Title of post: “Partitioning in Postgres, 2022 Edition”

Slide 15

Slide 15 text

15 Declarative Postgres partitioning has 3 partitioning methods PARTITION BY RANGE PARTITION BY LIST PARTITION BY HASH Time-series & IOT Countries, company divisions,… When partitioning by range & by list do not work... when NOT an obvious partitioning key

Slide 16

Slide 16 text

16 Tracking monthly concert revenue for Taylor Swift Step 1: Decide on partition method Step 2: Decide on partition key Step 3: Create a partitioned table Step 4: Create partitions

Slide 17

Slide 17 text

17 Let’s FIRST create a partitioned table CREATE TABLE concert_revenue ( city_id int not null, sale_date date not null, merch_sales int, ticket_sales int ) PARTITION BY RANGE (sale_date);

Slide 18

Slide 18 text

18 Let’s create a partition for June 2024 CREATE TABLE concert_revenue_cy2024m06 PARTITION OF concert_revenue FOR VALUES FROM ('2024-06-01') TO ('2024-07-01'); TIP: lower bound is INCLUSIVE TIP: upper bound (2024-07-01) is NOT inclusive

Slide 19

Slide 19 text

19 Let’s create 3 partitions for Jun / Jul / Aug 2024 CREATE TABLE concert_revenue_cy2024m06 PARTITION OF concert_revenue FOR VALUES FROM ('2024-06-01') TO ('2024-07-01'); CREATE TABLE concert_revenue_cy2024m07 PARTITION OF concert_revenue FOR VALUES FROM ('2024-07-01') TO ('2024-08-01'); CREATE TABLE concert_revenue_cy2024m08 PARTITION OF concert_revenue FOR VALUES FROM ('2024-08-01') TO ('2024-09-01');

Slide 20

Slide 20 text

20 Later, you can delete older data—e.g. June 2024 DROP TABLE concert_revenue_cy2024m06;

Slide 21

Slide 21 text

Partitioning is not magic! ...You need to maintain these partitions pg_partman with pg_cron

Slide 22

Slide 22 text

22 What is sharding? Partitioning + sharding together How partitioning & sharding are different Why partitioning helps query performance When partitioning helps query performance Why sharding helps query performance When sharding helps query performance Part 2 What is Postgres partitioning?

Slide 23

Slide 23 text

What is sharding? 23 Splitting Postgres tables into smaller tables (“shards”)

Slide 24

Slide 24 text

What is sharding? 24 Splitting Postgres tables into smaller tables (“shards”) Distributing shards horizontally across multiple nodes AND

Slide 25

Slide 25 text

Alternative / Equivalent terms for “Sharding” in Postgres 25 Horizontal Sharding Scaling out Horizontally Distributing Postgres

Slide 26

Slide 26 text

Back to previous metaphor If you have one big ginormous tree trunk & you want to split it into smaller pieces & distribute it across multiple parts of forest

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

28 3 common ways to shard Postgres 2 1 Partitions + postgres_fdw Manual sharding sometimes called “sharding at application layer” Create hash-partitioned Postgres table in which every partition is a foreign table using postgres_fdw 3 Citus Extension to Postgres Open source, & also a managed service

Slide 29

Slide 29 text

Bulk of sharding work done by... 2 1 3 DBA Citus Dev App Dev Partitions + postgres_fdw Manual sharding Citus

Slide 30

Slide 30 text

In today’s talk, we’ll focus on sharding with Citus extension to Postgres

Slide 31

Slide 31 text

31 Diagram of sharding Postgres with Citus Postgres Single Node Distributed Citus cluster with shards (smaller tables) on each node in cluster

Slide 32

Slide 32 text

32 Citus extension = LOT more than sharding aka.ms/blog-citus-technical-readme

Slide 33

Slide 33 text

33 Benefit of sharding Postgres with Citus: >>>> Get more Performance & Scale than you can eek out on a single node Can improve query performance @ scale //due to all the memory, CPU, & disk from multi-node cluster Enables your application to grow & scale // not just now, but in future Improves performance of autovacuum //since it runs in parallel across all the nodes in the cluster

Slide 34

Slide 34 text

34 How to distribute tables with Citus for row-based sharding SELECT create_distributed_table( 'table_name', 'sharding_key'); N.B. “sharding key” is often called a “distribution column”

Slide 35

Slide 35 text

35 New to Citus 12.0, a 2nd way to shard: schema-based sharding SELECT citus_schema_distribute( 'name');

Slide 36

Slide 36 text

36 What is sharding? Partitioning + sharding together How partitioning & sharding are different Why partitioning helps query performance When partitioning helps query performance Why sharding helps query performance When sharding helps query performance Part 3 What is Postgres partitioning?

Slide 37

Slide 37 text

37 Can you Partition & Shard together?

Slide 38

Slide 38 text

38 Can you Partition & Shard together? Single PG node Partitions on a single Postgres node Sharded Postgres partitions on a distributed 4-node Citus cluster Node 1 Node 2 Node 3 Node 4

Slide 39

Slide 39 text

39 These two Citus UDFs simplify partition management create_time_partitions(table_name regclass, partition_interval interval, end_at timestamp with time zone, start_from timestamp with time zone DEFAULT now()) #1—Create as many partitions as necessary for given time range

Slide 40

Slide 40 text

40 These 2 Citus UDFs simplify partition management drop_old_time_partitions(table_name regclass, older_than timestamp with time zone) #2—Drop all partitions older than given timestamp

Slide 41

Slide 41 text

41 What is sharding? Partitioning + sharding together How partitioning & sharding are different Why partitioning helps query performance When partitioning helps query performance Why sharding helps query performance When sharding helps query performance Part 4 What is Postgres partitioning?

Slide 42

Slide 42 text

Let’s look at a comparison table! How partitioning & sharding are different

Slide 43

Slide 43 text

Postgres native Partitioning Sharding with Citus extension Partitioning + Sharding Combination 2 1 3 0 Aspects or attributes of these technologies

Slide 44

Slide 44 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 45

Slide 45 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 46

Slide 46 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 47

Slide 47 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 48

Slide 48 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 49

Slide 49 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 50

Slide 50 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 51

Slide 51 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 52

Slide 52 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 53

Slide 53 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 54

Slide 54 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 55

Slide 55 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 56

Slide 56 text

Attribute Partitioning Citus Sharding Partition + Shard Combo Built into Postgres Extension to Postgres Single node Multi-node Drop old data quickly Parallel, distributed SQL/DDL/DML Partition/Shard Pruning Parallel autovacuum Better index cache hit ratios Automatic maintenance Time series apps (e.g. IOT) Multi-tenant SaaS apps

Slide 57

Slide 57 text

57 What is sharding? Partitioning + sharding together How partitioning & sharding are different Why partitioning helps query performance When partitioning helps query performance Why sharding helps query performance When sharding helps query performance Part 5 What is Postgres partitioning?

Slide 58

Slide 58 text

Why partitioning can help improve query performance Partition Pruning

Slide 59

Slide 59 text

Why partitioning can help improve query performance Data Locality

Slide 60

Slide 60 text

Why partitioning can help improve query performance Partition Pruning Data Locality &

Slide 61

Slide 61 text

61 What is sharding? Partitioning + sharding together How partitioning & sharding are different Why partitioning helps query performance When partitioning helps query performance Why sharding helps query performance When sharding helps query performance Part 6 What is Postgres partitioning?

Slide 62

Slide 62 text

When partitioning helps improve query performance Just like when you bake a cake, need essential ingredients to get the right result

Slide 63

Slide 63 text

When partitioning helps improve query performance Only when you use a WHERE clause that prunes (excludes) partitions from the query

Slide 64

Slide 64 text

64 Remember that Taylor Swift concert example... Tracking monthly concert revenues... What if we wanted to know what the August revenue was from ticket sales + merch sales?

Slide 65

Slide 65 text

65 This EXPLAIN query is missing WHERE clause EXPLAIN (costs off) SELECT count(*) FROM concert_revenue;

Slide 66

Slide 66 text

66 So it has to scan all 3 of the partitions  EXPLAIN (costs off) SELECT count(*) FROM concert_revenue; QUERY PLAN --------------------------------------------------------------------- Aggregate -> Append -> Seq Scan on concert_revenue_cy2024m06 concert_revenue_1 -> Seq Scan on concert_revenue_cy2024m07 concert_revenue_2 -> Seq Scan on concert_revenue_cy2024m08 concert_revenue_3 (5 rows)

Slide 67

Slide 67 text

67 And doesn’t benefit from partition pruning  EXPLAIN (costs off) SELECT count(*) FROM concert_revenue; QUERY PLAN --------------------------------------------------------------------- Aggregate -> Append -> Seq Scan on concert_revenue_cy2024m06 concert_revenue_1 -> Seq Scan on concert_revenue_cy2024m07 concert_revenue_2 -> Seq Scan on concert_revenue_cy2024m08 concert_revenue_3 (5 rows)

Slide 68

Slide 68 text

68 This EXPLAIN query includes a WHERE clause EXPLAIN (costs off) SELECT count(*) FROM concert_revenue WHERE sale_date BETWEEN '2024-08-01' AND '2024-08-31';

Slide 69

Slide 69 text

69 And so it only needs to scan the Aug partition EXPLAIN (costs off) SELECT count(*) FROM concert_revenue WHERE sale_date BETWEEN '2024-08-01' AND '2024-08-31'; QUERY PLAN --------------------------------------------------------------------- ---------------------- Aggregate -> Seq Scan on concert_revenue_cy2024m08 concert_revenue Filter: ((sale_date >= '2024-08-01'::date) AND (sale_date <= '2024-08-31'::date)) (3 rows)

Slide 70

Slide 70 text

70 And so it only needs to scan the Aug partition EXPLAIN (costs off) SELECT count(*) FROM concert_revenue WHERE sale_date BETWEEN '2024-08-01' AND '2024-08-31'; QUERY PLAN --------------------------------------------------------------------- ---------------------- Aggregate -> Seq Scan on concert_revenue_cy2024m08 concert_revenue Filter: ((sale_date >= '2024-08-01'::date) AND (sale_date <= '2024-08-31'::date)) (3 rows)

Slide 71

Slide 71 text

71 What is sharding? Partitioning + sharding together How partitioning & sharding are different Why partitioning helps query performance When partitioning helps query performance Why sharding helps query performance When sharding helps query performance Part 7 What is Postgres partitioning?

Slide 72

Slide 72 text

Why sharding can help query performance Let’s say this huge Redwood Tree is like a big Postgres table... More CPU, Memory, Disk

Slide 73

Slide 73 text

Why sharding can help query performance Let’s say this huge Redwood Tree is like a big Postgres table... More CPU, Memory, Disk

Slide 74

Slide 74 text

Why sharding can help query performance Let’s say this huge Redwood Tree is like a big Postgres table... Shard Pruning Logic

Slide 75

Slide 75 text

Why sharding can help query performance Co-Located Tables

Slide 76

Slide 76 text

Why sharding can help query performance Let’s say this huge Redwood Tree is like a big Postgres table... Shard Pruning Logic Co-Located Tables More CPU, Memory, Disk

Slide 77

Slide 77 text

77 What is sharding? Partitioning + sharding together How partitioning & sharding are different Why partitioning helps query performance When partitioning helps query performance Why sharding helps query performance When sharding helps query performance Part 8 What is Postgres partitioning?

Slide 78

Slide 78 text

Common example: “Cache hit ratio” has been going down When sharding can help improve query performance When your application needs more cpu, memory, or disk than is possible on a single Postgres node CPU Memory Disk

Slide 79

Slide 79 text

When sharding can help improve query performance CPU Memory Disk CPU Memory Disk CPU Memory Disk CPU Memory Disk CPU Memory Disk CPU Memory Disk CPU Memory Disk + + + + + CPU Memory Disk CPU Memory Disk CPU Memory Disk + + When your application benefits from more cpu, memory, or disk...

Slide 80

Slide 80 text

So what are the takeaways?

Slide 81

Slide 81 text

81 Takeaways about PG Partitioning & Sharding 1. Partitioning & Sharding can be “Invaluable” and “Lifesavers” 3. Planning required 2. You don’t have to pick between Partitioning & Sharding 4. Keys Matter

Slide 82

Slide 82 text

82 Takeaways about PG Partitioning & Sharding 1. Partitioning & Sharding can be “Invaluable” and “Lifesavers” 3. Planning required 2. You don’t have to pick between Partitioning & Sharding 4. Keys Matter

Slide 83

Slide 83 text

83 Takeaways about PG Partitioning & Sharding 1. Partitioning & Sharding can be “Invaluable” and “Lifesavers” 3. Planning required 2. You don’t have to pick between Partitioning & Sharding 4. Keys Matter

Slide 84

Slide 84 text

84 Takeaways about PG Partitioning & Sharding 1. Partitioning & Sharding can be “Invaluable” and “Lifesavers” 3. Planning required 2. You don’t have to pick between Partitioning & Sharding 4. Keys Matter

Slide 85

Slide 85 text

People to thank for inspiration &or reviews Daniel Gustafsson David Rowley Isaac Alves Jelte Fennema-Nio Marco Slot Melanie Plageman Robert Treat Ryan Booz Thomas Munro Umur Cubukcu Charles Feddersen

Slide 86

Slide 86 text

86 Favors to ask you!

Slide 87

Slide 87 text

Have you heard of this podcast for developers who love Postgres?  Monthly  episodes  A-ma-zing guests  Recorded LIVE TalkingPostgres.com Hosted by Claire Giordano 19+

Slide 88

Slide 88 text

TalkingPostgres.com Podcast about human side of Postgres

Slide 89

Slide 89 text

with topics like....  Becoming expert in PostgreSQL  From dev to Postgres specialist  Journey into Performance benchmarking  My Journey into Postgres monitoring  How I got started as a developer  Why people care about PostGIS TalkingPostgres.com Hosted by Claire Giordano

Slide 90

Slide 90 text

NEXT EPISODE is with Tom Lane IN OCTOBER to never miss an episode subscribe

Slide 91

Slide 91 text

Save the date aka.ms/posette-subscribe Subscribe to news 

Slide 92

Slide 92 text

92

Slide 93

Slide 93 text

@clairegiordano @[email protected] TalkingPostgres.com z aka.ms/posette z linkedin.com/in/claireg