Slide 1

Slide 1 text

Multi-tenant SaaS apps made simple on Azure Cosmos DB for PostgreSQL Adam Wołk, Senior Program Manager Alicja Kucharczyk, Senior Program Manager @ Citus Con: An Event for Postgres 2023

Slide 2

Slide 2 text

Software as a Service Squeezing margins out of your hardware

Slide 3

Slide 3 text

Multi-tenant SaaS • Same software • Same infrastructure • Multiple businesses/customers Customer value: Ease of operation Operator margin: Efficient use of hardware DNS

Slide 4

Slide 4 text

Sharding Models

Slide 5

Slide 5 text

Row based • 100k tenants per 16vCPU node • Adjust schema • Modify application queries Best hardware utilization

Slide 6

Slide 6 text

Row based vs Schema Based • 100k tenants per 16vCPU node • Adjust schema • Modify application queries Best hardware utilization • 4-5k tenants per 16vCPU node • No schema changes • No query changes Easy lift & shift

Slide 7

Slide 7 text

Row based vs Schema Based vs Database Sharding • 100k tenants per 16vCPU node • Adjust schema • Modify application queries Best hardware utilization • 4-5k tenants per 16vCPU node • No schema changes • No query changes Easy lift & shift • 100 tenants per 16vCPU node • No schema changes • No query changes Highest operational cost

Slide 8

Slide 8 text

What is Postgres?

Slide 9

Slide 9 text

Why We Postgres • Open Source • Constraints • Extensions • PostGIS • Citus • B-tree, GIN, BRIN, & GiST • Available as a database service • Decades of robustness • Millions of happy users • Foreign data wrappers • Window functions • CTEs • ACID • Full text search • JSONB • Rich datatypes • pg_stat_statements • Rollups

Slide 10

Slide 10 text

Azure Why PostgreSQL? Postgres-as-a-Service on Azure Postgres at any scale: Azure Cosmos DB for PostgreSQL

Slide 11

Slide 11 text

Azure Cosmos DB for PostgreSQL Build Multi-tenant Scale-out ready

Slide 12

Slide 12 text

Terminology Coordinator – Stores Metadata. Node which application connects to. Worker / Data nodes – Nodes which store data in form of shards. Sharding – Process of dividing data among nodes. Shards – A partition of the data containing a subset of rows.

Slide 13

Slide 13 text

Data modelling How to efficiently shard your schema?

Slide 14

Slide 14 text

Pick a distribution key • Good (high) cardinality • Filters reads to a single node • Fans out writes to multiple nodes Bad vs Good country vs unique_carrier dest or origin airport vs unique_carrier timestamp vs device_id

Slide 15

Slide 15 text

Identify types of tables • Distributed table • Reference table • Local table

Slide 16

Slide 16 text

Distributed Tables Definition: • Tables that are sharded. Classification: • Large tables (>10GB) – shard on same key (may require addition of shard key) • All tables are be co-located • Enables localized and fast joins on workers • Ex: transactions, events etc SELECT create_distributed_table(table_name, column_name);

Slide 17

Slide 17 text

Reference Tables Definition: • Replicated to all the nodes (extra latency) Classification: • Small tables < 10GB • Efficient joins with distributed tables • Cannot have sharding dimension • Ex: countries, categories SELECT create_reference_table(table_name);

Slide 18

Slide 18 text

Local tables • Plain Postgres tables on the coordinator node. • Admin Tables that don’t interact with main tables • Separate micro-service that doesn’t need sharding

Slide 19

Slide 19 text

Modify your application • Add distribution column to PK/FK • Add distribution key to queries

Slide 20

Slide 20 text

Monitoring What is the pulse of your system?

Slide 21

Slide 21 text

Monitor – upcoming in Citus 11.3 • Query counts per tenant • CPU utilization per tenant 10 80 80 50 50 70 30 1k 2K 100k 1k 99k 0.5k 0.5k

Slide 22

Slide 22 text

Noisy neighbors

Slide 23

Slide 23 text

Isolate tenant SELECT isolate_tenant_to_new_shard('table_name', tenant_id); SELECT citus_move_shard_placement(shard_id, source_node_name, source_node_port, target_node_name, target_node_port);

Slide 24

Slide 24 text

Scaling out

Slide 25

Slide 25 text

Rebalance { "tasks": [ { "LSN": { "lag": null, "source": "0/731ACC0", "target": null }, "size": { "source": "14 MB", "target": "5440 kB" }, "hosts": { "source": "localhost:9702", "target": "localhost:9700" }, "phase": "Copying Data", "state": "running", "command": "SELECT pg_catalog.citus_move_shard_placement(102020,2,3,'block_writes')", "message": "", "retried": 0, "task_id": 10 } ], "task_state_counts": { "done": 9, "blocked": 1, "running": 1 } } select * from citus_rebalance_start(); select jsonb_pretty(details) from citus_rebalance_status();

Slide 26

Slide 26 text

Thank You