Slide 1

Multi-tenant SaaS apps made simple on Azure Cosmos DB for PostgreSQL
Adam Wołk, Senior Program Manager
Alicja Kucharczyk, Senior Program Manager
@ Citus Con: An Event for Postgres 2023

Slide 2

Software as a Service
Squeezing margins out of your hardware

Slide 3

Multi-tenant SaaS
• Same software
• Same infrastructure
• Multiple businesses/customers

Customer value: ease of operation
Operator margin: efficient use of hardware

Slide 4


Sharding Models

Slide 5

Row-based vs schema-based vs database sharding

Row-based:
• 100k tenants per 16 vCPU node
• Adjust schema
• Modify application queries
• Best hardware utilization

Schema-based:
• 4–5k tenants per 16 vCPU node
• No schema changes
• No query changes
• Easy lift & shift

Database sharding:
• 100 tenants per 16 vCPU node
• No schema changes
• No query changes
• Highest operational cost

Slide 6

Azure Cosmos DB for PostgreSQL
Build multi-tenant, scale-out ready

Slide 7

Terminology
• Coordinator – stores metadata; the node the application connects to.
• Worker / data nodes – nodes that store the data in the form of shards.
• Sharding – the process of dividing data among nodes.
• Shard – a partition of the data containing a subset of rows.

Slide 8


Data modelling How to efficiently shard your schema?

Slide 9

Pick a distribution key
• Good (high) cardinality
• Filters reads to a single node
• Fans out writes to multiple nodes

Bad vs good:
• country vs unique_carrier
• dest or origin airport vs unique_carrier
• timestamp vs device_id
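The last pair above can be sketched as a minimal schema. The table and column names here are hypothetical; the point is that device_id (high cardinality, present in most filters) makes a good distribution key, while timestamp would concentrate all recent writes on one shard.

```sql
-- Hypothetical IoT-style table: device_id is the distribution key.
CREATE TABLE events (
    device_id  bigint      NOT NULL,
    event_time timestamptz NOT NULL,
    payload    jsonb,
    PRIMARY KEY (device_id, event_time)
);

-- Reads filtered by device_id are routed to a single node, while
-- writes from many devices fan out across all nodes.
SELECT create_distributed_table('events', 'device_id');
```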

Slide 10


Identify types of tables • Distributed table • Reference table • Local table

Slide 11

Distributed tables
Definition:
• Tables that are sharded.
Classification:
• Large tables (>10 GB) – shard on the same key (may require adding a shard key)
• All tables are co-located
• Enables localized, fast joins on workers
• Ex: transactions, events, etc.

SELECT create_distributed_table('table_name', 'column_name');
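Co-location is what makes the localized joins work. A minimal sketch with hypothetical table names: both tables are sharded on tenant_id, so all rows for one tenant sit on the same worker and the join never crosses the network.

```sql
-- Hypothetical multi-tenant pair of large tables, sharded on tenant_id.
CREATE TABLE orders (
    tenant_id bigint NOT NULL,
    order_id  bigint NOT NULL,
    total     numeric,
    PRIMARY KEY (tenant_id, order_id)
);

CREATE TABLE order_items (
    tenant_id bigint NOT NULL,
    order_id  bigint NOT NULL,
    item_id   bigint NOT NULL,
    PRIMARY KEY (tenant_id, order_id, item_id)
);

SELECT create_distributed_table('orders', 'tenant_id');
-- colocate_with keeps matching shards on the same worker,
-- so the join below executes locally on each node.
SELECT create_distributed_table('order_items', 'tenant_id',
                                colocate_with => 'orders');

SELECT o.order_id, count(*)
FROM orders o
JOIN order_items i USING (tenant_id, order_id)
WHERE o.tenant_id = 42
GROUP BY o.order_id;
```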

Slide 12

Reference tables
Definition:
• Replicated to all nodes (extra write latency)
Classification:
• Small tables (<10 GB)
• Efficient joins with distributed tables
• No sharding dimension
• Ex: countries, categories

SELECT create_reference_table('table_name');
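A short sketch using one of the examples from the slide (the column layout is hypothetical): because every node holds a full copy, a distributed table can join against it without moving any rows.

```sql
-- Small, rarely-changing lookup table, replicated to every node.
CREATE TABLE countries (
    code text PRIMARY KEY,
    name text NOT NULL
);

SELECT create_reference_table('countries');
```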

Slide 13

Local tables
• Plain Postgres tables on the coordinator node
• Admin tables that don’t interact with the main tables
• A separate micro-service that doesn’t need sharding

Slide 14

Modify your application
• Add the distribution column to primary and foreign keys
• Add the distribution key to queries
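A before/after sketch of what these two changes look like, with a hypothetical invoices table:

```sql
-- Before sharding, the key and the query knew nothing about tenants:
--   PRIMARY KEY (invoice_id)
--   SELECT * FROM invoices WHERE invoice_id = 1234;

-- After: the distribution column joins the primary key...
CREATE TABLE invoices (
    tenant_id  bigint NOT NULL,
    invoice_id bigint NOT NULL,
    total      numeric,
    PRIMARY KEY (tenant_id, invoice_id)  -- distribution column added to PK
);
SELECT create_distributed_table('invoices', 'tenant_id');

-- ...and every query filters on it, so the planner can route the
-- query to exactly one shard instead of fanning out to all of them.
SELECT * FROM invoices WHERE tenant_id = 42 AND invoice_id = 1234;
```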

Slide 15


Monitoring What is the pulse of your system?

Slide 16

Monitor – upcoming in Citus 11.3
• Query counts per tenant
• CPU utilization per tenant
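A sketch of what querying these per-tenant statistics looks like, assuming the citus_stat_tenants view and the citus.stat_tenants_track setting that ship with Citus 11.3 (exact column names may differ by version):

```sql
-- Tenant-level tracking is off by default; enable it first.
ALTER SYSTEM SET citus.stat_tenants_track = 'all';
SELECT pg_reload_conf();

-- Per-tenant query counts and CPU usage for the current period,
-- noisiest tenants first.
SELECT tenant_attribute,
       query_count_in_this_period,
       cpu_usage_in_this_period
FROM citus_stat_tenants
ORDER BY cpu_usage_in_this_period DESC;
```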

Slide 17


Noisy neighbors

Slide 18

Isolate tenant – commands

SELECT isolate_tenant_to_new_shard('table_name', tenant_id);

SELECT citus_move_shard_placement(shard_id,
                                  source_node_name, source_node_port,
                                  target_node_name, target_node_port);
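A worked sketch of the two commands in sequence; the tenant id, shard id, and host names are hypothetical placeholders:

```sql
-- 1. Split the noisy tenant (tenant_id 42) out of its shared shard.
--    The call returns the new shard's id; 'CASCADE' also isolates
--    the tenant's rows in all co-located tables.
SELECT isolate_tenant_to_new_shard('orders', 42, 'CASCADE');

-- 2. Move that shard (say the call returned 102333) to a less loaded
--    worker. 'force_logical' keeps writes flowing during the move.
SELECT citus_move_shard_placement(102333,
                                  'worker-1', 5432,
                                  'worker-2', 5432,
                                  shard_transfer_mode => 'force_logical');
```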

Slide 19


Scaling out

Slide 20

Rebalance

select * from citus_rebalance_start();
select jsonb_pretty(details) from citus_rebalance_status();

{
  "tasks": [
    {
      "LSN": { "lag": null, "source": "0/731ACC0", "target": null },
      "size": { "source": "14 MB", "target": "5440 kB" },
      "hosts": { "source": "localhost:9702", "target": "localhost:9700" },
      "phase": "Copying Data",
      "state": "running",
      "command": "SELECT pg_catalog.citus_move_shard_placement(102020,2,3,'block_writes')",
      "message": "",
      "retried": 0,
      "task_id": 10
    }
  ],
  "task_state_counts": { "done": 9, "blocked": 1, "running": 1 }
}

Slide 21


Thank You Alicja Kucharczyk - @stiepantrofimo Adam Wołk - @mulander