$30 off During Our Annual Pro Sale. View Details »

Building Distributed PostgreSQL Apps with Citus 11 | PGConf EU 2022 | Charles Feddersen

Citus Data
October 26, 2022

Building Distributed PostgreSQL Apps with Citus 11 | PGConf EU 2022 | Charles Feddersen

In this demo-rich session, you’ll learn how to build a multi-tenant SaaS application on Postgres with Citus. The demo will start small on a single Postgres server—and then you’ll see how you can scale your SaaS application globally by distributing Postgres. By using Citus, you can distribute tables across a cluster of Postgres servers in the Azure cloud and scale out as needed, rebalancing shards along the way with zero downtime. The Citus distributed query engine will route and parallelize your queries transparently to distributed tables across the Citus database cluster—giving your SaaS app the ability to grow along with the snappy performance your customers expect.

Citus Data

October 26, 2022
Tweet

More Decks by Citus Data

Other Decks in Technology

Transcript

  1. None
  2. None
  3. Quick intro (or refresher) on Citus

  4. Provides simple scale-out of operational workloads to execute on a

    cluster on machines Single connection – no code changes to the app Scale locally across a cluster, or globally using replication Traditional PostgreSQL Single Node Scale Distributed PostgreSQL Cluster Scale SQL Query SQL Query
  5. Provides simple scale-out of operational workloads to execute on a

    cluster on machines Single connection – no code changes to the app Scale locally across a cluster, or globally using replication SQL Query
  6. 1990s 2000s 2010s 2020s

  7. Dispelling the great distributed database myth

  8. Easily the most common misconception about distributed databases is that

    they are only applicable to large data volumes The definition of “large” varies depending who you talk to However, Distributed databases are incredibly powerful even for small data volumes
  9. There are several common database bottlenecks that single node systems

    encounter Infrastructure Database Operations CPU – Increase total amount of read/write compute Cache – Increase cache hit ratio IO – Shared nothing architecture adds IO with new nodes Queries – Reduce scans by isolating onto a shard Data Modification – parallelize heavy operations Maintenance – parallelize tasks such as backup
  10. Sales Transactions Sales Transactions [Shard 1] Sales Transactions [Shard 2]

    Sales Transactions [Shard 5] Sales Transactions [Shard 3] Sales Transactions [Shard 4] Sales Transactions [Shard 32] (default) … CREATE TABLE SalesTransactions SELECT create_distributed_table
  11. CREATE TABLE SalesTransactions SELECT create_distributed_table INSERT INTO SalesTransactions VALUES (1,1,10.23)

    INSERT INTO SalesTransactions VALUES (2,3,17.94) Sales Transactions Sales Transactions [Shard 1] Sales Transactions [Shard 2] Sales Transactions [Shard 5] Sales Transactions [Shard 3] Sales Transactions [Shard 4] Sales Transactions [Shard 32] (default) …
  12. CREATE TABLE SalesTransactions SELECT create_distributed_table INSERT INTO SalesTransactions VALUES (1,1,10.23)

    INSERT INTO SalesTransactions VALUES (2,3,17.94) Sales Transactions Sales Transactions [Shard 1] Sales Transactions [Shard 2] Sales Transactions [Shard 5] Sales Transactions [Shard 3] Sales Transactions [Shard 4] Sales Transactions [Shard 32] (default) … Node 1 Node 2 Node 3
  13. DEMO: Single node vs. Parallel Update

  14. Single Node Citus

  15. Relational database performance is highly sensitive to cache hit ratio

    Performance can “fall off a cliff” once cache hit ration drops Scale out can provide better than linear performance improvement where cache is under pressure • https://aka.ms/pg-hyperscale-perf 15,923 214,043 368 39 0 50 100 150 200 250 300 350 400 - 50,000 100,000 150,000 200,000 250,000 Single Node PostgreSQL 4 Node Citus (Cosmos DB) Millisecond Response Time Orders per minute Single Node vs. Cluster Workload Scalability
  16. Building distributed databases

  17. If built well, the scalability of distributed databases is magical

    • Virtually linear scalability • Well balanced across compute nodes • Maximize local execution on the compute nodes (more on the next slide …and if done wrong, they will slow crawl and waste an resource
  18. 1. Filter on shard column SQL Query 1 2 3

    4 5
  19. 2. Primary/foreign key checks within compute node SQL Query 1

    2 3 4 5
  20. 3. Unique constraints within compute node SQL Query 1 2

    3 4 5
  21. 4. Joins within compute node SQL Query 1 2 3

    4 5
  22. 2. Local/Global aggregates for HTAP SQL Query 1 2 3

    4 5 Distributed COUNT(*) SUM (shard counts) # #
  23. Multi-tenant SaaS IoT tenand_id tenand_id tenand_id device_id Complex models with

    common tenant_id shard key Simple model (often no relationasips) sharded on device_id
  24. DEMO: Optimizing Distributed Data

  25. SQL

  26. Some options to fix it when things slow down

  27. Citus Cluster Online rebalancing of shards to less utilized nodes

    Shard move or shard split policies supported
  28. Azure Cosmos DB for PostgreSQL Cluster Enables online movement of

    tenants to new nodes Large or busy tenants can be isolated to a dedicated node to maximize performance No application code changes, or downtime required
  29. None
  30. Getting started

  31. Download Citus today No Azure Subscription Required No Credit Card

    Required
  32. Try Azure Cosmos DB for PostgreSQL free for 30 days

    No Azure Subscription Required No Credit Card Required
  33. Managed Service

  34. Open-source PostgreSQL and Citus extension for distributed queries are free

    downloads Empowers developers to develop and test locally, then deploy to the cloud Zero code changes for deployment
  35. Optional to enable Synchronous replication provides zero data loss on

    failover No application changes required Primary Cluster Standby Cluster High Availability
  36. Easy Measure Creation Pre-defined and configurable dashboards to monitor workload

    performance Configure alerts based on business specific thresholds Built-in Monitoring
  37. Users Private Link Provides private connectivity from a virtual network

    to Azure Simplifies the network architecture Secures connections between Azure endpoints by eliminating data exposure to the public internet Private Link
  38. aka.ms/trycosmosdb General Availability – October 2022

  39. © Copyright Microsoft Corporation. All rights reserved.