
Building Distributed PostgreSQL Apps with Citus 11 | PGConf EU 2022 | Charles Feddersen

Citus Data
October 26, 2022


In this demo-rich session, you’ll learn how to build a multi-tenant SaaS application on Postgres with Citus. The demo will start small on a single Postgres server—and then you’ll see how you can scale your SaaS application globally by distributing Postgres. By using Citus, you can distribute tables across a cluster of Postgres servers in the Azure cloud and scale out as needed, rebalancing shards along the way with zero downtime. The Citus distributed query engine will route and parallelize your queries transparently to distributed tables across the Citus database cluster—giving your SaaS app the ability to grow along with the snappy performance your customers expect.


Transcript

  1. Provides simple scale-out of operational workloads to execute on a cluster of machines. Single connection – no code changes to the app. Scale locally across a cluster, or globally using replication. [Diagram: "Traditional PostgreSQL – Single Node Scale" vs. "Distributed PostgreSQL – Cluster Scale", each receiving a SQL query]
  2. Provides simple scale-out of operational workloads to execute on a cluster of machines. Single connection – no code changes to the app. Scale locally across a cluster, or globally using replication. [Diagram: a SQL query fanning out across the cluster]
  3. Easily the most common misconception about distributed databases is that they are only applicable to large data volumes. The definition of "large" varies depending on who you talk to. However, distributed databases are incredibly powerful even for small data volumes.
  4. There are several common database bottlenecks that single-node systems encounter. Infrastructure: CPU – increase total amount of read/write compute; Cache – increase cache hit ratio; IO – shared-nothing architecture adds IO with new nodes. Database Operations: Queries – reduce scans by isolating onto a shard; Data Modification – parallelize heavy operations; Maintenance – parallelize tasks such as backup.
  5. CREATE TABLE SalesTransactions; SELECT create_distributed_table(...). [Diagram: the Sales Transactions table split into Shard 1 through Shard 32 (the default shard count)]
  6. CREATE TABLE SalesTransactions; SELECT create_distributed_table(...); INSERT INTO SalesTransactions VALUES (1,1,10.23); INSERT INTO SalesTransactions VALUES (2,3,17.94). [Diagram: each INSERT routed to one of Shards 1–32]
  7. CREATE TABLE SalesTransactions; SELECT create_distributed_table(...); INSERT INTO SalesTransactions VALUES (1,1,10.23); INSERT INTO SalesTransactions VALUES (2,3,17.94). [Diagram: the shards now placed across Node 1, Node 2, and Node 3]
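The statements on these slides can be sketched roughly as follows. The column names and types are assumptions — the slides only show the table name and the sample values (1,1,10.23) and (2,3,17.94):

```sql
-- Hypothetical schema matching the slide's sample rows
CREATE TABLE SalesTransactions (
    transaction_id bigint,
    customer_id    bigint,
    amount         numeric(10,2)
);

-- Shard the table across the cluster on customer_id;
-- Citus creates 32 shards by default (citus.shard_count)
SELECT create_distributed_table('SalesTransactions', 'customer_id');

-- Each INSERT is routed to the shard that owns the hash of its
-- distribution column value
INSERT INTO SalesTransactions VALUES (1, 1, 10.23);
INSERT INTO SalesTransactions VALUES (2, 3, 17.94);
```

The choice of distribution column matters: rows that share a distribution value land on the same shard, so queries filtered on that column can be routed to a single node.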
  8. Relational database performance is highly sensitive to cache hit ratio. Performance can "fall off a cliff" once the cache hit ratio drops. Scale-out can provide better-than-linear performance improvement where cache is under pressure. • https://aka.ms/pg-hyperscale-perf [Chart: Single Node vs. Cluster Workload Scalability – Single Node PostgreSQL: 15,923 orders per minute at 368 ms response time; 4 Node Citus (Cosmos DB): 214,043 orders per minute at 39 ms response time]
  9. If built well, the scalability of distributed databases is magical. • Virtually linear scalability • Well balanced across compute nodes • Maximize local execution on the compute nodes (more on the next slide) …and if done wrong, they will slow to a crawl and waste resources.
  10. 2. Local/global aggregates for HTAP. [Diagram: a SQL query with a distributed COUNT(*) executed as per-shard counts on nodes 1–5, combined by the coordinator as SUM(shard counts)]
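The local/global aggregate pattern from the diagram needs no special syntax; assuming the SalesTransactions table from the earlier slides has been distributed:

```sql
-- Citus rewrites this into a COUNT(*) per shard, runs those
-- fragments in parallel on the workers, and SUMs the partial
-- counts on the coordinator before returning a single row
SELECT count(*) FROM SalesTransactions;
```

The same push-down applies to other aggregates such as SUM and AVG (AVG is computed as a global SUM divided by a global COUNT of per-shard partials).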
  11. Multi-tenant SaaS: complex models with a common tenant_id shard key across tables. IoT: simple model (often no relationships) sharded on device_id.
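The two data-modeling patterns above can be sketched like this; the table and column names are illustrative, not from the slides:

```sql
-- Multi-tenant SaaS: every table carries tenant_id, and related
-- tables are co-located so joins on tenant_id stay on one node
CREATE TABLE tenants (tenant_id bigint PRIMARY KEY, name text);
CREATE TABLE orders  (tenant_id bigint, order_id bigint, total numeric,
                      PRIMARY KEY (tenant_id, order_id));

SELECT create_distributed_table('tenants', 'tenant_id');
SELECT create_distributed_table('orders',  'tenant_id',
                                colocate_with => 'tenants');

-- IoT: a simple, relationship-free model sharded on device_id
CREATE TABLE readings (device_id bigint, read_at timestamptz, value float8);
SELECT create_distributed_table('readings', 'device_id');
```

Co-location is what keeps per-tenant joins and foreign keys local to a single worker, which is the core of the multi-tenant scaling story in this talk.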
  12. SQL

  13. Citus Cluster: online rebalancing of shards to less-utilized nodes. Shard move or shard split policies supported.
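Rebalancing is driven by built-in functions; which one is available depends on the Citus version (citus_rebalance_start arrived in 11.1, earlier releases use the blocking rebalance_table_shards):

```sql
-- Inspect how shards are currently spread across the nodes
SELECT nodename, count(*) AS shard_count
FROM citus_shards
GROUP BY nodename;

-- Start an online, background rebalance (Citus 11.1+)
SELECT citus_rebalance_start();

-- Older releases: run the rebalance in the foreground instead
-- SELECT rebalance_table_shards();
```

Writes keep flowing during the move, which is the "zero downtime" property the abstract refers to.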
  14. Azure Cosmos DB for PostgreSQL Cluster: enables online movement of tenants to new nodes. Large or busy tenants can be isolated to a dedicated node to maximize performance. No application code changes or downtime required.
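Tenant isolation can be sketched with Citus's built-in function; the table name and tenant value are illustrative:

```sql
-- Split tenant 42 out of its current shard into a shard of its
-- own; 'CASCADE' applies the split to all co-located tables too
SELECT isolate_tenant_to_new_shard('orders', 42, 'CASCADE');
```

Once isolated, that shard can be moved to a dedicated node with the shard-placement functions (e.g. citus_move_shard_placement), keeping the noisy tenant's load away from everyone else.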
  15. Try Azure Cosmos DB for PostgreSQL free for 30 days. No Azure subscription required. No credit card required.
  16. Open-source PostgreSQL and the Citus extension for distributed queries are free downloads. Empowers developers to develop and test locally, then deploy to the cloud. Zero code changes for deployment.
  17. High Availability: optional to enable. Synchronous replication provides zero data loss on failover. No application changes required. [Diagram: Primary Cluster replicating to Standby Cluster]
  18. Built-in Monitoring: pre-defined and configurable dashboards to monitor workload performance. Configure alerts based on business-specific thresholds. Easy measure creation.
  19. Private Link: provides private connectivity from a virtual network to Azure. Simplifies the network architecture. Secures connections between Azure endpoints by eliminating data exposure to the public internet. [Diagram: users connecting over Private Link]