Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Building Distributed PostgreSQL Apps with Citus 11 | PGConf EU 2022 | Charles Feddersen

Citus Data
October 26, 2022

Building Distributed PostgreSQL Apps with Citus 11 | PGConf EU 2022 | Charles Feddersen

In this demo-rich session, you’ll learn how to build a multi-tenant SaaS application on Postgres with Citus. The demo will start small on a single Postgres server—and then you’ll see how you can scale your SaaS application globally by distributing Postgres. By using Citus, you can distribute tables across a cluster of Postgres servers in the Azure cloud and scale out as needed, rebalancing shards along the way with zero downtime. The Citus distributed query engine will route and parallelize your queries transparently to distributed tables across the Citus database cluster—giving your SaaS app the ability to grow along with the snappy performance your customers expect.

Citus Data

October 26, 2022
Tweet

More Decks by Citus Data

Other Decks in Technology

Transcript

  1. View Slide

  2. View Slide

  3. Quick intro (or refresher) on Citus

    View Slide

  4. Provides simple scale-out of
    operational workloads to
    execute on a cluster on
    machines
    Single connection – no code
    changes to the app
    Scale locally across a cluster,
    or globally using replication
    Traditional PostgreSQL
    Single Node Scale
    Distributed PostgreSQL
    Cluster Scale
    SQL Query
    SQL Query

    View Slide

  5. Provides simple scale-out of
    operational workloads to
    execute on a cluster on
    machines
    Single connection – no code
    changes to the app
    Scale locally across a cluster,
    or globally using replication
    SQL Query

    View Slide

  6. 1990s 2000s 2010s 2020s

    View Slide

  7. Dispelling the great distributed database myth

    View Slide

  8. Easily the most common misconception about distributed databases is that
    they are only applicable to large data volumes
    The definition of “large” varies depending who you talk to
    However,
    Distributed databases are incredibly powerful even for small data volumes

    View Slide

  9. There are several common database bottlenecks that single node systems encounter
    Infrastructure Database Operations
    CPU – Increase total amount of
    read/write compute
    Cache – Increase cache hit ratio
    IO – Shared nothing architecture adds IO
    with new nodes
    Queries – Reduce scans by isolating onto
    a shard
    Data Modification – parallelize heavy
    operations
    Maintenance – parallelize tasks such as
    backup

    View Slide

  10. Sales
    Transactions
    Sales
    Transactions
    [Shard 1]
    Sales
    Transactions
    [Shard 2]
    Sales
    Transactions
    [Shard 5]
    Sales
    Transactions
    [Shard 3]
    Sales
    Transactions
    [Shard 4]
    Sales
    Transactions
    [Shard 32]
    (default)

    CREATE TABLE
    SalesTransactions
    SELECT
    create_distributed_table

    View Slide

  11. CREATE TABLE
    SalesTransactions
    SELECT
    create_distributed_table
    INSERT INTO SalesTransactions VALUES (1,1,10.23)
    INSERT INTO SalesTransactions VALUES (2,3,17.94)
    Sales
    Transactions
    Sales
    Transactions
    [Shard 1]
    Sales
    Transactions
    [Shard 2]
    Sales
    Transactions
    [Shard 5]
    Sales
    Transactions
    [Shard 3]
    Sales
    Transactions
    [Shard 4]
    Sales
    Transactions
    [Shard 32]
    (default)

    View Slide

  12. CREATE TABLE
    SalesTransactions
    SELECT
    create_distributed_table
    INSERT INTO SalesTransactions VALUES (1,1,10.23)
    INSERT INTO SalesTransactions VALUES (2,3,17.94)
    Sales
    Transactions
    Sales
    Transactions
    [Shard 1]
    Sales
    Transactions
    [Shard 2]
    Sales
    Transactions
    [Shard 5]
    Sales
    Transactions
    [Shard 3]
    Sales
    Transactions
    [Shard 4]
    Sales
    Transactions
    [Shard 32]
    (default)

    Node 1 Node 2 Node 3

    View Slide

  13. DEMO: Single node vs. Parallel Update

    View Slide

  14. Single Node
    Citus

    View Slide

  15. Relational database
    performance is highly
    sensitive to cache hit ratio
    Performance can “fall off a
    cliff” once cache hit ration
    drops
    Scale out can provide better
    than linear performance
    improvement where cache is
    under pressure
    • https://aka.ms/pg-hyperscale-perf
    15,923
    214,043
    368
    39
    0
    50
    100
    150
    200
    250
    300
    350
    400
    -
    50,000
    100,000
    150,000
    200,000
    250,000
    Single Node PostgreSQL 4 Node Citus (Cosmos DB)
    Millisecond Response Time
    Orders per minute
    Single Node vs. Cluster Workload Scalability

    View Slide

  16. Building distributed databases

    View Slide

  17. If built well, the scalability of distributed databases is magical
    • Virtually linear scalability
    • Well balanced across compute nodes
    • Maximize local execution on the compute nodes (more on the next slide
    …and if done wrong, they will slow crawl and waste an resource

    View Slide

  18. 1. Filter on shard column
    SQL Query
    1 2 3 4 5

    View Slide

  19. 2. Primary/foreign key checks within compute node
    SQL Query
    1 2 3 4 5

    View Slide

  20. 3. Unique constraints within compute node
    SQL Query
    1 2 3 4 5

    View Slide

  21. 4. Joins within compute node
    SQL Query
    1 2 3 4 5

    View Slide

  22. 2. Local/Global aggregates for HTAP
    SQL Query
    1 2 3 4 5
    Distributed COUNT(*)
    SUM (shard counts)
    # #

    View Slide

  23. Multi-tenant SaaS IoT
    tenand_id
    tenand_id
    tenand_id
    device_id
    Complex models with common tenant_id shard key Simple model (often no relationasips) sharded on
    device_id

    View Slide

  24. DEMO: Optimizing Distributed Data

    View Slide

  25. SQL

    View Slide

  26. Some options to fix it when things slow down

    View Slide

  27. Citus Cluster
    Online rebalancing of
    shards to less utilized
    nodes
    Shard move or shard split
    policies supported

    View Slide

  28. Azure Cosmos DB for PostgreSQL Cluster
    Enables online movement
    of tenants to new nodes
    Large or busy tenants can
    be isolated to a dedicated
    node to maximize
    performance
    No application code
    changes, or downtime
    required

    View Slide

  29. View Slide

  30. Getting started

    View Slide

  31. Download Citus today
    No Azure Subscription Required
    No Credit Card Required

    View Slide

  32. Try Azure Cosmos DB for PostgreSQL free for 30 days
    No Azure Subscription Required
    No Credit Card Required

    View Slide

  33. Managed Service

    View Slide

  34. Open-source PostgreSQL and Citus
    extension for distributed queries are
    free downloads
    Empowers developers to develop and
    test locally, then deploy to the cloud
    Zero code changes for deployment

    View Slide

  35. Optional to enable
    Synchronous replication
    provides zero data loss on
    failover
    No application changes
    required
    Primary Cluster
    Standby Cluster
    High Availability

    View Slide

  36. Easy Measure
    Creation
    Pre-defined and
    configurable dashboards
    to monitor workload
    performance
    Configure alerts based on
    business specific
    thresholds
    Built-in Monitoring

    View Slide

  37. Users
    Private Link
    Provides private connectivity
    from a virtual network to
    Azure
    Simplifies the network
    architecture
    Secures connections between
    Azure endpoints by
    eliminating data exposure to
    the public internet
    Private Link

    View Slide

  38. aka.ms/trycosmosdb
    General Availability – October 2022

    View Slide

  39. © Copyright Microsoft Corporation. All rights reserved.

    View Slide