Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Citus from the Customer eyes | Citus Con: An Event for Postgres 2023 | Sai Srirampur

Citus from the Customer eyes | Citus Con: An Event for Postgres 2023 | Sai Srirampur

As a Solutions Engineer for the Citus database extension for the past ~7.5 years, I have closely worked with many customers and onboarded them to run their applications on Citus and PostgreSQL. This talk will synthesize all those exciting customer experiences to present an end-to-end journey of what a typical customer onboarding looks like for both Citus open source and for Citus on Azure (now known as Azure Cosmos DB for PostgreSQL.)

This talk will cover steps starting from "product discovery"—where you get to know the value of Citus, to "proof-of-concept"—where you validate that value, to "migration"—where you implement the workload and go into production with Citus.

The following topics would be covered as a part of the talk:

Why Citus to distribute PostgreSQL?
Fit use cases and the misfit ones
Data-modeling (includes distributed data-modeling)
Performance tuning
Migration process
Value of the managed service offering and so on
The talk would be a perfect fit for any customer who is planning to evaluate or is already evaluating Citus for their real-world use case.

Citus Data

April 12, 2023
Tweet

More Decks by Citus Data

Other Decks in Technology

Transcript

  1. Citus from the Customer Eyes
    Sai Srirampur
    Engineering Manager,
    Customer Success for Postgres on Azure

    View Slide

  2. This talk is for
    1. SaaS companies
    looking to scale with
    distributed
    PostgreSQL.
    100s
    10s
    1000s

    View Slide

  3. This talk is for
    2. Companies who
    want to build scalable
    Timeseries apps

    View Slide

  4. Customer Journey - 4 steps
    1. What is Citus?
    2. When to use Citus?
    3. Data-modeling with Citus
    4. Running app in production with Citus

    View Slide

  5. What is Citus?

    View Slide

  6. Postgres At Any
    Scale

    View Slide

  7. Superpower of ”distributed tables” to scale-out Postgres
    Start Building apps on Single node Seamlessly scale to multiple nodes by distributing tables

    View Slide

  8. Packaged as
    extension,
    not a fork
    Postgres At Any
    Scale
    Open source
    &
    Fully managed

    View Slide

  9. Packaged as
    extension,
    not a fork
    Postgres At Any
    Scale
    Open source
    &
    Fully managed
    Open source
    & fully-managed

    View Slide

  10. Global
    replication
    Security
    Distributed
    Scale
    High
    Availability
    Compliance
    Enterprise-grade
    monitoring
    Easy
    ecosystem
    integration
    Extension
    support
    Extensions
    JSONB
    Full text
    search
    Geospatial
    support
    Rich
    indexing
    Powered by fully Open Source
    Citus Extension to Postgres
    Azure Cosmos DB
    for PostgreSQL
    Fully managed D-SQL
    offering built upon
    core benefits of PostgreSQL

    View Slide

  11. When to use Citus?
    Multi-tenant SaaS Apps
    Timeseries/IoT Apps

    View Slide

  12. Multi-tenant SaaS Apps – Scenarios
    • Expecting to outgrow single node soon.
    • Complex homegrown sharding to manage tenants
    • Incurring large tech debt
    • Inflexible to scale out
    • Inflexible to rebalance tenants

    View Slide

  13. Scaling multi-tenant SaaS apps with Citus
    • Automatic sharding of tenants across nodes
    • Online scale-out with shard rebalancer
    • Tenant Isolation
    • Comprehensive tenant monitoring
    • Row Level Security per tenant

    View Slide

  14. 11K+
    Customers
    800ms
    response times for
    95% of queries
    100
    requests per
    second
    20 nodes
    10B+
    Total #rows
    2 clusters
    in EU and US
    Reference

    View Slide

  15. Timeseries / IoT apps - Scenarios
    • Apps providing real-time insights on telemetry data
    • Examples: device telemetry, vehicle telemetry, logs, event data etc.
    • Have implicit scale requirement
    • Affinity to PostgreSQL

    View Slide

  16. Scaling timeseries apps with Citus
    • Horizontal scale out - Start small and scale out to multiple nodes
    • JSONB to store semi-structured data
    • Native timeseries APIs for partitioning, expiry and archival.

    View Slide

  17. Reference

    View Slide

  18. Data-modeling with Citus

    View Slide

  19. 5 tips for distributed data-modeling with Postgres

    View Slide

  20. TIP 1: Column that adds natural dimension to data is
    a good shard key
    • Represents a central piece of application
    • Lives at heart of application
    • Examples:
    • SaaS apps – customer_id is a good candidate
    • IoT app – the device_id is a good candidate
    More details on Distribution column and how Citus distributes data and queries

    View Slide

  21. github_events
    Event_id
    Event_type
    Event_hash
    Repo id
    User_id
    Repo id
    Payload
    Created_at
    org
    github_users
    User_id
    url
    login
    avatar
    Country_id
    gravatar
    Display login
    User_id is good candidate for distributing
    tables
    • Events generated by users themselves
    • Adds natural dimension & is at center of
    workload
    countries
    Country_id
    Name
    Continent
    Capital
    Example: App to analyze GitHub events

    View Slide

  22. TIP 2: Co-locate large tables & make small tables
    reference

    View Slide

  23. Example: App to analyze GitHub events
    github_events
    Event_id
    Event_type
    Event_hash
    Repo id
    User_id
    Repo id
    Payload
    Created_at
    org
    github_users
    User_id
    url
    login
    avatar
    Country_id
    gravatar
    Display login
    countries
    Country_id
    Name
    Continent
    Capital
    Distributed PostgreSQL tables with co-location:
    Events & users distributed on user_id
    SELECT create_distributed_table(
    'github_users', 'user_id');
    SELECT create_distributed_table(
    'github_events', 'user_id');
    Reference table:
    SELECT create_reference_table(
    'countries');

    View Slide

  24. Power of
    Co-locating tables
    Items-4
    Items-4
    Items-4
    events_4 countries
    Items-4
    Items-4
    Items-4
    users_4
    Items-4
    Items-4
    Items-4
    events_8
    Items-4
    Items-4
    Items-4
    users_8
    countries
    SQL pushdown -
    joins/fkeys
    Citus Worker 1 Citus Worker 2
    Citus Coordinator
    countries
    events
    users

    View Slide

  25. github_events
    Event_id
    Event_type
    Event_hash
    Repo id
    User_id
    Repo id
    Payload (JSONB)
    Created_at
    org
    • Simplified schema – single table to store a
    variety of data
    • GIN indexing for fast JSONB access
    • NoSQL capabilities to Postgres world
    • Helpful in implementing workloads that
    need scale – Ex IoT, Events, Logs
    TIP 3: Use JSONB data-type to store unstructured data

    View Slide

  26. TIP 4: Timeseries partitioning on distributed tables
    github_events
    Event_id
    Event_type
    User_id
    Repo id
    User_id
    Repo id
    Payload (JSONB)
    Created_at
    org
    Distribute on user_id
    Partition on created_at
    github_events_jan
    Event_id
    Event_type
    User_id

    github_events_feb
    Event_id
    Event_type
    User_id

    github_events_sept
    Event_id
    Event_type
    User_id

    github_events_aug
    Event_id
    Event_type
    User_id

    github_events_july
    Event_id
    Event_type
    User_id

    . . . . . . .
    Archiving older partitions using columnar Most recent partitions using row storage – real time ingests
    • Helps filtered querying with timestamp filters
    • Efficient expiry – DROP TABLE instead of DELETE
    • Helps archiving older data through columnar
    • Common in implementing timeseries workloads

    View Slide

  27. Timeseries APIs for partitioning, archival, & expiry
    • Automatic Data Partitioning – Easily Create partitions in past and future
    SELECT create_time_partitions(
    table_name:= 'time_series_events', partition_interval:= '1 day’,
    end_at:= '2021-10-30’,
    start_from:= '2021-10-10');
    • Automatic Data Archival – Convert older partitions from row to columnar
    CALL alter_old_partitions_set_access_method('time_series_events’,
    '2021-10-20', 'columnar');
    • Automatic Data Expiry - Expire data older than a given time period
    CALL drop_old_time_partitions(
    table_name:= 'time_series_events', older_than:= '2021-10-15');

    View Slide

  28. TIP 5: Scope queries to shard key where possible
    • Ex: SaaS workloads – queries can be scoped on tenant_id
    UPDATE ads SET impressions_count = impressions_count+1 WHERE
    id = 42;
    TO
    UPDATE ads SET impressions_count = impressions_count+1 WHERE
    id = 42 AND company_id = 1;
    • Avoids unnecessary fan out & helps query latency.
    • More relevant in transactional workloads

    View Slide

  29. 5 tips for distributed data-modeling with Postgres
    • TIP 1: Column that adds natural dimension to data is a good shard key
    • TIP 2: Co-locate large tables, make small tables reference
    • TIP 3: Use JSONB data-type to store unstructured data
    • TIP 4: Timeseries partitioning on distributed tables
    • TIP 5: Scope queries to shard key where possible

    View Slide

  30. Running app in production with Citus

    View Slide

  31. Line item Open Source / Self managed Azure Cosmos DB for PostgreSQL
    App Migration Use 5-tips for data-modeling and modify app
    End-to-end migration guide
    Data Migration
    pg_dump/pg_restore, pgcopydb for online migration
    pg_azure_storage extension: Query and load data directly from blob.
    ADF, Azure Databricks, Azure Stream Analytics, any integration to PG.
    Deployment Docker images for dev/test
    Burstable Compute for dev/test
    ARM/BICEP templates
    High Availability Patroni HA for Citus Auto HA with SLAs
    Upgrades Open Source Docs Self-serve upgrades
    Disaster Recovery pg_dump/pg_restore
    pgbackrest, wal-e, wal-g
    Automatic Cross-AZ Backups + PITR
    Cross-region read replicas
    Infra Metrics Self-managed using any PG monitoring tool Azure Monitor Integration
    PG Bouncer Setup using community docs Pgbouncer support included
    DB tuning and
    troubleshooting Performance tuning 1, Performance tuning 2, Useful Diagnostic Queries
    Security Roles, RLS Roles, RLS, Azure Firewall, Azure Private Endpoint
    Scale Self-manage addition, removal of infra
    Use fully OSS shard rebalancer
    Out-of-box support for vertical and horizontal scale
    Checklist to run app in production with Citus

    View Slide

  32. DEMO of
    Azure Cosmos DB for PostgreSQL
    (aka Citus on Azure)

    View Slide

  33. @saisrirampur
    @AzureCosmosDB
    @citusdata

    View Slide